On Jan 19, 2013, at 11:59 AM, Norbert Hartl wrote:

> 
> 
> On 19.01.2013 at 09:45, Stéphane Ducasse <[email protected]> wrote:
> 
>> Norbert 
>> 
>> Soup produces a limited AST containing all the information, not really a 
>> full and nice AST for HTML.
>> 
> Thanks! If something like an AST is emitted that will help a lot. I don't 
> need a full HTML AST right now (well, at the moment I think I don't).

With Soup you get an AST and a query system. I used it to scrape magic cards :)

> But Pharo will need one in the mid-term, right?

Yes, it would be good.

> 
> thanks,
> 
> Norbert
>> 
>>> 
>>> On 17.01.2013 at 23:38, Sean P. DeNigris <[email protected]> wrote:
>>> 
>>>> fstephany wrote
>>>>> http://www.squeaksource.com/Soup.html
>>>> 
>>>> Def works in 1.4... Soup is a must if you may have to deal with ill-formed
>>>> HTML (i.e. web scraping in general) because it's the only library I know of
>>>> that handles it robustly. I've used it a lot and it's pretty
>>>> straightforward.
>>> Ok, thanks for the update. I'm not sure handling ill-formed input is a major 
>>> requirement, but it is good to have. Do you know if HTML5 would be treated 
>>> as ill-formed? 
>>> Apart from that, I'm interested in whether some kind of document model is 
>>> emitted, or what it does. Well, I'll have a look. 
>>> 
>>> thanks,
>>> 
>>> Norbert
>> 
>> 
> 

