On 19.01.2013 at 09:45, Stéphane Ducasse <[email protected]> wrote:

> Norbert 
> 
> Soup produces a limited AST containing all the information, but not really a 
> full and nice AST for HTML
> 
Thanks! If something like an AST is emitted that will help a lot. I don't need 
a full HTML AST right now (well, at the moment I think I don't). But Pharo will 
need one in the mid-term, right?

thanks,

Norbert
> 
>> 
>> On 17.01.2013 at 23:38, Sean P. DeNigris <[email protected]> wrote:
>> 
>>> fstephany wrote
>>>> http://www.squeaksource.com/Soup.html
>>> 
>>> Def works in 1.4... Soup is a must if you may have to deal with ill-formed
>>> HTML (i.e. web scraping in general) because it's the only library I know of
>>> that handles it robustly. I've used it a lot and it's pretty
>>> straightforward.
>> Ok, thanks for the update. I'm not sure handling ill-formedness is a major 
>> requirement but it is good to have. Do you know if HTML5 would be handled as 
>> ill-formedness? 
>> Apart from that, I'm interested in whether some kind of document model is 
>> emitted, or what it does instead. Well, I'll have a look. 
>> 
>> thanks,
>> 
>> Norbert
> 
> 
