Norbert,

Soup produces a limited AST that contains all the information, but it is not really a full and nice AST for HTML.

Stef

> On 17.01.2013 at 23:38, Sean P. DeNigris <[email protected]> wrote:
>
>> fstephany wrote
>>> http://www.squeaksource.com/Soup.html
>>
>> Def works in 1.4... Soup is a must if you may have to deal with ill-formed
>> HTML (i.e. web scraping in general) because it's the only library I know of
>> that handles it robustly. I've used it a lot and it's pretty
>> straightforward.
>>
> Ok, thanks for the update. I'm not sure handling ill-formedness is a major
> requirement but it is good to have. Do you know if HTML5 would be handled as
> ill-formedness?
> Apart from that I'm interested if kind of a document model is emitted or what
> it does. Well, I'll have a look.
>
> thanks,
>
> Norbert
