On Jan 19, 2013, at 11:59 AM, Norbert Hartl wrote:

> On 19.01.2013 at 09:45, Stéphane Ducasse <[email protected]> wrote:
>
>> Norbert
>>
>> soup produces a limited AST containing all the information, and not really a
>> full and nice AST for HTML
>>
> Thanks! If something like an AST is emitted that will help a lot. I don't
> need a full HTML AST right now (well, at the moment I think I don't).

With soup you get an AST and a query system. I used it to scrape magic cards :)

> But Pharo will need one in the mid-term, right?

Yes, it would be good.

>
> thanks,
>
> Norbert
>>
>>>
>>> On 17.01.2013 at 23:38, Sean P. DeNigris <[email protected]> wrote:
>>>
>>>> fstephany wrote
>>>>> http://www.squeaksource.com/Soup.html
>>>>
>>>> Def works in 1.4... Soup is a must if you may have to deal with ill-formed
>>>> HTML (i.e. web scraping in general) because it's the only library I know of
>>>> that handles it robustly. I've used it a lot and it's pretty
>>>> straightforward.
>>> Ok, thanks for the update. I'm not sure handling ill-formedness is a major
>>> requirement but it is good to have. Do you know if HTML5 would be handled
>>> as ill-formedness?
>>> Apart from that I'm interested in whether some kind of document model is
>>> emitted or what it does. Well, I'll have a look.
>>>
>>> thanks,
>>>
>>> Norbert
