Stef and others wrote this book a while ago: http://books.pharo.org/booklet-Scraping/html/scrapingbook.html
Basically XMLHTMLParser + XPath. To me, that is far better than using Soup.
The Google Chrome Pharo integration helps to scrape complex, fully JS-driven websites like Google ;)

Cheers,
Cedrick

> On 29 Nov 2019 at 15:41, Esteban Maringolo <emaring...@gmail.com> wrote:
>
> Thank you Torsten,
>
> I wasn't aware of this tool. I'm already using it to scrape content
> from a website and feed a Pharo-driven system :)
>
> The XML integration in the Inspector is great too.
>
> Regards!
>
> Esteban A. Maringolo
>
>> On Tue, Nov 19, 2019 at 8:40 AM Torsten Bergmann <asta...@gmx.de> wrote:
>>
>> Hi,
>>
>> the STHub -> PharoExtras project "XMLParserHTML"
>> has now been moved from http://smalltalkhub.com/#!/~PharoExtras/XMLParserHTML to
>> https://github.com/pharo-contributions/XML-XMLParserHTML, including the FULL
>> HISTORY.
>>
>> The old STHub repo was marked as obsolete, but it links to the new one.
>> I've also set up a CI job: https://travis-ci.org/pharo-contributions/XML-XMLParserHTML
>> which is green for Pharo 7. Some cleanups, class comments and documentation were
>> applied, as you can see from the commit history.
>>
>> The new version is tagged in git as version 1.6.0 (with a movable tag 1.6.x
>> in case further hotfixes are required).
>>
>> You can load it using
>>
>>     Metacello new
>>         baseline: 'XMLParserHTML';
>>         repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
>>         load.
>>
>> or from the catalog in Pharo 7 or 8.
>>
>> Attached is the current dependency graph.
>>
>> More to come soon ...
>>
>> Bye
>> T.