Stef and others wrote this book a while ago:

http://books.pharo.org/booklet-Scraping/html/scrapingbook.html

Basically, it uses XMLHTMLParser + XPath.
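
A minimal sketch of that combination, assuming the XMLParserHTML and XPath
packages are loaded in the image. The HTML string and the XPath expression
are made up for illustration and are not taken from the book:

    "Parse an (illustrative) HTML string into a DOM document"
    | doc links |
    doc := XMLHTMLParser parse:
        '<html><body><a href="/one">One</a><a href="/two">Two</a></body></html>'.

    "Select every anchor element with an XPath expression"
    links := doc xpath: '//a'.

    "Collect the href attribute of each selected element"
    links collect: [ :each | each attributeAt: 'href' ]

For a real page you would parse the downloaded HTML instead of a literal
string, and adapt the XPath expression to the structure of that page.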

To me, this is far better than using Soup.
The Google Chrome integration for Pharo also helps to scrape complex, fully
JS-driven websites like Google ;)


Cheers,

Cedrick 

> On 29 Nov 2019, at 15:41, Esteban Maringolo <emaring...@gmail.com> wrote:
> 
> Thank you Torsten,
> 
> I wasn't aware of this tool. I'm already using it to scrape content
> from a website and feed a Pharo-driven system :)
> 
> The XML integration in the Inspector is great too.
> 
> Regards!
> 
> Esteban A. Maringolo
> 
>> On Tue, Nov 19, 2019 at 8:40 AM Torsten Bergmann <asta...@gmx.de> wrote:
>> 
>> Hi,
>> 
>> the STHub -> PharoExtras project "XMLParserHTML"
>> 
>> has now been moved from http://smalltalkhub.com/#!/~PharoExtras/XMLParserHTML to
>> https://github.com/pharo-contributions/XML-XMLParserHTML including the FULL HISTORY.
>> 
>> The old STHub repo was marked as obsolete, but it links to the new one. I've also
>> set up a CI job: https://travis-ci.org/pharo-contributions/XML-XMLParserHTML
>> which is green for Pharo 7. Some cleanups, class comments and documentation were
>> applied, as you can see from the commit history.
>> 
>> The new version is tagged in git as version 1.6.0 (with a movable tag 1.6.x in case
>> further hotfixes are required).
>> 
>> You can load using
>> 
>>   Metacello new
>>        baseline: 'XMLParserHTML';
>>        repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
>>        load.
>> 
>> or from the Catalog in Pharo 7 or 8.
>> 
>> Attached is the current dependency graph.
>> 
>> More to come soon ...
>> 
>> Bye
>> T.
> 
