Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Cédrick Béler Sat, 30 Nov 2019 13:28:59 -0800

> cedreek wrote
>> To me, far better than using Soup. 
> 
> Ah, interesting! I use Soup almost exclusively. What did you find superior
> about XMLParserHTML? I may give it a try...
>


It’s mainly XPath, which I find easier than navigating the HTML tree with Soup
or even with XMLParserHTML itself.

I usually copy the XPath from a web inspector, though I have to tweak it a bit.
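
As a rough sketch of that workflow, assuming both the XMLParserHTML and XPath packages are loaded (the HTML string, the `results` id and the XPath expression below are made up for illustration, not taken from a real page):

```smalltalk
| html doc links |
"Hypothetical page content; in practice this would be the downloaded HTML."
html := '<html><body><div id="results"><a href="https://pharo.org">Pharo</a></div></body></html>'.

"Parse the (possibly malformed) HTML into an XML DOM."
doc := XMLHTMLParser parse: html.

"An inspector usually gives an absolute path such as /html/body/div/a;
 a shorter relative expression tends to be less brittle."
links := doc xpath: '//div[@id="results"]//a'.

"Collect the href attribute of every matching element."
links collect: [ :each | each attributeAt: 'href' ]
```

Evaluating this in a playground should give a collection with the one link's href; the same `xpath:` call can be pointed at a document parsed from a real page.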

> 
> cedreek wrote
>> Google chrome pharo integration also helps to scrape complex, fully JS web
>> sites like Google ;)
> 
> Also interesting! Any publicly available examples? How does one load "Google
> chrome pharo integration"? Also, there is often the "poor man's" way (albeit
> requiring manual intervention) by inspecting the Ajax http requests in a
> developer console and then recreating directly in Pharo.
> 

I just tried it once. 

There is a Google Chrome plugin that lets you use headless Chrome to get the
fully loaded HTML page.

I need to try it again. A simple example I’d like to do is to scrape Google and
remove the advertised content ^^

This is Torsten’s package, btw:

https://github.com/astares/Pharo-Chrome

Happy scraping ;-)

And thx Torsten for all ^^

Cedrick 

> 
> 
> -----
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>
