Re: [racket-users] Crash when installing 'boris' package from GitHub URL on OSX 10.6.8

Neil Van Dyke Wed, 09 Dec 2015 18:33:20 -0800

David K. Storrs wrote on 12/09/2015 08:50 PM:

1) Is there a web-spidering package that people recommend?  I could use wget 
and then parse things from disk, but I'd like to have something that's easily 
composable into CLI scripts.

I've done a lot of Web crawling and scraping successfully with Racket and Scheme, over the last 14-15 years. I released an HTML parser ("http://www.neilvandyke.org/racket-html-parsing/"), which I still use today. From that parse, you might then extract the info you need with `sxml-match` ("http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim") and/or SXPath. For HTTP, the client modules in Racket are often satisfactory, and other times I've used my own packages that implement HTTP in pure Racket or that wrap `curl` or `wget` for special requirements. For storing pages and links/metadata, there's the filesystem, the core Racket RDBMS database support, and cloud stores like AWS S3. The un-AJAX-ing and site-specific scraping behavior you might have to do yourself, if you need it. (I have a backlog of related tools to release someday.)
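
A minimal sketch of the fetch, parse, extract pipeline described above, assuming the `html-parsing` and `sxml` packages are installed (for example via `raco pkg install html-parsing sxml`). It uses SXPath rather than `sxml-match`, and `extract-links` and `page-links` are hypothetical helper names chosen for illustration, not part of any package:

```racket
#lang racket/base
;; Sketch only: assumes the html-parsing and sxml packages are installed.
(require net/url      ; HTTP client shipped with Racket
         html-parsing ; provides html->xexp (HTML -> SXML-style tree)
         sxml)        ; provides sxpath

;; Parse HTML (a string or an input port) and return the href value
;; of every <a> element, in document order.
(define (extract-links html)
  ((sxpath '(// a @ href *text*)) (html->xexp html)))

;; Fetch a page over HTTP and return its links.
;; The port is closed even if parsing raises an exception.
(define (page-links url-string)
  (define in (get-pure-port (string->url url-string) #:redirections 5))
  (dynamic-wind
    void
    (lambda () (extract-links in))
    (lambda () (close-input-port in))))

;; Offline usage: extract links from an HTML string.
(extract-links "<p><a href=\"/a\">A</a> and <a href=\"/b\">B</a></p>")
```

Because `page-links` only reads a URL and returns a list, it should compose reasonably well into a command-line script; storing pages and handling site-specific quirks are left out of this sketch.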

P.S., Fortunately, the `sxml-match` Racket package has been preserved on the official Racket PLaneT package server, :) since the author's Web site with the package home page is down/disappeared.


Neil V.

