I have a screen scraping library called Scraper that just needs a
little cleanup and a release; I just haven't found a chance to do it
yet. I'm not too familiar with WWW::Mechanize, but as I recall its
usage wasn't very clean.

Here's what Scraper looks like:
# a session tracks cookies so you can log in to web-based
# applications that don't use Basic auth
s = Scraper::Session.new
# use the session to grab a url
r = s.get "http://www.google.com/"
# fill out a form
form = r.form
form[:q] = "ruby scraper"
# submit the form and get the resulting page
results = form.submit
# browse the web by following a couple of links, and show the
# resulting page
puts results.links.last.follow.links.last.follow.html
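(If you're wondering what the Session buys you, here's roughly the
cookie plumbing it hides -- just a sketch using plain net/http with a
placeholder URL, not Scraper's actual internals:)

require 'net/http'
require 'uri'

uri = URI.parse("http://www.example.com/login")
http = Net::HTTP.new(uri.host, uri.port)
# first request: the server hands back a session cookie
response = http.post(uri.path, "user=me&pass=secret")
cookie = response["Set-Cookie"]
# later requests: send the cookie back by hand on every call
response = http.get("/account", "Cookie" => cookie) if cookie
puts response.body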
Anyone interested?
On Oct 6, 2005, at 9:36 AM, John Labovitz wrote:
Also, if anyone knows of alternatives to Watir for cross-platform
browsers, let me know.
It's not exactly the same thing, but WWW::Mechanize is pretty good
for scraping and controlling websites. I've had a bunch of
experience using it to implement front-ends and data-suckers for a
client.
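For flavor, a typical Mechanize session looks something like this
(from memory, so the exact method names may be off -- check the
RDoc):

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
# fetch a page
page = agent.get("http://www.google.com/")
# fill in and submit the first form on the page
form = page.forms.first
form.fields.find { |f| f.name == "q" }.value = "ruby scraper"
results = agent.submit(form)
# print the links on the results page
results.links.each { |link| puts link.href }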
Unfortunately, WWW::Mechanize doesn't have very good docs or
support. It's one of those back-burner ideas of mine to write up a
how-to article.
You can get it as a gem -- "mechanize".
The homepage is currently busted, but is usually at
http://www.ntecs.de/blog/Blog/WWW-Mechanize.rdoc
--John
_______________________________________________
PDXRuby mailing list
[email protected]
IRC: #pdx.rb on irc.freenode.net
http://lists.pdxruby.org/mailman/listinfo/pdxruby