I think you'll find you need to know _something_ about the page layout.  If
there are a finite number of places you need to scrape from, you can do
this pretty simply.

Assume you had a CSS selector that finds the desired content on each URL of
interest, and it was stored in an ActiveRecord(-ish) model.
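If you don't want a database table for this yet, a plain hash does the same
job. Here is a minimal stdlib-only sketch, assuming one selector per site and
keying by host rather than full URL. The hosts, selectors, SELECTORS constant
and selector_for helper are all made up for illustration:

```ruby
require 'uri'

# Hypothetical stand-in for the Selector model: one CSS selector per site,
# keyed by host. These hosts and selectors are invented examples.
SELECTORS = {
  'example.com'      => 'div.article h2 a',
  'news.example.org' => '#headlines li'
}.freeze

# Return the stored selector for a URL's host, or raise KeyError if the
# site has no selector on file.
def selector_for(url)
  host = URI.parse(url).host
  SELECTORS.fetch(host) { raise KeyError, "no selector stored for #{host}" }
end

puts selector_for('http://example.com/some/page')
```

Keying by host means every page on a known site shares one selector; if
layouts differ between sections of a site, you'd key on something finer.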

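require 'nokogiri'
require 'open-uri'

# ...
# look up the stored selector for this URL
# (css_selector is a hypothetical column holding the selector string)
@selector = Selector.find_by_url(@the_url_to_scrape).css_selector

# fetch and parse the page (URI.open is provided by open-uri)
doc = Nokogiri::HTML(URI.open(@the_url_to_scrape))

# search for nodes by CSS and print their text
doc.css(@selector).each do |node|
  puts node.content
end
# ...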


I did a write-up on simple scraping with Nokogiri and SelectorGadget here:
http://joemcglynn.wordpress.com/2009/12/10/five-minute-introduction-to-nokogiri/


--

You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rubyonrails-talk?hl=en.

