Here are my off-the-top-of-my-head suggestions: different Thor scripts for each website, perhaps with a single script that calls the rest of them.
I did something similar for scraping shopping-cart information. Since I needed the same data on every page, I wrote a generic crawler that read the XPath string from the database for each item I wanted to scrape. It worked well.

On Jul 12, 5:02 am, aupayo <[email protected]> wrote:

> Hi,
>
> I want to screen scrape information from some websites (I have
> permission to do it).
>
> I am using the Mechanize plugin. The websites are different from each
> other, so I need to write different RoR code to screen scrape each
> website. There would be hundreds of different websites.
>
> OK, the problem is that I don't know how to implement this in an
> elegant and efficient way. My current quick-and-dirty solution is a
> model that I call when I want to screen scrape a website.
>
> I call it like: Spider.crawl(website_id)
>
> It looks like:
>
> class Spider < ActiveRecord::Base
>
>   require 'mechanize'
>
>   def crawl(website_id)
>
>     if website_id == 1
>       # Mechanize code for screen scraping website 1
>     end
>
>     if website_id == 2
>       # Mechanize code for screen scraping website 2
>     end
>
>     .....
>
>   end
> end
>
> How can I improve that?
> Is there at least a way to put the code for each website in an
> external file, so I can call just the code I need? That way I would
> avoid working with a model that has thousands of lines...
>
> Thanks for your help!

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
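On the question of putting each website's code in an external file: one common pattern is a scraper class per website, each registering itself with a small dispatcher, so the `Spider` model stays tiny and each site's Mechanize code lives in its own file. A rough sketch (class names and the registry are hypothetical; the ActiveRecord inheritance from the original is dropped here for brevity):

```ruby
# Dispatcher: looks up the right scraper class by website_id.
class Spider
  SCRAPERS = {}

  def self.register(website_id, klass)
    SCRAPERS[website_id] = klass
  end

  def self.crawl(website_id)
    scraper = SCRAPERS.fetch(website_id) do
      raise ArgumentError, "no scraper registered for website #{website_id}"
    end
    scraper.new.crawl
  end
end

# One file per website, e.g. app/scrapers/website1_scraper.rb.
class Website1Scraper
  Spider.register(1, self)

  def crawl
    # Mechanize code for screen scraping website 1 would go here.
    "scraped website 1"
  end
end
```

Calling `Spider.crawl(1)` then dispatches to `Website1Scraper`, and adding website N means adding one new file rather than growing a thousand-line `if` chain.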

