Here are my suggestions, off the top of my head:

Different Thor scripts for each website, perhaps with a single script
that calls the rest of them.

I did something similar for scraping shopping cart information. Since
I needed the same data on every page, I wrote a generic crawler that
would read the XPath string from the database for each item I wanted
to scrape. It worked well.

On Jul 12, 5:02 am, aupayo <[email protected]> wrote:
> Hi,
>
> I want to screen scrape information from some websites (I have
> permission to do it).
>
> I am using the Mechanize plugin. The websites are all different from
> each other, so I need to write different RoR code to screen scrape
> each one. There would be hundreds of different websites.
>
> Ok, the problem is that I don't know how to implement this in an
> elegant and efficient way. My current quick and dirty solution is a
> model that I call when I want to screen scrape a website:
>
> I call it like: Spider.crawl(website_id)
>
> It looks like:
>
> class Spider < ActiveRecord::Base
>
>   require 'mechanize'
>
>   def self.crawl(website_id)
>
>     if website_id == 1
>       # Mechanize code for screen scraping website 1
>     end
>
>     if website_id == 2
>       # Mechanize code for screen scraping website 2
>     end
>
>     .....
>
>   end
>
> end
>
> How can I improve that?
> Is there at least a way to put the code for each website in an
> external file, so that I can load just the code I need? That way I
> would avoid working with a model that has thousands of lines...
>
> Thanks for your help!

-- 
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.
