Hi Joseph,

Please take a look at the commercial support page:
http://scrapy.org/support/
Several of the companies listed there could potentially help out. I work for
Scrapinghub, and I can tell you that we have done several projects with
scraping requirements similar to yours.

As you realized, scraping e-commerce data from a lot of sites can be
time consuming. The common approach with Scrapy is to build a spider per
website, as the developer you spoke to pointed out. Like Randall said, it's
good practice to reuse code, and there's certainly no silver bullet! Other
options are to use 'crawl by example' (e.g. our autoscraping tool,
http://scrapinghub.com/autoscraping) to build an extractor for each website
more quickly, or to take a machine learning approach and build something
capable of extracting your data from many websites. Each approach has
different trade-offs (cost, development time, accuracy, coverage, crawl
efficiency, complexity of the schema extracted, etc.), and you can sometimes
mix approaches.
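
To make the "one spider per website, with shared code" idea a bit more
concrete, here is a minimal sketch in plain Python. It is not Scrapy's API:
the regular expressions only stand in for the XPath/CSS selectors a real
spider would use, and the class names, the shop-a/shop-b example URLs and the
HTML patterns are all invented for illustration.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from urllib.parse import urljoin


@dataclass
class Product:
    name: str
    price: Decimal
    url: str


class BaseProductExtractor:
    """Shared logic; each site subclass supplies only its own patterns."""

    base_url = ""
    name_pattern = ""   # regex with one capture group for the product name
    price_pattern = ""  # regex with one capture group for the raw price text
    link_pattern = ""   # regex with one capture group for the purchase link

    def _first(self, pattern, html):
        # Return the first captured group, stripped, or None if no match.
        match = re.search(pattern, html, re.S)
        return match.group(1).strip() if match else None

    def clean_price(self, raw):
        # Drop currency symbols and thousands separators: "$1,299.00" -> 1299.00
        return Decimal(re.sub(r"[^\d.]", "", raw))

    def extract(self, html):
        name = self._first(self.name_pattern, html)
        raw_price = self._first(self.price_pattern, html)
        link = self._first(self.link_pattern, html)
        if name is None or raw_price is None or link is None:
            return None  # page does not look like a product page
        # Relative purchase links are resolved against the site's base URL.
        return Product(name, self.clean_price(raw_price),
                       urljoin(self.base_url, link))


# Hypothetical site 1: only the patterns differ from the base class.
class ShopAExtractor(BaseProductExtractor):
    base_url = "http://shop-a.example/"
    name_pattern = r'<h1 class="title">(.*?)</h1>'
    price_pattern = r'<span class="price">(.*?)</span>'
    link_pattern = r'<a class="buy" href="(.*?)">'


# Hypothetical site 2: different markup, same shared cleaning code.
class ShopBExtractor(BaseProductExtractor):
    base_url = "http://shop-b.example/"
    name_pattern = r'<div id="product-name">(.*?)</div>'
    price_pattern = r'data-price="(.*?)"'
    link_pattern = r'<link rel="canonical" href="(.*?)"'
```

Adding a site then means writing one small subclass, while the price cleaning
and link handling are written once. With 200+ sites the per-site part is
still real work, which is where the other approaches come in.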

Hope that helped,

Shane

On 11 March 2014 19:54, Joseph Piscal <[email protected]> wrote:

> Greetings Scrapy Enthusiasts,
>
> I have been searching for a reliable and experienced developer to develop
> a web crawler/data scraper for some time. I work for a marketing company
> and have a list of 200+ consumer product URLs (blogs, e-commerce stores,
> home shopping networks, & big sites like Amazon/Pinterest) that I would be
> interested in scraping. Information I desire is product image, product
> price, product description, link to purchase, etc. I would like the
> information presented to me in a web RSS feed type format (attached) for
> ease of sifting through products quickly. After the base program is built I
> would also be interested in making it more intelligent, utilizing a keyword
> or weighting system to filter out products I don't want to see.
>
> I spoke to one developer who claimed scraping blogs would be easy since
> they mostly run on Wordpress, or at least provide an RSS feed so monitoring
> would be simple. He also mentioned that it would be difficult to do the
> e-commerce sites because almost all of them utilize a different platform.
> So we would pretty much require custom code for every single site. Not sure
> how true this is but it made sense.
>
> The goal is for me to see new data daily. After a URL has been scraped the
> first time, I would assume all the products (data) would be stored in a
> database. So the next time that URL is scraped it only collects the NEW
> products (data) for my viewing pleasure.
>
> I am looking for any and all advice or suggestions on this project. If you
> are interested please feel free to respond to this post or contact me
> directly:
>
> [email protected]
>
> Thank you in advance!
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
