Running scrapy in scale

adi . lavi Wed, 03 Dec 2014 04:02:26 -0800

Hi,
I am building a back-end in which one module needs to scrape pages from 
various sites. Each URL is submitted by an end user, so the domain is 
known beforehand, but the full URL is dynamic.


The back-end is planned to support thousands of requests per second.
I like what I see in Scrapy regarding feature coverage, extensibility, 
ease of use and more, but I am concerned about these two points:

1. Passing the URL in real time as an argument to Scrapy, where only the 
domain (and therefore the specific spider) is known in advance.
2. I've read that to invoke Scrapy via an API one should use scrapyd 
with its JSON API, which starts a process per scraping job. That means one 
process runs per request, which does not scale (imagine each request 
takes 1.5 seconds).
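
To make both points concrete, here is a rough sketch of how I understand 
it. As far as I can tell, scrapyd's schedule.json endpoint takes "project" 
and "spider" plus any extra fields, which are handed to the spider as 
arguments, so the URL could travel as a "url" field. The scrapyd address, 
the project name "myproject", the spider name "example_com" and the rate 
of 1000 requests per second are placeholders of mine. The request is only 
built, not sent.

```python
import math
from urllib.parse import urlencode
from urllib.request import Request

# Assumed: scrapyd running locally on its default port.
SCRAPYD_ENDPOINT = "http://localhost:6800/schedule.json"


def build_schedule_request(project, spider, url):
    """Build (but do not send) a scrapyd schedule.json request.

    Extra form fields such as `url` are passed to the spider as arguments.
    """
    body = urlencode({"project": project, "spider": spider, "url": url})
    return Request(SCRAPYD_ENDPOINT, data=body.encode("ascii"), method="POST")


def concurrent_jobs(requests_per_second, seconds_per_job):
    """Average number of jobs in flight: arrival rate times job duration."""
    return math.ceil(requests_per_second * seconds_per_job)


# Hypothetical project and spider names; the spider is chosen by domain.
req = build_schedule_request(
    "myproject", "example_com", "http://example.com/item?id=42"
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))

# At 1000 requests/second and 1.5 seconds each: about 1500 processes at once.
print(concurrent_jobs(1000, 1.5))
```

So even at the low end of "thousands of requests per second", a process 
per request would mean well over a thousand scraping processes alive at 
any moment, which is the part that worries me.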

Please advise,

