Are you getting 100+ fetch/parse operations per second, or is that just the
fetch side?



On Wed, Mar 9, 2016 at 5:11 PM, Dimitris Kouzis - Loukas <[email protected]>
wrote:

> Thanks a lot @Tsouras :) indeed it might be useful here!
>
> Here are some quick tips... 178 requests/second isn't bad at all. Yes, it
> might well be CPU. Indeed, the GIL is not a problem because Scrapy is
> single-threaded, BUT the fact that it's single-threaded might be the
> problem :) Your machine likely has another e.g. 3 cores you're not using.
> Try running 4 Scrapy processes in parallel, each with 1/4 of the problem.
> Then you will use more cores. Does it finish faster?
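
To make the "4 Scrapy processes with 1/4 of the problem each" idea concrete, here is a minimal sketch. It assumes you already have the URL list in hand and a spider that accepts a file of URLs; the spider name `myspider` and the `urls_file` argument are hypothetical names for illustration, not something from this thread. Only `scrapy crawl <spider> -a key=value` is the standard command-line form.

```python
import subprocess


def split_round_robin(urls, parts=4):
    """Deal the URLs into `parts` chunks of (almost) equal size."""
    return [urls[i::parts] for i in range(parts)]


def build_commands(chunk_files, spider="myspider"):
    """Build one `scrapy crawl` command line per chunk file.

    `myspider` and the `urls_file` spider argument are hypothetical;
    the spider would have to read its start URLs from that file.
    """
    return [
        ["scrapy", "crawl", spider, "-a", f"urls_file={path}"]
        for path in chunk_files
    ]


def run_all(commands):
    """Start every crawl at once, then wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

You would write each chunk to its own file, pass the file names to `build_commands`, and time `run_all` against a single-process run of the full list.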
>
> Regarding AWS and bandwidth, that figure doesn't say much. The latency the
> remote servers give you says much more.
>
> Measure the average latency of your target pages (e.g. by using `time
> curl` with a few URLs). If it's e.g. 0.2s, multiply it by 1600. That
> means that your job is "worth" 320 seconds.
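
If you would rather measure this from Python than with `time curl`, something along these lines should do. It is only a sketch: `http_fetch`, `average_latency` and `job_worth` are names I made up, and it times plain sequential downloads with the standard library, not Scrapy's own downloader.

```python
import time
import urllib.request


def http_fetch(url):
    """Blocking download of one page; stands in for `time curl <url>`."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def average_latency(urls, fetch=http_fetch, clock=time.perf_counter):
    """Fetch each URL one after another and return the mean time in seconds."""
    total = 0.0
    for url in urls:
        start = clock()
        fetch(url)
        total += clock() - start
    return total / len(urls)


def job_worth(avg_latency, pages=1600):
    """Total sequential time the whole job is "worth", in seconds."""
    return avg_latency * pages
```

A handful of representative URLs is enough for `average_latency`; with an average of 0.2s, `job_worth` gives the 320 seconds mentioned above.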
>
> If you set:
>
> CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_IP = 10
>
> you would get the job done in 320/10 = 32 sec. Obviously you've already
> hacked those values since you're getting something much higher (or your
> avg. latency is smaller than 0.2s)
>
> If you set:
>
> CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_IP = 100
>
> you should get your answer in more or less 3.2 seconds + some startup time.
>
> Try intermediate values... where does reality start to diverge from the
> ideal?
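
The arithmetic above fits in a few lines, which makes it easy to tabulate the ideal time for intermediate settings and compare it with what you measure. This is a sketch of the back-of-the-envelope model only (pages x latency / concurrency, ignoring startup, parsing and CPU limits); the 0.2s latency is the example value from above, not a measurement, and the function names are mine.

```python
def ideal_seconds(pages, avg_latency, concurrency):
    """Ideal wall-clock time if `concurrency` requests are always in flight."""
    return pages * avg_latency / concurrency


def implied_concurrency(pages, avg_latency, measured_seconds):
    """Concurrency this simple model needs to explain a measured run time."""
    return pages * avg_latency / measured_seconds


if __name__ == "__main__":
    # Ideal times for a few intermediate CONCURRENT_REQUESTS values.
    for concurrency in (10, 25, 50, 100):
        print(concurrency, ideal_seconds(1600, 0.2, concurrency))
    # What the reported 1600 pages in 9 seconds would imply at 0.2s latency.
    print(implied_concurrency(1600, 0.2, 9))
```

Run the crawl at each setting and note where the measured time stops following `ideal_seconds`; that is roughly where something other than latency (likely CPU) becomes the limit.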
>
>
>
> On Saturday, March 5, 2016 at 9:06:53 AM UTC, Tsouras wrote:
>>
>> Maybe the book of Dimitrios Kouzis-Loukas
>> https://www.packtpub.com/big-data-and-business-intelligence/learning-scrapy
>> will help you. It has a chapter about performance.
>>
>>
>> On Thursday, March 3, 2016 at 10:13:01 AM UTC+2, Berkant AYDIN wrote:
>>>
>>> Hi everyone,
>>>
>>> I have to do realtime scraping. I tried the optimization options in the
>>> documentation, but it is still slow. Crawling 1600 pages takes only 9
>>> seconds. Yes, that is very fast, but still not enough. It's an 860 mb/s
>>> AWS machine. How can I increase performance? Do I have to use a
>>> distributed option? If yes, which one? Is it a GIL problem? Should I
>>> continue with PyPy?
>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
