Are you getting 100+ fetch/parse operations per second, or just the fetch side?
On Wed, Mar 9, 2016 at 5:11 PM, Dimitris Kouzis - Loukas <[email protected]> wrote:
> Thanks a lot @Tsouras :) indeed it might be useful here!
>
> Here are some quick tips... 178 requests/second isn't bad at all. Yes, it
> might well be CPU. The GIL is not a problem because Scrapy is
> single-threaded, BUT the fact that it's single-threaded might be the
> problem :) Your machine likely has, say, 3 more cores you're not using.
> Try running 4 Scrapy processes in parallel, each with 1/4 of the problem,
> so you use more cores. Does it finish faster?
>
> Regarding AWS and bandwidth, that number doesn't say much. The latency of
> the remote servers tells you much more.
>
> Measure the average latency of your target pages (e.g. with `time curl`
> on a few URLs). If it's, say, 0.2s, multiply it by 1600. That means your
> job is "worth" 320 seconds of request time.
>
> If you set:
>
> CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_IP = 10
>
> you would get the job done in 320/10 = 32 seconds. Obviously you've
> already raised those values, since you're getting something much faster
> (or your average latency is smaller than 0.2s).
>
> If you set:
>
> CONCURRENT_REQUESTS = CONCURRENT_REQUESTS_PER_IP = 100
>
> you should get your answer in more or less 3.2 seconds plus some startup
> time.
>
> Try intermediate values... where does reality start to diverge from the
> ideal?
>
>
> On Saturday, March 5, 2016 at 9:06:53 AM UTC, Tsouras wrote:
>>
>> Maybe the book by Dimitrios Kouzis-Loukas,
>> https://www.packtpub.com/big-data-and-business-intelligence/learning-scrapy
>> will help you. It has a chapter about performance.
>>
>>
>> On Thursday, March 3, 2016 at 10:13:01 AM UTC+2, Berkant AYDIN wrote:
>>>
>>> Hi everyone,
>>>
>>> I have to do real-time scraping. I've tried the optimization options in
>>> the documentation, but it's still too slow: crawling 1600 pages takes
>>> about 9 seconds. Yes, that's fast, but still not enough. I'm on an AWS
>>> machine with 860 Mb/s bandwidth. How can I increase performance? Do I
>>> have to use distributed options? If yes, which one? Is it a GIL problem?
>>> Should I continue with PyPy?
>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
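For reference, Dimitris's back-of-envelope estimate can be written as a small helper. This is just a sketch of the arithmetic from the thread; `avg_latency_s` is whatever you measure with `time curl` against a few of your target URLs, and the formula ignores startup overhead, parse CPU time, and throttling:

```python
def estimated_crawl_time(num_pages, avg_latency_s, concurrency):
    """Rough lower bound on wall-clock time for an I/O-bound crawl.

    Total "work" is num_pages * avg_latency_s seconds of request time;
    with `concurrency` requests in flight it divides down accordingly.
    """
    return num_pages * avg_latency_s / concurrency

# The thread's numbers: 1600 pages at ~0.2s average latency.
print(estimated_crawl_time(1600, 0.2, 10))   # CONCURRENT_REQUESTS = 10  -> ~32s
print(estimated_crawl_time(1600, 0.2, 100))  # CONCURRENT_REQUESTS = 100 -> ~3.2s
```

Comparing this ideal curve against measured times at intermediate concurrency values shows where the bottleneck shifts from the network to something else (CPU, the remote server, or per-IP throttling).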

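The "run 4 Scrapys with 1/4 of the problem each" suggestion could be sketched like this. The spider name `myspider` and the `-a start_urls=...` argument are hypothetical: passing the shard that way only works if your spider's `__init__` splits the comma-joined string back into a list.

```python
import subprocess


def shard(urls, n):
    """Split a URL list into n round-robin chunks of near-equal size."""
    return [urls[i::n] for i in range(n)]


def launch_parallel_crawls(urls, n_procs=4, spider="myspider"):
    """Start one `scrapy crawl` process per shard so each process
    (and hence each CPU core) works an independent slice of the URLs.
    Returns the Popen handles; callers can p.wait() on each one."""
    procs = []
    for chunk in shard(urls, n_procs):
        procs.append(subprocess.Popen(
            ["scrapy", "crawl", spider,
             "-a", "start_urls=" + ",".join(chunk)]
        ))
    return procs
```

Since each process has its own interpreter, this sidesteps the single-threaded limit entirely, at the cost of merging the output feeds afterwards.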