Hello,

This might be useful to those who store thumbnails/images on S3. I noticed that the current implementation of S3FilesStore issues blocking boto requests to check whether images already exist in the S3 bucket. This check runs in a Twisted thread pool to avoid blocking the main thread, but that pool is capped at something like 20 threads, so only that many checks can be in flight at once.
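To illustrate why the cap matters, here is a minimal sketch using only the standard library. It is not Scrapy, boto, or Twisted code: `concurrent.futures` stands in for the thread pool, `asyncio` stands in for a non-blocking client, and `POOL_SIZE`, `NUM_CHECKS`, and `LATENCY` are made-up values (the pool size simply reuses my rough figure of 20). It measures how many simulated "does this image exist?" checks are in flight at once under each approach.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 20     # illustrative cap, reusing the rough figure from this post
NUM_CHECKS = 100   # hypothetical number of pending existence checks
LATENCY = 0.05     # hypothetical round-trip time of one check, in seconds


class PeakCounter:
    """Tracks the highest number of checks in flight at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self):
        with self._lock:
            self.current -= 1


def blocking_peak():
    """Run blocking checks in a capped thread pool; return peak concurrency."""
    counter = PeakCounter()

    def check(_):
        counter.enter()
        time.sleep(LATENCY)  # stands in for a blocking request
        counter.leave()

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        list(pool.map(check, range(NUM_CHECKS)))
    return counter.peak


def non_blocking_peak():
    """Run the same checks as non-blocking tasks; return peak concurrency."""
    counter = PeakCounter()

    async def check():
        counter.enter()
        await asyncio.sleep(LATENCY)  # stands in for a non-blocking request
        counter.leave()

    async def main():
        await asyncio.gather(*(check() for _ in range(NUM_CHECKS)))

    asyncio.run(main())
    return counter.peak


if __name__ == "__main__":
    print("blocking peak:", blocking_peak())
    print("non-blocking peak:", non_blocking_peak())
```

With blocking calls, the number of simultaneous checks can never exceed the pool size, so throughput tops out at roughly pool size divided by request latency. With non-blocking calls, every pending check can be waiting on the network at the same time.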
As a quick experiment, I replaced boto with a Twisted library (for reads only) and saw an immediate 2x throughput increase on the same crawl. The code is available here: https://github.com/Curbside/scrapy/commit/2b544df2bfb347de9963fed4f3546da19ca3cc8f

The txaws library is somewhat outdated, and I couldn't get the upload part working. If anyone is interested in updating it or making it work natively, or has an alternative implementation, I'd be interested to hear about it.

Cheers,
Denis
