If I have a *really* slow pipeline, say one that writes items out to a
database on a really slow remote server, what would happen? Would the items
just stack up in memory until they are finally processed (meaning my only
problem might be memory), or would Scrapy's crawling of pages halt because
of this too?

I'm thinking that when an item is passed to the pipeline's `process_item`
method, Scrapy just carries on to the next request regardless of what
happens in the pipeline?
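To make the question concrete, this is the kind of pipeline I have in mind
(a minimal sketch; the slow remote insert is just simulated with a sleep):

```python
import time


class SlowWriterPipeline:
    """Sketch of a blocking pipeline: process_item does not return
    until the (simulated) slow remote insert has finished."""

    def process_item(self, item, spider):
        time.sleep(0.01)  # stands in for a slow INSERT on a remote server
        return item
```

So the question is whether Scrapy keeps scheduling and downloading new
requests while a call like that sleep is still in progress.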

I'm using an MS-SQL writer pipeline based on dirbot-mysql
<https://github.com/darkrho/dirbot-mysql>, adapted to MS-SQL. I'm trying to
understand the real advantage of using twisted adbapi, though. I understand
it will speed up the writing of items to the db, since it switches
asynchronously between connections in the pool: if one connection starts
blocking, it jumps to another that isn't. But that only affects the
item-writing phase, right? If the pipeline isn't blocking the crawl anyway,
then so what?


-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
