There is actually a simpler way to simulate this kind of blocking:

import logging
import time

from twisted.internet import threads

log = logging.getLogger(__name__)


class ThreadBlocker(object):
    """
    runInteraction starts a new thread for each interaction, so you can
    simulate a limited number of connection pools with this class and,
    say, REACTOR_THREADPOOL_MAXSIZE = 1.
    """
    def __init__(self):
        self.icnt = 0
    def blocker(self, item, delay):
        log.error('begin blocking for: %s' % delay)
        if self.icnt == 0:
            # Big initial delay for the first item
            BOOM = delay
        else:
            # Small delay for subsequent items
            BOOM = 2
        while BOOM > 0:
            time.sleep(2)
            log.error('BOOM: %i' % BOOM)
            BOOM -= 1
        self.icnt += 1
        return True

    def finished(self, result):
        log.error('success finished: %s' % result)

    def process_item(self, item, spider):
        # Run the blocking call in the reactor's threadpool.
        d1 = threads.deferToThread(self.blocker, item, 50)
        d1.addCallback(self.finished)
        # Always hand the item back to Scrapy, whether or not the
        # blocking call succeeded.
        d1.addBoth(lambda _: item)
        return d1

Use this as the pipeline and set REACTOR_THREADPOOL_MAXSIZE = 1 (or whatever db connection pool size you normally have). I see the same results regardless of which of my spiders I run, as long as each has enough items to scrape. Scrapy seems to behave in the following way: up to a point, the work of processing new requests into new responses and items is interleaved with the blocking pipeline. Once there are around 30 or 40 items/HTML responses pending, it stops processing new requests and attention goes entirely to the blocking pipeline. This seems to continue until those items get processed, at which point Twisted appears to start dividing time between processing new requests (to get new responses/items) and the blocking pipelines again. Does this seem correct?
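For reference, this is roughly how I'd wire it up in settings.py; the 'myproject.pipelines' module path is just a placeholder for wherever ThreadBlocker actually lives, and on older Scrapy versions ITEM_PIPELINES is a plain list of paths rather than a dict:

# settings.py (sketch)
ITEM_PIPELINES = {
    'myproject.pipelines.ThreadBlocker': 300,
}

# One reactor thread behaves roughly like a db pool with a single
# connection: every deferToThread call has to wait its turn.
REACTOR_THREADPOOL_MAXSIZE = 1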
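And in case it helps to see the threadpool behaviour outside Scrapy, here's a minimal standalone Twisted sketch (my own, under the same assumption of a pool capped at one thread) showing that deferToThread calls then run strictly one after another, which is what makes the pipeline above act like a saturated connection pool:

import time

from twisted.internet import reactor, threads
from twisted.internet.defer import gatherResults

# Mirror REACTOR_THREADPOOL_MAXSIZE = 1: a single worker thread.
reactor.suggestThreadPoolSize(1)

def work(n):
    time.sleep(1)
    print('finished job %d' % n)
    return n

def done(results):
    print('all done: %s' % results)
    reactor.stop()

# With one thread the three sleeps run back to back (~3s total);
# raise the pool size to 3 and they overlap (~1s total).
ds = [threads.deferToThread(work, n) for n in range(3)]
gatherResults(ds).addCallback(done)
reactor.run()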