There is actually a simpler way to simulate this kind of blocking:

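import logging
import time

from twisted.internet import threads

log = logging.getLogger(__name__)


class ThreadBlocker(object):
    """
    Item pipeline that blocks inside a reactor thread-pool thread.

    runInteraction runs each interaction in a pool thread, and you can
    simulate a limited connection pool using this class and, say,
    REACTOR_THREADPOOL_MAXSIZE = 1.
    """
    def __init__(self):
        self.icnt = 0  # number of items processed so far

    def blocker(self, item, delay):
        # Runs in a worker thread. Each loop pass sleeps 2 seconds, so
        # the first item blocks for 2 * delay seconds in total.
        log.error('begin blocking for: %s' % delay)
        if self.icnt == 0:
            # Big initial delay
            BOOM = delay
        else:
            # Small delay for next items
            BOOM = 2
        while BOOM > 0:
            time.sleep(2)
            log.error('BOOM:%i' % BOOM)
            BOOM -= 1
        self.icnt += 1
        return True

    def finished(self, result):
        log.error('success finished: %s' % result)

    def process_item(self, item, spider):
        # Hand the blocking call to the reactor's thread pool and return
        # a Deferred that fires with the item on success or failure.
        d1 = threads.deferToThread(self.blocker, item, 50)
        d1.addCallback(self.finished)
        d1.addBoth(lambda _: item)
        return d1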


Use this as the pipeline, and set REACTOR_THREADPOOL_MAXSIZE = 1 (or
whatever DB connection pool size you normally have). I see the same
results regardless of which spider I run, so long as each has enough
items to scrape.
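
For reference, the settings side of this might look roughly like the sketch below. The module path "myproject.pipelines.ThreadBlocker" and the order value 300 are placeholders I made up for illustration; only the two setting names come from Scrapy.

```python
# settings.py (sketch; the pipeline path below is a hypothetical example)

# Enable the blocking pipeline. The integer is the pipeline order;
# 300 is an arbitrary value chosen for this example.
ITEM_PIPELINES = {
    "myproject.pipelines.ThreadBlocker": 300,
}

# Limit the reactor thread pool to a single thread so that only one
# deferToThread call can run at a time.
REACTOR_THREADPOOL_MAXSIZE = 1
```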

Scrapy seems to behave in the following way: up to a point, the work of
processing new requests (to get new responses and items) is interleaved
with the blocking pipeline. When there are around 30 or 40 items/HTML
responses, it stops processing new requests and attention is entirely on
the blocking pipeline. This seems to continue until the items get
processed, at which point Twisted seems to start dividing time again
between processing new requests (to get new responses/items) and the
blocking pipeline.
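
A rough way to picture why items pile up behind the pipeline, as a sketch only: the snippet below uses the standard library's ThreadPoolExecutor as a stand-in for a thread pool of size 1. It is not Twisted's thread pool and it does not model Scrapy's scheduling; the names blocker and run and the 0.05 second delay are invented for the example. It only shows that with one worker thread the blocking calls run one after another, so anything submitted in the meantime has to wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocker(item, delay):
    # Stand-in for a blocking pipeline step (e.g. a slow DB write).
    time.sleep(delay)
    return item


def run(items, pool_size=1, delay=0.05):
    """Push every item through the pool; return results and elapsed time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # All items are submitted immediately, but only pool_size of
        # them can be inside blocker() at once; the rest sit in a queue.
        futures = [pool.submit(blocker, item, delay) for item in items]
        results = [f.result() for f in futures]
    return results, time.monotonic() - start


results, elapsed = run(["a", "b", "c"])
print(results, round(elapsed, 2))
```

With pool_size=1 the three calls take roughly three times the single delay; raising pool_size lets them overlap.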

Does this seem correct?
