What *is* happening? Underneath, callFromThread is basically just queuing the call and writing a byte to a file descriptor (or something similar) to wake the reactor from its polling sleep. Even at very high load, the reactor should be multiplexing reads from that file descriptor (which can act as a form of batching) with the actual scraping.
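
To make that concrete, here is a rough sketch of the wake-up mechanism using only the standard library. It is not Twisted's actual code: MiniReactor, call_from_thread and run_once are made-up names, and it assumes a POSIX system where select() works on pipes. The point is only that many queued calls can be picked up by a single wake-up of the loop.

```python
import os
import select
import threading
from collections import deque


class MiniReactor:
    """Toy event loop showing the self-pipe wake-up trick (hypothetical, not Twisted)."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        os.set_blocking(self._read_fd, False)
        os.set_blocking(self._write_fd, False)
        self._calls = deque()  # append/popleft are safe across threads
        self.wakeups = 0       # number of times the pipe woke the loop

    def call_from_thread(self, func, *args):
        # Queue the call first, then write one byte so select() returns.
        self._calls.append((func, args))
        try:
            os.write(self._write_fd, b"x")
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    def run_once(self, timeout=1.0):
        # Wait until some thread has written to the pipe (or time out).
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        if not readable:
            return 0
        self.wakeups += 1
        # Drain every pending wake byte; this is where batching happens.
        try:
            while os.read(self._read_fd, 4096):
                pass
        except BlockingIOError:
            pass
        # Run everything that was queued, however many wake bytes there were.
        ran = 0
        while self._calls:
            func, args = self._calls.popleft()
            func(*args)
            ran += 1
        return ran


if __name__ == "__main__":
    reactor = MiniReactor()
    seen = []
    producer = threading.Thread(
        target=lambda: [reactor.call_from_thread(seen.append, i) for i in range(100)]
    )
    producer.start()
    producer.join()
    print(reactor.run_once(), reactor.wakeups)  # prints "100 1"
```

In this toy version, 100 calls queued from the background thread cost the loop one wake-up, so the wake-ups by themselves should not starve other I/O. If the real reactor looks starved, the time is more likely going into the functions being called than into the wake-up itself.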

Dustin

On Sun, Dec 21, 2014 at 6:47 AM, Adi Lavi <adi.l...@cortica.com> wrote:
> Hi,
> I am using Pika's asynchronous consumer implementation with Scrapy and
> Twisted. I have twisted reactor running on the main thread, and Rabbit
> consumer running on a background thread. When I get a message and want to
> start my spider, I use 'callFromThread' to wake the reactor thread, init the
> spider and start crawling.
>
> Alas, on high load of Q messages, I find that because 'callFromThread' is
> called all the time, Scrapy does not start downloading until there is some
> 'break' in these calls.
>
> I am wondering what is the best approach to gain high scale with Scrapy,
> Twisted and RabbitMQ. Should I continue using the current design, and simply
> do some buffering or batching to reduce the 'callFromThread' frequency?
> Perhaps I should use a synchronous design?
>
> Thanks
>
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python