From HERE <https://github.com/scrapy/scrapy/blob/master/scrapy/core/engine.py#L121> I found that Scrapy's engine fetches requests from the scheduler before the ones generated from start_urls.
In my usage, I enqueue thousands of start URLs (from various domains), and the crawl is not very fast (possibly due to networking issues). The problem I ran into is that the spider itself extracts links and follows them, and Scrapy then fetches those requests from the scheduler before the remaining start URLs, which lowers the overall concurrency. I would like to learn about the design purpose of this mechanism.

Best regards.
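As an aside, if the goal is to have start URLs fetched ahead of spider-extracted links, one option (not mentioned in the post, but part of Scrapy's documented API) is to yield them from `start_requests()` with a higher `priority` argument, since the scheduler executes higher-priority requests earlier. The following is a minimal, self-contained sketch of how such a priority scheduler orders requests, using a plain `heapq` instead of Scrapy itself; all URLs are hypothetical examples:

```python
import heapq

# Toy model of a priority scheduler: the scheduler pops the
# highest-priority request first. heapq is a min-heap, so we
# negate the priority; a counter breaks ties in insertion order.
queue = []
counter = 0

def enqueue(url, priority=0):
    global counter
    heapq.heappush(queue, (-priority, counter, url))
    counter += 1

# Links extracted by the spider, default priority 0
enqueue("http://example.com/page/1")
enqueue("http://example.com/page/2")
# A start URL given a higher priority, analogous to yielding
# Request(url, priority=10) from start_requests()
enqueue("http://example.org/start", priority=10)

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # the high-priority start URL is dequeued first
```

With equal priorities the start URL would be served only after the already-scheduled requests, which matches the behavior described above.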
