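If the URLs are on different domains, override start_requests and set meta['download_slot'] = <some_name> on every request. DOWNLOAD_DELAY and CONCURRENT_REQUESTS_PER_DOMAIN are enforced per download slot, and by default each domain (hostname) gets its own slot. That means requests to different domains are not delayed relative to each other. If you give them all the same slot name, the delay applies across all of them.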
On Tuesday, October 7, 2014 18:17:11 UTC-2, [email protected] wrote:
>
> It looks like Scrapy just runs all start_urls at the same time. How do I
> tell Scrapy to start with url1, wait 30 s, then fetch url2?
>
> Here are my settings:
>
> AUTOTHROTTLE_ENABLED = True
> AUTOTHROTTLE_DEBUG = True
>
> DOWNLOAD_DELAY = 60
> DOWNLOAD_TIMEOUT = 30
> CONCURRENT_REQUESTS_PER_DOMAIN = 1
> AUTOTHROTTLE_START_DELAY = 10
>
> And this is the spider:
>
> start_urls = [
>     "url1",
>     "url2",
>     "url3",
>     "url4",
>     "url5",
> ]
>
> Here is the log:
>
> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url1> (referer: None)
> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url2> (referer: None)
> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url3> (referer: None)
> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url4> (referer: None)
> 2014-10-07 14:04:53-0600 [craigslist_spider] DEBUG: Crawled (200) <GET url5> (referer: None)

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
