You could break this into two spiders: one that crawls the site and only discovers media URLs, pushing them into a queue, and a second spider, run concurrently (or triggered when the first finishes), that drains the queue and downloads the media.
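A minimal sketch of that handoff, using only the standard library (the file name and function names here are made up for illustration). Stage 1 would run inside your discovery spider's parse callback and only record media URLs; stage 2 runs afterwards (or in parallel) and reads them back for bulk download. In a real project, stage 1 is a Scrapy spider with the FilesPipeline disabled, and stage 2 is a second spider (or plain script) with the pipeline enabled:

```python
import json
from pathlib import Path

# Hypothetical handoff file acting as the "queue" between the two spiders.
QUEUE_FILE = Path("media_queue.jsonl")


def enqueue_media_urls(urls, queue_file=QUEUE_FILE):
    """Stage 1: append discovered media URLs to a line-delimited queue file."""
    with queue_file.open("a", encoding="utf-8") as f:
        for url in urls:
            f.write(json.dumps({"file_url": url}) + "\n")


def drain_queue(queue_file=QUEUE_FILE):
    """Stage 2: read back every queued URL, ready for bulk downloading."""
    if not queue_file.exists():
        return []
    with queue_file.open(encoding="utf-8") as f:
        return [json.loads(line)["file_url"] for line in f]
```

The second spider can then yield items whose `file_urls` field comes from `drain_queue()`, with `ITEM_PIPELINES` enabling `scrapy.pipelines.files.FilesPipeline` only in that spider's settings, so all media downloads happen after scraping is done.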
On Fri, Apr 22, 2016 at 8:55 AM, Antoine Brunel <[email protected]> wrote:
> Hello (awesome) scrapy community,
>
> According to the scrapy media pipeline docs
> (http://doc.scrapy.org/en/latest/topics/media-pipeline.html), after one
> URL is scraped, all of its media files are downloaded with a higher
> priority, so that no other URL is scraped before all media files have
> been downloaded.
>
>> - When the item reaches the FilesPipeline, the URLs in the file_urls
>> field are scheduled for download using the standard Scrapy scheduler and
>> downloader (which means the scheduler and downloader middlewares are
>> reused), but with a higher priority, processing them before other pages
>> are scraped. The item remains "locked" at that particular pipeline stage
>> until the files have finished downloading (or fail for some reason).
>
> I just want to do the exact opposite: scrape all URLs first, then
> download all media files at once.
> How could I do that?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
