You could break it into separate spiders: one that discovers all the media
URLs and pushes them into a queue, and another spider running at the same
time (or triggered at the end) that downloads the media.
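
Something along these lines, for example (a rough, untested sketch -- the
example.com start URL, the img selector, the media_urls.jl queue file and the
"downloads" folder are all placeholders to adapt). The first spider only
collects file_urls and writes them to a JSON-lines feed; the second spider
replays that feed through FilesPipeline, so all downloads happen in their own
dedicated run:

# could live in two separate spider modules; shown together here
import json
from pathlib import Path
import scrapy


class MediaDiscoverySpider(scrapy.Spider):
    # run with:  scrapy crawl media_discovery -o media_urls.jl
    name = "media_discovery"
    start_urls = ["https://example.com/"]  # placeholder

    # no FilesPipeline enabled, so nothing is downloaded during discovery
    custom_settings = {"ITEM_PIPELINES": {}}

    def parse(self, response):
        # adjust the selector to whatever media you actually want
        urls = [response.urljoin(u)
                for u in response.css("img::attr(src)").getall()]
        if urls:
            yield {"file_urls": urls}
        # keep following links to discover more pages
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)


class MediaDownloadSpider(scrapy.Spider):
    # run afterwards:  scrapy crawl media_download
    name = "media_download"
    # read the queue produced above via Scrapy's file:// handler
    # (path is relative to where you launch the crawl)
    start_urls = [Path("media_urls.jl").resolve().as_uri()]

    custom_settings = {
        "ITEM_PIPELINES": {"scrapy.pipelines.files.FilesPipeline": 1},
        "FILES_STORE": "downloads",  # placeholder
    }

    def parse(self, response):
        # each line is one item whose file_urls FilesPipeline now fetches,
        # all of them in this dedicated run
        for line in response.body.decode("utf-8").splitlines():
            if line.strip():
                yield json.loads(line)

If you want the two spiders running simultaneously rather than back to back,
swap the .jl file for a shared queue such as Redis and have the download
spider poll it.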

On Fri, Apr 22, 2016 at 8:55 AM, Antoine Brunel <[email protected]>
wrote:

> Hello (awesome) scrapy community,
>
> According to the scrapy media pipeline docs (
> http://doc.scrapy.org/en/latest/topics/media-pipeline.html), after a url
> is scraped, all of its media files are downloaded with a higher priority,
> so that no other url is scraped before all of its media files have been
> downloaded.
>
>> - When the item reaches the FilesPipeline, the URLs in the file_urls
>> field are scheduled for download using the standard Scrapy scheduler and
>> downloader (which means the scheduler and downloader middlewares are
>> reused), but with a higher priority, processing them before other pages are
>> scraped. The item remains “locked” at that particular pipeline stage until
>> the files have finished downloading (or fail for some reason).
>
>
> I just want to do the exact opposite: scrape all urls first, then
> download all media files at once.
> How could I do that?
>
> Thanks!
>
