Hello Morad, thanks for your answer. My deny list would be a little too big to handle that way: around 300,000 records to add. Memory would probably go down on its knees too. Looking at this group's history, there are some suggestions regarding the duplicate filter. I'll try that first, maybe preloading fingerprints from the database, something like the sketch below.
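
Rough, untested sketch of what I have in mind (assuming Scrapy >= 1.0, where the module is scrapy.dupefilters rather than the older scrapy.dupefilter; load_known_fingerprints() is just a placeholder for whatever query pulls the stored fingerprints from my database):

    from scrapy.dupefilters import RFPDupeFilter


    def load_known_fingerprints(path='fingerprints.txt'):
        # Placeholder: read previously harvested fingerprints, one sha1
        # hex string per line; in my case this would be a database query.
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}


    class PreloadedDupeFilter(RFPDupeFilter):
        """RFPDupeFilter that starts with a preloaded fingerprint set."""

        def __init__(self, path=None, debug=False):
            super(PreloadedDupeFilter, self).__init__(path, debug)
            # RFPDupeFilter keeps seen fingerprints in the set
            # self.fingerprints and checks it in request_seen(), so
            # anything added here is treated as already visited.
            self.fingerprints.update(load_known_fingerprints())

Then point Scrapy at it in settings.py (module path is just an example of where the class could live):

    DUPEFILTER_CLASS = 'myproject.dupefilters.PreloadedDupeFilter'

If the preloaded set turns out to be too heavy, I'll report back here.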
On Thursday, February 26, 2015 at 4:27:45 PM UTC-3, Italo Maia wrote:
>
> I have a few spiders here that scrape quite a lot of links. I know that
> Scrapy uses by default a "fingerprint" approach to avoid visiting the same
> URL more than once. Is there a way for me to supply a previously harvested
> list of fingerprints/URLs to it in order to speed up scraping?
