Hello Morad, thanks for your answer. My deny list would be a little too big to handle that way: around 300,000 records to add. Memory would probably go down on its knees too. Looking at this group's history, there are some suggestions regarding the duplicate filter. I'll try that first, maybe preloading fingerprints from the database, something like the sketch below.
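
Rough, untested sketch of what I have in mind (assuming Scrapy >= 1.0, where the module is scrapy.dupefilters rather than the older scrapy.dupefilter; load_known_fingerprints() is just a placeholder for whatever query pulls the stored fingerprints from my database):

    from scrapy.dupefilters import RFPDupeFilter


    def load_known_fingerprints(path='fingerprints.txt'):
        # Placeholder: read previously harvested fingerprints, one sha1
        # hex string per line; in my case this would be a database query.
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}


    class PreloadedDupeFilter(RFPDupeFilter):
        """RFPDupeFilter that starts with a preloaded fingerprint set."""

        def __init__(self, path=None, debug=False):
            super(PreloadedDupeFilter, self).__init__(path, debug)
            # RFPDupeFilter keeps seen fingerprints in the set
            # self.fingerprints and checks it in request_seen(), so
            # anything added here is treated as already visited.
            self.fingerprints.update(load_known_fingerprints())

Then point Scrapy at it in settings.py (module path is just an example of where the class could live):

    DUPEFILTER_CLASS = 'myproject.dupefilters.PreloadedDupeFilter'

If the preloaded set turns out to be too heavy, I'll report back here.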
On Thursday, February 26, 2015 at 4:27:45 PM UTC-3, Italo Maia wrote:
>
> I have a few spiders here that scrape quite a lot of links. I know that
> Scrapy uses by default a "fingerprint" approach to avoid visiting the same
> URL more than once. Is there a way for me to supply a previously harvested
> list of fingerprints/URLs to it in order to speed up scraping?
