Scrapy confusion about Item Pipeline and Middlewares

Mrudul Tarwatkar Thu, 26 Dec 2013 05:09:47 -0800

Are *downloader middlewares processed before the downloader, that is, before 
the URL is scraped?*


Are *item pipelines processed after the URL is crawled (downloaded) and the 
spider's items are populated?*

Now, let's say *I store the fingerprint of every response in a visit_id 
item*, using request_fingerprint in Scrapy.

So if I want to write a *downloader middleware that avoids visiting 
already-visited URLs in subsequent runs of a spider*, how would I do it?

I don't want to write a pipeline, because that would mean querying the 
database only after the page has already been crawled. I want to first 
check whether the URL has already been crawled, and only then save it in 
items or the db, most importantly across subsequent runs of the spider (not 
within a single run, which I have already achieved).
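
To make the idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes that a downloader middleware's process_request() runs before the page is downloaded and may raise IgnoreRequest to drop a request, and that process_response() runs after the download. VisitedStore, simple_fingerprint, SkipVisitedMiddleware and the visited.db file are hypothetical names of my own; simple_fingerprint is only a stand-in for Scrapy's request_fingerprint so that the sketch does not depend on a particular Scrapy version.

```python
import hashlib
import sqlite3

try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:
    # Fallback so the sketch can be tried without Scrapy installed.
    class IgnoreRequest(Exception):
        pass


class VisitedStore:
    """Hypothetical persistent set of visit_ids, kept in a SQLite file."""

    def __init__(self, path="visited.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS visited (visit_id TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def __contains__(self, visit_id):
        row = self.conn.execute(
            "SELECT 1 FROM visited WHERE visit_id = ?", (visit_id,)
        ).fetchone()
        return row is not None

    def add(self, visit_id):
        # INSERT OR IGNORE keeps repeated adds of the same id harmless.
        self.conn.execute(
            "INSERT OR IGNORE INTO visited (visit_id) VALUES (?)", (visit_id,)
        )
        self.conn.commit()


def simple_fingerprint(request):
    # Stand-in for Scrapy's request fingerprint: hashes method and URL only.
    data = "%s %s" % (request.method, request.url)
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


class SkipVisitedMiddleware:
    """Drops requests whose fingerprint was stored in an earlier run."""

    def __init__(self, store=None):
        self.store = store if store is not None else VisitedStore()

    def process_request(self, request, spider):
        # Called before the download: skip anything already recorded.
        if simple_fingerprint(request) in self.store:
            raise IgnoreRequest("already visited: %s" % request.url)
        return None  # None lets the request continue to the downloader

    def process_response(self, request, response, spider):
        # Called after the download: remember this request for later runs.
        self.store.add(simple_fingerprint(request))
        return response
```

I assume it would be enabled through the DOWNLOADER_MIDDLEWARES setting, with something like {'myproject.middlewares.SkipVisitedMiddleware': 543}, where the module path and the priority are placeholders. Because the fingerprints live in a file rather than in memory, the check should also work across separate runs of the spider.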


