Are *downloader middlewares processed before the downloader, i.e. before the URL is scraped?*
Are *pipelines processed after the URL is crawled (downloaded) and the spider's items are set?* Now, let's say *I store the fingerprint of every response in a visit_id item*, using request_fingerprint from Scrapy. If I want to write a *downloader middleware that avoids visiting already-visited URLs in subsequent runs of a spider*, how would I do it? I don't want to write a pipeline, because that would mean querying the database only after the page has already been crawled. I want to check first whether the URL has already been crawled, and only then save it in items or the DB. Most importantly, this has to work across subsequent runs of the spider (not within a single run, which I have already achieved).
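To make the idea concrete, here is a rough sketch of the kind of middleware I have in mind. It is not tested against a real crawl. A downloader middleware's process_request hook runs before the request reaches the downloader, and raising IgnoreRequest there drops the request. The class name SkipVisitedMiddleware, the visited.db file, and the url_fingerprint helper are all made up for illustration. The helper hashes only the URL, as a simplified stand-in for Scrapy's own request fingerprint, and SQLite is just one assumed way to keep the fingerprints between runs.

```python
import hashlib
import sqlite3

try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:
    # Fallback so the sketch can be run without Scrapy installed.
    class IgnoreRequest(Exception):
        pass


def url_fingerprint(url):
    # Hypothetical helper: hashes the URL only. Scrapy's real request
    # fingerprint also takes the method and body into account.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class SkipVisitedMiddleware:
    """Drops requests whose fingerprint was stored by an earlier run."""

    def __init__(self, db_path="visited.db"):
        # The SQLite file persists on disk, so it survives between runs.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS visited (fp TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def process_request(self, request, spider):
        # Called before the download: look the fingerprint up first.
        fp = url_fingerprint(request.url)
        row = self.conn.execute(
            "SELECT 1 FROM visited WHERE fp = ?", (fp,)
        ).fetchone()
        if row is not None:
            raise IgnoreRequest("already visited: %s" % request.url)
        return None  # None lets the request continue to the downloader

    def process_response(self, request, response, spider):
        # Called after the download: remember the URL for later runs.
        self.conn.execute(
            "INSERT OR IGNORE INTO visited (fp) VALUES (?)",
            (url_fingerprint(request.url),),
        )
        self.conn.commit()
        return response
```

I assume it would be enabled through the DOWNLOADER_MIDDLEWARES setting, e.g. {"myproject.middlewares.SkipVisitedMiddleware": 543}, where the module path and the priority are placeholders. This still queries a database, but the lookup happens before the page is downloaded rather than after.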
