Hello, 

I scrape sites, and I now want to drop scraped items when there has been 
no update since the last run. 

Fortunately, each scraped record has a unique ID, so I could use this ID 
field to look up the previous version and check whether the data has changed. 

I run Scrapy with scrapy crawl from crontab, so every scrape starts a new 
process. That means holding the previously scraped data in memory in Python 
code wouldn't work, because it is lost between runs. 

I don't think this is possible with item pipelines alone? One solution would 
be to store everything in a database and then have an item pipeline look up 
each item by its unique ID, compare the stored data with the newly scraped 
data, and drop the item if it is the same. 
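
Here is a minimal sketch of that idea, not a tested solution. It assumes each 
item has a field named "id" (a hypothetical name for my unique ID field) and 
uses a local SQLite file, "seen_items.db" (also hypothetical), so the state 
survives between cron runs. Instead of storing the full record, it stores a 
hash of each item and compares hashes. The class name is made up, and the 
fallback DropItem class is only there so the sketch runs without Scrapy 
installed.

```python
import hashlib
import json
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so this sketch can run without Scrapy installed.
    class DropItem(Exception):
        pass


class DropUnchangedPipeline:
    """Drop items whose content matches what a previous run stored."""

    def __init__(self, db_path="seen_items.db"):
        # Hypothetical default path; the file persists between runs.
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen "
            "(id TEXT PRIMARY KEY, digest TEXT NOT NULL)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        data = dict(item)
        item_id = str(data["id"])  # assumes the unique ID field is "id"
        # Hash a canonical JSON form so field order does not matter.
        digest = hashlib.sha256(
            json.dumps(data, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        row = self.conn.execute(
            "SELECT digest FROM seen WHERE id = ?", (item_id,)
        ).fetchone()
        if row is not None and row[0] == digest:
            raise DropItem(f"unchanged item {item_id}")
        # New or changed item: remember its digest and pass it on.
        self.conn.execute(
            "INSERT OR REPLACE INTO seen (id, digest) VALUES (?, ?)",
            (item_id, digest),
        )
        self.conn.commit()
        return item
```

The pipeline would be enabled through the ITEM_PIPELINES setting, with the 
module path depending on the project layout. Any field that changes on every 
run, such as a scrape timestamp, would have to be left out of the hash, 
otherwise no item would ever count as unchanged.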

Thanks for the help, 

Cheers
