It sounds like the DeltaFetch middleware might do what you're after: https://github.com/scrapinghub/scrapylib/blob/master/scrapylib/deltafetch.py It skips requests for pages that produced items on a previous run, so it should still revisit the start URLs and any internal pages that don't yield items.
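For reference, a minimal settings.py sketch for turning the middleware on. It assumes scrapylib is installed and that the class is exposed as scrapylib.deltafetch.DeltaFetch, as in the file linked above; the order value 100 is only an illustrative choice, so check the module docstring for the settings your version actually supports.

```python
# settings.py (sketch; assumes scrapylib is importable in your project)

# Register DeltaFetch as a spider middleware. The order value (100) is an
# illustrative assumption, not a requirement.
SPIDER_MIDDLEWARES = {
    "scrapylib.deltafetch.DeltaFetch": 100,
}

# The middleware stays inactive unless it is explicitly enabled.
DELTAFETCH_ENABLED = True
```

With this in place, requests whose responses yielded items on an earlier run should be skipped, while the start URLs are requested again on every run.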
I see there's already a '_dont_cache' request meta key. You could try that as a workaround. I agree there should be a proper 'dont_cache' meta key for these cases.

On 31 December 2013 15:17, Pablo Hoffman <[email protected]> wrote:
> We should add a dont_cache request meta.
>
> On Thu, Oct 31, 2013 at 8:18 AM, Alvaro Moe <[email protected]> wrote:
>> Hi list,
>>
>> I want to avoid caching the start_urls, but not the inner pages. Is
>> this possible?
>>
>> The use case: I'm scraping articles from a news website. I assume
>> articles don't change, but the home page is my source of new articles. So I
>> need to run the scraper regularly, hit the start_urls, get all the fresh
>> links and ignore the old ones.
>>
>> How would you go about this?
>>
>> Thanks in advance!!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/groups/opt_out.
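To make the '_dont_cache' workaround concrete, here is a small standalone model of the behaviour being asked for: flagged requests (the start URLs) always hit the network, everything else is served from the cache once stored. This is not Scrapy's implementation; CachingFetcher and its download callback are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional


class CachingFetcher:
    """Toy stand-in for an HTTP cache layer (hypothetical, not Scrapy code)."""

    def __init__(self, download: Callable[[str], str]) -> None:
        self._download = download
        self._cache: Dict[str, str] = {}

    def fetch(self, url: str, meta: Optional[Dict[str, bool]] = None) -> str:
        meta = meta or {}
        if meta.get("_dont_cache"):
            # Flagged requests bypass the cache and are never stored in it.
            return self._download(url)
        if url not in self._cache:
            # First visit: download once and remember the body.
            self._cache[url] = self._download(url)
        return self._cache[url]
```

In a real spider the equivalent would be to set the flag in the meta dict of the requests built for start_urls and leave it off the article requests. Later Scrapy releases document a public 'dont_cache' meta key for this purpose.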
