It sounds like the delta fetch middleware might do what you're after:
https://github.com/scrapinghub/scrapylib/blob/master/scrapylib/deltafetch.py
It avoids re-scraping pages that have already produced data, so it should
still revisit the start URLs and any internal pages that yield no items.
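
To make the idea concrete, here is a minimal sketch of that skipping logic in plain Python. It is not the middleware's actual code: the names request_key, SeenStore, filter_requests and record_result are hypothetical, and an in-memory set stands in for whatever persistent store the real middleware uses.

```python
import hashlib


def request_key(url):
    """Return a stable fingerprint for a URL (hypothetical helper)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class SeenStore:
    """In-memory stand-in for a persistent record of pages that yielded items."""

    def __init__(self):
        self._keys = set()

    def mark(self, url):
        self._keys.add(request_key(url))

    def __contains__(self, url):
        return request_key(url) in self._keys


def record_result(url, items, seen):
    """Remember a URL only if scraping it produced at least one item."""
    if items:
        seen.mark(url)


def filter_requests(urls, seen):
    """Drop URLs whose previous visit produced items; keep everything else."""
    return [u for u in urls if u not in seen]


# First run: the home page yields only links, the article yields an item.
seen = SeenStore()
record_result("http://news.example/", [], seen)
record_result("http://news.example/article-1", [{"title": "A"}], seen)

# Second run: the home page is fetched again, article-1 is skipped.
second_run = filter_requests(
    [
        "http://news.example/",
        "http://news.example/article-1",
        "http://news.example/article-2",
    ],
    seen,
)
```

Because the home page never produces items itself, it is never marked as seen, which is why it keeps being revisited on every run.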

I see there's already a '_dont_cache' request meta key. You could try that
as a workaround. I agree there should be a proper 'dont_cache' meta key for
these cases.
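
A rough sketch of that workaround, assuming the HTTP cache middleware honours the private '_dont_cache' key (this may differ between Scrapy versions, so please check yours). The helper meta_for is a hypothetical name; it only builds the meta dict, so that the flag lands on start URLs and nowhere else.

```python
def meta_for(url, start_urls, base_meta=None):
    """Build request meta, flagging only start URLs as not cacheable.

    '_dont_cache' is the private key mentioned above; whether the cache
    middleware respects it is an assumption to verify for your version.
    """
    meta = dict(base_meta or {})
    if url in start_urls:
        meta["_dont_cache"] = True
    return meta


START_URLS = ["http://news.example/"]  # placeholder URL for illustration

home_meta = meta_for("http://news.example/", START_URLS)
article_meta = meta_for("http://news.example/article-1", START_URLS)

# Inside a spider this would be used roughly as:
#     def start_requests(self):
#         for url in self.start_urls:
#             yield Request(url, meta=meta_for(url, self.start_urls))
```

Article requests get an empty meta dict here, so they stay cacheable as before.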



On 31 December 2013 15:17, Pablo Hoffman <[email protected]> wrote:

> We should add a dont_cache request meta.
>
>
> On Thu, Oct 31, 2013 at 8:18 AM, Alvaro Moe <[email protected]> wrote:
>
>> Hi list,
>>
>> I want to avoid caching the start_urls, but not the inner pages. Is
>> this possible?
>>
>> The use case: I'm scraping articles from a news website. I assume
>> articles don't change, but the home page is my source of new articles, so I
>> need to run the scraper regularly, hit the start_urls, get all the fresh
>> links and ignore the old ones.
>>
>> How would you go about this?
>>
>> Thanks in advance!!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
