Re: How to identify Cache location and delete it ?

Paul Tremberth Tue, 28 Feb 2017 03:42:43 -0800

From your crawl startup logs, there's 'HTTPCACHE_DIR': 'httpcache33' in
your settings.
Does that match the expected location of your HTTP cache?
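
If it does, a minimal sketch along these lines could locate and remove the
cache directory. It assumes the default filesystem cache storage, and that a
relative HTTPCACHE_DIR is resolved under the project's ".scrapy" data
directory (the folder next to scrapy.cfg). The helper names and the project
root argument are hypothetical, and the data directory may live elsewhere in
some setups, so check the resolved path before deleting anything.

```python
import os
import shutil


def resolve_cache_dir(project_root, httpcache_dir):
    """Return the directory where the filesystem HTTP cache is expected.

    Assumption: a relative HTTPCACHE_DIR lives under the project's
    ".scrapy" data directory; an absolute path is used as given.
    """
    if os.path.isabs(httpcache_dir):
        return httpcache_dir
    return os.path.join(project_root, ".scrapy", httpcache_dir)


def clear_cache(project_root, httpcache_dir):
    """Delete the cache directory if it exists; return True if removed."""
    cache_dir = resolve_cache_dir(project_root, httpcache_dir)
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
        return True
    return False
```

Calling clear_cache() with the folder that contains scrapy.cfg and
"httpcache33" would remove the whole cache, so the next crawl fetches every
page again.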


On Tue, Feb 28, 2017 at 12:35 PM, <cristimoc...@gmail.com> wrote:

> Hello Paul.
> Thank you very much for trying to help. For the moment I am just setting
> Cache Timeout to 24 hours to get over this problem but I would very much
> prefer to be able to just completely remove the old cache.
>
> Here are my logs with LOG_LEVEL='DEBUG'
>
> Start log
>
> 2017-02-28 13:21:20 [scrapy] INFO: Scrapy 1.2.1 started (bot: testSpider)
> 2017-02-28 13:21:20 [scrapy] INFO: Overridden settings:
> {'AUTOTHROTTLE_MAX_DELAY': 50, 'NEWSPIDER_MODULE': 'testSpider.spiders',
> 'FEED_URI': 'items.csv', 'CONCURRENT_REQUESTS_PER_DOMAIN': 3,
> 'AUTOTHROTTLE_TARGET_CONCURRENCY': 0.8, 'HTTPCACHE_IGNORE_HTTP_CODES':
> [300, 301, 302, 400, 401, 403, 404, 500, 502, 503], 'SPIDER_MODULES':
> ['testSpider.spiders'], 'AUTOTHROTTLE_START_DELAY': 2, 'HTTPCACHE_ENABLED':
> True, 'CONCURRENT_REQUESTS_PER_IP': 1, 'BOT_NAME': 'testSpider',
> 'LOG_FILE': 'logfile.log', 'HTTPCACHE_DIR': 'httpcache33', 'FEED_FORMAT':
> 'csv', 'AUTOTHROTTLE_ENABLED': True, 'DOWNLOAD_DELAY': 1}
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled extensions:
> ['scrapy.extensions.feedexport.FeedExporter',
>  'scrapy.extensions.logstats.LogStats',
>  'scrapy.extensions.telnet.TelnetConsole',
>  'scrapy.extensions.corestats.CoreStats',
>  'scrapy.extensions.throttle.AutoThrottle']
> 2017-02-28 13:21:20 [py.warnings] WARNING: 
> c:\python27\lib\site-packages\scrapy\utils\deprecate.py:156:
> ScrapyDeprecationWarning: 
> `scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware`
> class is deprecated, use 
> `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware`
> instead
>   ScrapyDeprecationWarning)
>
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled downloader middlewares:
> ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
>  'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
>  'testSpider.middlewares.RandomUserAgentMiddleware',
>  'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
>  'testSpider.middlewares.ProxyMiddleware',
>  'scrapy.downloadermiddlewares.retry.RetryMiddleware',
>  'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
>  'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
>  'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
>  'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
>  'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
>  'scrapy.downloadermiddlewares.stats.DownloaderStats',
>  'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled spider middlewares:
> ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
>  'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
>  'scrapy.spidermiddlewares.referer.RefererMiddleware',
>  'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
>  'scrapy.spidermiddlewares.depth.DepthMiddleware']
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled item pipelines:
> []
> 2017-02-28 13:21:21 [scrapy] INFO: Spider opened
> 2017-02-28 13:21:21 [scrapy] INFO: Crawled 0 pages (at 0 pages/min),
> scraped 0 items (at 0 items/min)
> 2017-02-28 13:21:21 [scrapy] DEBUG: Telnet console listening on
> 127.0.0.1:6024
> 2017-02-28 13:21:22 [scrapy] DEBUG: Crawled (200) <GET https://
> ***.com/catalog/products/appraisal-form?sku=7619002> (referer: https://
> ***.com/client/)
> 2017-02-28 13:21:22 [scrapy] DEBUG: Crawled (200) <GET https://
> ***.com/catalog/products/appraisal-form?sku=7619002> (referer: 
> https://***/client/)
> ['cached']
>
>
> then it goes on like this ...all requests are ['cached']
>
>
> End log
>
> [scrapy] INFO: Closing spider (finished)
> 2017-02-28 13:23:49 [scrapy] INFO: Stored csv feed (5379 items) in:
> items.csv
> 2017-02-28 13:23:49 [scrapy] INFO: Dumping Scrapy stats:
> {'downloader/request_bytes': 10802020,
>  'downloader/request_count': 21516,
>  'downloader/request_method_count/GET': 21516,
>  'downloader/response_bytes': 31302380,
>  'downloader/response_count': 21516,
>  'downloader/response_status_count/200': 21516,
>  'finish_reason': 'finished',
>  'finish_time': datetime.datetime(2017, 2, 28, 11, 23, 49, 220000),
>  'httpcache/firsthand': 32,
>  'httpcache/hit': 21484,
>  'httpcache/miss': 32,
>  'httpcache/store': 32,
>  'item_scraped_count': 5379,
>  'log_count/DEBUG': 26896,
>  'log_count/INFO': 10,
>  'log_count/WARNING': 1,
>  'request_depth_max': 3,
>  'response_received_count': 21516,
>  'scheduler/dequeued': 21516,
>  'scheduler/dequeued/memory': 21516,
>  'scheduler/enqueued': 21516,
>  'scheduler/enqueued/memory': 21516,
>  'start_time': datetime.datetime(2017, 2, 28, 11, 21, 21, 134000)}
> 2017-02-28 13:23:49 [scrapy] INFO: Spider closed (finished)
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>

