From your crawl startup logs, there's 'HTTPCACHE_DIR': 'httpcache33' in your settings. Does that match the expected location of your HTTP cache?
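
If it helps to check, here is a minimal sketch for locating and wiping that directory between runs. It assumes you are on the default filesystem cache storage and that a relative HTTPCACHE_DIR is taken relative to the project data dir (normally `.scrapy` next to scrapy.cfg), which is how I understand Scrapy resolves it. The function names `resolve_cache_dir` and `clear_cache` are mine, not part of Scrapy, and `project_root` stands for whatever folder holds your scrapy.cfg.

```python
import os
import shutil


def resolve_cache_dir(project_root, httpcache_dir):
    """Guess where the filesystem HTTP cache lives (hypothetical helper).

    Assumption: a relative HTTPCACHE_DIR sits under <project_root>/.scrapy,
    while an absolute one is used unchanged.
    """
    if os.path.isabs(httpcache_dir):
        return httpcache_dir
    return os.path.join(project_root, ".scrapy", httpcache_dir)


def clear_cache(project_root, httpcache_dir):
    """Delete the whole cache directory; return True if anything was removed."""
    cache_dir = resolve_cache_dir(project_root, httpcache_dir)
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
        return True
    return False


# Example call (paths are placeholders, adjust to your project):
# clear_cache(r"C:\path\to\testSpider", "httpcache33")
```

If `clear_cache` returns False, the cache is probably not where the setting suggests, which would explain why deleting a folder by hand did not change the 'httpcache/hit' count.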

On Tue, Feb 28, 2017 at 12:35 PM, <cristimoc...@gmail.com> wrote:
> Hello Paul.
> Thank you very much for trying to help. For the moment I am just setting
> Cache Timeout to 24 hours to get over this problem but I would very much
> prefer to be able to just completely remove old cache.
>
> Here are my logs with LOG_LEVEL='DEBUG'
>
> Start log
>
> 2017-02-28 13:21:20 [scrapy] INFO: Scrapy 1.2.1 started (bot: testSpider)
> 2017-02-28 13:21:20 [scrapy] INFO: Overridden settings:
> {'AUTOTHROTTLE_MAX_DELAY': 50, 'NEWSPIDER_MODULE': 'testSpider.spiders',
> 'FEED_URI': 'items.csv', 'CONCURRENT_REQUESTS_PER_DOMAIN': 3,
> 'AUTOTHROTTLE_TARGET_CONCURRENCY': 0.8, 'HTTPCACHE_IGNORE_HTTP_CODES':
> [300, 301, 302, 400, 401, 403, 404, 500, 502, 503], 'SPIDER_MODULES':
> ['testSpider.spiders'], 'AUTOTHROTTLE_START_DELAY': 2, 'HTTPCACHE_ENABLED':
> True, 'CONCURRENT_REQUESTS_PER_IP': 1, 'BOT_NAME': 'testSpider',
> 'LOG_FILE': 'logfile.log', 'HTTPCACHE_DIR': 'httpcache33', 'FEED_FORMAT':
> 'csv', 'AUTOTHROTTLE_ENABLED': True, 'DOWNLOAD_DELAY': 1}
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled extensions:
> ['scrapy.extensions.feedexport.FeedExporter',
> 'scrapy.extensions.logstats.LogStats',
> 'scrapy.extensions.telnet.TelnetConsole',
> 'scrapy.extensions.corestats.CoreStats',
> 'scrapy.extensions.throttle.AutoThrottle']
> 2017-02-28 13:21:20 [py.warnings] WARNING:
> c:\python27\lib\site-packages\scrapy\utils\deprecate.py:156:
> ScrapyDeprecationWarning:
> `scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware`
> class is deprecated, use
> `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware`
> instead
> ScrapyDeprecationWarning)
>
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled downloader middlewares:
> ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
> 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
> 'testSpider.middlewares.RandomUserAgentMiddleware',
> 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
> 'testSpider.middlewares.ProxyMiddleware',
> 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
> 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
> 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
> 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
> 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
> 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
> 'scrapy.downloadermiddlewares.stats.DownloaderStats',
> 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled spider middlewares:
> ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
> 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
> 'scrapy.spidermiddlewares.referer.RefererMiddleware',
> 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
> 'scrapy.spidermiddlewares.depth.DepthMiddleware']
> 2017-02-28 13:21:20 [scrapy] INFO: Enabled item pipelines:
> []
> 2017-02-28 13:21:21 [scrapy] INFO: Spider opened
> 2017-02-28 13:21:21 [scrapy] INFO: Crawled 0 pages (at 0 pages/min),
> scraped 0 items (at 0 items/min)
> 2017-02-28 13:21:21 [scrapy] DEBUG: Telnet console listening on
> 127.0.0.1:6024
> 2017-02-28 13:21:22 [scrapy] DEBUG: Crawled (200) <GET https://
> ***.com/catalog/products/appraisal-form?sku=7619002> (referer: https://
> ***.com/client/)
> 2017-02-28 13:21:22 [scrapy] DEBUG: Crawled (200) <GET https://
> ***.com/catalog/products/appraisal-form?sku=7619002> (referer:
> https://***/client/)
> ['cached']
>
>
> then it goes on like this ...all requests are ['cached']
>
>
> End log
>
> [scrapy] INFO: Closing spider (finished)
> 2017-02-28 13:23:49 [scrapy] INFO: Stored csv feed (5379 items) in:
> items.csv
> 2017-02-28 13:23:49 [scrapy] INFO: Dumping Scrapy stats:
> {'downloader/request_bytes': 10802020,
> 'downloader/request_count': 21516,
> 'downloader/request_method_count/GET': 21516,
> 'downloader/response_bytes': 31302380,
> 'downloader/response_count': 21516,
> 'downloader/response_status_count/200': 21516,
> 'finish_reason': 'finished',
> 'finish_time': datetime.datetime(2017, 2, 28, 11, 23, 49, 220000),
> 'httpcache/firsthand': 32,
> 'httpcache/hit': 21484,
> 'httpcache/miss': 32,
> 'httpcache/store': 32,
> 'item_scraped_count': 5379,
> 'log_count/DEBUG': 26896,
> 'log_count/INFO': 10,
> 'log_count/WARNING': 1,
> 'request_depth_max': 3,
> 'response_received_count': 21516,
> 'scheduler/dequeued': 21516,
> 'scheduler/dequeued/memory': 21516,
> 'scheduler/enqueued': 21516,
> 'scheduler/enqueued/memory': 21516,
> 'start_time': datetime.datetime(2017, 2, 28, 11, 21, 21, 134000)}
> 2017-02-28 13:23:49 [scrapy] INFO: Spider closed (finished)
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scrapy-users+unsubscr...@googlegroups.com.
> To post to this group, send email to scrapy-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.