Hi. I'm using scrapyd and have run into a problem with jobs that shut themselves down via the crawler.engine.close_spider method: they get stuck in the 'running' state.
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher

class MysqlPipeline(object):

    def __init__(self, crawler):
        dispatcher.connect(self.spider_opened, signals.spider_opened)
        dispatcher.connect(self.spider_closed, signals.spider_closed)
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def spider_opened(self, spider):
        # Close the job right away if the spider was scheduled without a mode
        if spider.mode is None:
            self.crawler.engine.close_spider(spider, 'Run mode not defined')
            return

    def spider_closed(self, spider):
        pass  # teardown code omitted here
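
For context, a simplified sketch of how mode presumably reaches the spider (scrapyd's schedule.json passes any extra POST parameters to the spider as keyword arguments; names here are illustrative):

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, mode=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # If the job is scheduled without mode=..., this stays None and the
        # pipeline above closes the spider from its spider_opened handler.
        self.mode = mode

# e.g. curl http://localhost:6800/schedule.json -d project=ololo -d spider=myspider -d mode=full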
Scrapyd job log:
2015-04-06 00:30:18+0300 [scrapy] INFO: Scrapy 0.24.5 started (bot:
scrapybot)
2015-04-06 00:30:18+0300 [scrapy] INFO: Optional features available: ssl,
http11, django
2015-04-06 00:30:18+0300 [scrapy] INFO: Overridden settings: {
'NEWSPIDER_MODULE': 'ololo.spiders', 'FEED_URI':
'/var/lib/scrapyd/items/ololo/myspider/eaccc874dbda11e4ab04000c29ca58d6.jl',
'CONCURRENT_REQUESTS_PER_DOMAIN': 50, 'CONCURRENT_REQUESTS': 800,
'RANDOMIZE_DOWNLOAD_DELAY': False, 'SPIDER_MODULES': ['ololo.spiders'],
'RETRY_TIMES': 100, 'DOWNLOAD_TIMEOUT': 600, 'CONCURRENT_ITEMS': 200,
'COOKIES_ENABLED': False, 'USER_AGENT': 'Mozilla/5.0 (compatible; Yahoo!
Slurp; http://help.yahoo.com/help/us/ysearch/slurp)', 'DEFAULT_ITEM_CLASS':
'ololo.items.ololoPricesItem', 'LOG_FILE':
'/var/log/scrapyd/ololo/myspider/eaccc874dbda11e4ab04000c29ca58d6.log'}
2015-04-06 00:30:18+0300 [scrapy] INFO: Enabled extensions: FeedExporter,
LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-04-06 00:30:18+0300 [scrapy] INFO: Enabled downloader middlewares:
HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware,
RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware,
HttpCompressionMiddleware, RedirectMiddleware, TorMiddleware,
ChunkedTransferMiddleware, DownloaderStats
2015-04-06 00:30:18+0300 [scrapy] INFO: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
UrlLengthMiddleware, DepthMiddleware
2015-04-06 00:30:18+0300 [scrapy] INFO: Enabled item pipelines:
ProductPipeline, MysqlPipeline
2015-04-06 00:30:18+0300 [myspider] INFO: Spider opened
2015-04-06 00:30:18+0300 [myspider] INFO: Crawled 0 pages (at 0 pages/min),
scraped 0 items (at 0 items/min)
2015-04-06 00:30:18+0300 [myspider] INFO: Closing spider (Run mode not
defined)
2015-04-06 00:30:18+0300 [myspider] INFO: Dumping Scrapy stats:
{'finish_reason': 'Bot is already running',
'finish_time': datetime.datetime(2015, 4, 5, 21, 30, 18, 499333),
'log_count/INFO': 7,
'start_time': datetime.datetime(2015, 4, 5, 21, 30, 18, 496027)}
2015-04-06 00:30:18+0300 [myspider] INFO: Spider closed (Run mode not
defined)
2015-04-06 00:30:18+0300 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6027
2015-04-06 00:30:18+0300 [scrapy] DEBUG: Web service listening on 127.0.0.1:6084
It looks like the job has finished, but I can still see it in the system process list and in the scrapyd web interface.
What am I doing wrong?