I should be getting around 600 items, but since the spider closes at 34 I cannot obtain them all. The site being scanned has many sub-links, and I filter them by rule using allowed URL patterns and allowed domains.
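For reference, my rules are set up along these lines (a minimal sketch only; the spider name matches my crawl command below, but the domain and allow pattern here are placeholders, not the real ones):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class PruebaSpider(CrawlSpider):
    name = 'prueba'
    # Placeholder domain and URLs -- the real site differs.
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    rules = (
        # Follow only links matching the allow pattern,
        # parsing each matched page with parse_item.
        Rule(LinkExtractor(allow=(r'/catalogo/',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Extraction logic omitted in this sketch.
        pass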
Is there some way to obtain more information about why it aborts? Is there some limit in the item filter? (One idea for getting more detail is sketched below, after the quoted messages.)

On Thursday, December 18, 2014 at 00:06:30 UTC+1, Travis Leleu wrote:
>
> What makes you think it's closing prematurely? I see a lot of duplicate
> requests filtered out by scrapy; if you aren't getting as many items as you
> expected, that could be why. Check your assumptions.
>
> On Wed, Dec 17, 2014 at 2:31 PM, ROBERTO ANGUITA MARTIN <
> [email protected]> wrote:
>>
>> I am trying my first crawl. I launch my spider with this command:
>>
>> nohup scrapy crawl prueba -o prueba.csv -t csv -s LOG_FILE=salida.out -s JOBDIR=work -L DEBUG &
>>
>> I have configured CsvExportPipeline.py as in the manual's example, but the spider finishes after scraping only 34 items.
>>
>> Why? I have searched the internet and everyone says it is a memory problem, but I don't find anything about memory in the log.
>>
>> The log level is DEBUG, but I still cannot tell why it only reads 34 items.
>>
>> The final log is this:
>>
>> 2014-12-17 17:02:32+0100 [prueba] INFO: Closing spider (finished)
>> 2014-12-17 17:02:32+0100 [prueba] INFO: Stored csv feed (34 items) in: prueba.csv
>> 2014-12-17 17:02:32+0100 [prueba] INFO: Dumping Scrapy stats:
>> {'downloader/request_bytes': 14603,
>>  'downloader/request_count': 35,
>>  'downloader/request_method_count/GET': 35,
>>  'downloader/response_bytes': 551613,
>>  'downloader/response_count': 35,
>>  'downloader/response_status_count/200': 35,
>>  'dupefilter/filtered': 363,
>>  'finish_reason': 'finished',
>>  'finish_time': datetime.datetime(2014, 12, 17, 16, 2, 32, 392134),
>>  'item_scraped_count': 34,
>>  'log_count/DEBUG': 72,
>>  'log_count/ERROR': 1,
>>  'log_count/INFO': 48,
>>  'request_depth_max': 5,
>>  'response_received_count': 35,
>>  'scheduler/dequeued': 35,
>>  'scheduler/dequeued/disk': 35,
>>  'scheduler/enqueued': 35,
>>  'scheduler/enqueued/disk': 35,
>>  'start_time': datetime.datetime(2014, 12, 17, 15, 21, 55, 218630)}
>> 2014-12-17 17:02:32+0100 [bodegas] INFO: Spider closed (finished)
>>
>> Can anybody help me?
>>
>> Regards,
>> Roberto
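Following Travis's point about the duplicate filter ('dupefilter/filtered': 363 in the stats above): one way to get more information is to make the dupefilter log every request it discards instead of only the first one. A minimal sketch, assuming a Scrapy version that supports the DUPEFILTER_DEBUG setting:

# settings.py
# Log every request dropped as a duplicate, not just the first.
DUPEFILTER_DEBUG = True

or the same thing directly on the command line:

scrapy crawl prueba -s DUPEFILTER_DEBUG=1

If the missing URLs show up there, they are being dropped as duplicates rather than hitting any item limit. The stats also show 'log_count/ERROR': 1, so the single ERROR line in salida.out may be relevant as well.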
