Hi Nicolás, if the rules were wrong, I would not obtain any feeds at all.
My rule is this:
rules = (
    Rule(SgmlLinkExtractor(allow=(r'/detalle\.asp\?idb=\d+',)),
         callback='parse_espia', follow=True),
)
I start at www.domain.com and I only want to capture data from URLs of the form
http://www.domain.com/detalle.asp?idb=<number>, for example
http://www.domain.com/detalle.asp?idb=2856
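
For context, the whole spider is set up roughly like this (the class name, the
domain and the empty callback are placeholders here; only the rule is exactly
as above):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class PruebaSpider(CrawlSpider):
    name = 'prueba'
    allowed_domains = ['domain.com']
    start_urls = ['http://www.domain.com/']

    # follow every link on the site, but only call the callback on detail pages
    rules = (
        Rule(SgmlLinkExtractor(allow=(r'/detalle\.asp\?idb=\d+',)),
             callback='parse_espia', follow=True),
    )

    def parse_espia(self, response):
        # item extraction omitted here
        pass
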
On Thursday, December 18, 2014 at 22:26:31 UTC+1, Nicolás Alejandro
Ramírez Quiros wrote:
>
> That limit doesn't exist; the problem is somewhere in your code. You mention
> that you are using Rules; is your regex correct?
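>
> For example, you could check in the scrapy shell which links the extractor
> actually picks up from your start page (the URL and regex here are just the
> ones you posted):
>
> scrapy shell "http://www.domain.com/"
>
> and then, at the shell prompt:
>
> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
> SgmlLinkExtractor(allow=(r'/detalle\.asp\?idb=\d+',)).extract_links(response)
>
> If that returns an empty list, the regex (or the page) is the problem.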
>
> On Thursday, December 18, 2014 at 06:49:36 UTC-2, ROBERTO ANGUITA MARTIN
> wrote:
>>
>> I should get 600 items, but since the spider closes at 34, I cannot obtain
>> them all. The site being scanned has many sub-links, and I have filtered
>> them by rule with the allowed URL pattern and the allowed domain.
>>
>> Can I obtain more information in some way, to know why it stops? Is there
>> some limit in the item filter?
>>
>> On Thursday, December 18, 2014 at 00:06:30 UTC+1, Travis Leleu wrote:
>>>
>>> What makes you think it's closing prematurely? I see a lot of duplicate
>>> requests filtered out by Scrapy; if you aren't getting as many items as you
>>> expected, that could be why. Check your assumptions.
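>>>
>>> If you want to see exactly which requests are being dropped as duplicates,
>>> you could rerun with the dupefilter debug setting turned on (assuming your
>>> Scrapy version supports it; it just logs every filtered request instead of
>>> only the first one):
>>>
>>> scrapy crawl prueba -s DUPEFILTER_DEBUG=True -s LOG_FILE=salida.out -L DEBUG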
>>>
>>> On Wed, Dec 17, 2014 at 2:31 PM, ROBERTO ANGUITA MARTIN <
>>> [email protected]> wrote:
>>>>
>>>> I am trying my first crawl.
>>>> I launch my scraper with this command:
>>>>
>>>> nohup scrapy crawl prueba -o prueba.csv -t csv -s LOG_FILE=salida.out
>>>> -s JOBDIR=work -L DEBUG &
>>>>
>>>>
>>>> and I have configured CsvExportPipeline.py like the example in the manual,
>>>> but the spider finishes after scraping only 34 items.
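>>>>
>>>> (For reference, the pipeline is essentially the item exporter example from
>>>> the docs adapted to CSV; simplified here, the real field handling lives in
>>>> my items module:)
>>>>
>>>> from scrapy import signals
>>>> from scrapy.contrib.exporter import CsvItemExporter
>>>>
>>>> class CsvExportPipeline(object):
>>>>
>>>>     @classmethod
>>>>     def from_crawler(cls, crawler):
>>>>         # hook the exporter to the spider_opened / spider_closed signals
>>>>         pipeline = cls()
>>>>         crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
>>>>         crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
>>>>         return pipeline
>>>>
>>>>     def spider_opened(self, spider):
>>>>         self.file = open('%s_items.csv' % spider.name, 'w+b')
>>>>         self.exporter = CsvItemExporter(self.file)
>>>>         self.exporter.start_exporting()
>>>>
>>>>     def spider_closed(self, spider):
>>>>         self.exporter.finish_exporting()
>>>>         self.file.close()
>>>>
>>>>     def process_item(self, item, spider):
>>>>         self.exporter.export_item(item)
>>>>         return item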
>>>>
>>>> Why? I have been searching the internet and everybody says it is a memory
>>>> problem, but I don't find anything about memory in the log.
>>>>
>>>> The log level is DEBUG, but I cannot see the reason why it only read 34
>>>> items.
>>>>
>>>>
>>>> The end of the log is this:
>>>>
>>>>
>>>>
>>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Closing spider (finished)
>>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Stored csv feed (34 items) in: prueba.csv
>>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Dumping Scrapy stats:
>>>> {'downloader/request_bytes': 14603,
>>>>  'downloader/request_count': 35,
>>>>  'downloader/request_method_count/GET': 35,
>>>>  'downloader/response_bytes': 551613,
>>>>  'downloader/response_count': 35,
>>>>  'downloader/response_status_count/200': 35,
>>>>  'dupefilter/filtered': 363,
>>>>  'finish_reason': 'finished',
>>>>  'finish_time': datetime.datetime(2014, 12, 17, 16, 2, 32, 392134),
>>>>  'item_scraped_count': 34,
>>>>  'log_count/DEBUG': 72,
>>>>  'log_count/ERROR': 1,
>>>>  'log_count/INFO': 48,
>>>>  'request_depth_max': 5,
>>>>  'response_received_count': 35,
>>>>  'scheduler/dequeued': 35,
>>>>  'scheduler/dequeued/disk': 35,
>>>>  'scheduler/enqueued': 35,
>>>>  'scheduler/enqueued/disk': 35,
>>>>  'start_time': datetime.datetime(2014, 12, 17, 15, 21, 55, 218630)}
>>>> 2014-12-17 17:02:32+0100 [bodegas] INFO: Spider closed (finished)
>>>>
>>>>
>>>> Can anybody help me?
>>>>
>>>>
>>>> Regards
>>>>
>>>> Roberto
>>>>
>>>