Hi Paul,

With version 1.2.2 it looks like I got rid of the problem, but I have one 
question regarding the start URL that will be provided to the spider.

I am using this command to provide the URL to the spider:

scrapy crawl <spidername> -s ALLOWED_DOMAINS=seeds.txt -s SEEDS_SOURCE=seeds.txt

where my seeds.txt has the URL.
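For reference, my seeds.txt is just a plain text file with one start URL per 
line, for example (example.com is only a placeholder here):

http://www.example.com/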


Can I include those settings while scheduling the spider, like this:


curl http://localhost:6800/schedule.json -d project=<name> -d spider=<spidername> -s ALLOWED_DOMAINS=seeds.txt -s SEEDS_SOURCE=seeds.txt


Or do I need to mention it explicitly under start_urls?
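From what I understand of the scrapyd API, schedule.json takes Scrapy settings 
as repeated setting=NAME=value form fields rather than -s flags (curl itself 
would just treat -s as silent mode), so I am guessing the call would have to 
look more like this (same placeholders as above):

curl http://localhost:6800/schedule.json -d project=<name> -d spider=<spidername> -d setting=ALLOWED_DOMAINS=seeds.txt -d setting=SEEDS_SOURCE=seeds.txt

Is that the right way to pass them?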


On Thursday, 12 January 2017 16:18:03 UTC+5:30, Paul Tremberth wrote:
>
> You can force installing the latest version with `pip install --upgrade scrapy`.
> See https://pip.pypa.io/en/stable/reference/pip_install/#cmdoption-U
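> To double-check which version you ended up with afterwards, running 
> `scrapy version` or `pip show scrapy` should print it.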
>
> On Thu, Jan 12, 2017 at 11:37 AM, vanam raghu <raghavendr...@gmail.com> wrote:
>
>> OK Paul, I will do it, but I have one question: when I try pip install 
>> scrapy, I think it is installing version 1.2.1, and I am not sure why.
>>
>> Is pip not installing the latest version?
>>
>>
>>
>>
>> On Thursday, 12 January 2017 15:53:11 UTC+5:30, Paul Tremberth wrote:
>>>
>>> Can you try updating scrapy to at least 1.2.2?
>>>
>>> On Thu, Jan 12, 2017 at 11:11 AM, vanam raghu <raghavendr...@gmail.com> 
>>> wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> It's
>>>>
>>>> Scrapy 1.2.1
>>>>
>>>>
>>>> On Thursday, 12 January 2017 15:22:22 UTC+5:30, Paul Tremberth wrote:
>>>>>
>>>>> Hello Vanam,
>>>>>
>>>>> What version of Scrapy are you running?
>>>>> Scrapy 1.2.2 has a fix for a very similar error (if not the same): 
>>>>> https://github.com/scrapy/scrapy/issues/2011
>>>>>
>>>>> /Paul.
>>>>>
>>>>> On Thursday, January 12, 2017 at 10:43:14 AM UTC+1, vanam raghu wrote:
>>>>>>
>>>>>> While scheduling a spider using scrapyd, I am seeing the error below:
>>>>>>
>>>>>> 2017-01-12 14:28:30 [scrapy] INFO: Spider opened
>>>>>> 2017-01-12 14:28:30 [manager] INFO: 
>>>>>> --------------------------------------------------------------------------------
>>>>>> 2017-01-12 14:28:30 [manager] INFO: Starting Frontier Manager...
>>>>>> 2017-01-12 14:28:30 [manager] INFO: Frontier Manager Started!
>>>>>> 2017-01-12 14:28:30 [manager] INFO: 
>>>>>> --------------------------------------------------------------------------------
>>>>>> 2017-01-12 14:28:30 
>>>>>> [frontera.contrib.scrapy.schedulers.FronteraScheduler] INFO: Starting 
>>>>>> frontier
>>>>>> 2017-01-12 14:28:30 [scrapy] INFO: Closing spider (shutdown)
>>>>>> 2017-01-12 14:28:30 [twisted] CRITICAL: Unhandled error in Deferred:
>>>>>> 2017-01-12 14:28:30 [twisted] CRITICAL: 
>>>>>> Traceback (most recent call last):
>>>>>>   File 
>>>>>> "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 
>>>>>> 1258, in _inlineCallbacks
>>>>>>     result = result.throwExceptionIntoGenerator(g)
>>>>>>   File 
>>>>>> "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 
>>>>>> 389, in throwExceptionIntoGenerator
>>>>>>     return g.throw(self.type, self.value, self.tb)
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 
>>>>>> 87, in crawl
>>>>>>     yield self.engine.close()
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", 
>>>>>> line 100, in close
>>>>>>     return self._close_all_spiders()
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", 
>>>>>> line 340, in _close_all_spiders
>>>>>>     dfds = [self.close_spider(s, reason='shutdown') for s in 
>>>>>> self.open_spiders]
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", 
>>>>>> line 298, in close_spider
>>>>>>     dfd = slot.close()
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", 
>>>>>> line 44, in close
>>>>>>     self._maybe_fire_closing()
>>>>>>   File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", 
>>>>>> line 51, in _maybe_fire_closing
>>>>>>     self.heartbeat.stop()
>>>>>>   File 
>>>>>> "/usr/local/lib/python2.7/dist-packages/twisted/internet/task.py", line 
>>>>>> 202, in stop
>>>>>>     assert self.running, ("Tried to stop a LoopingCall that was "
>>>>>> AssertionError: Tried to stop a LoopingCall that was not running.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can anybody help me understand why I am getting this error?
>>>>>>
>>>
>
>
