So, putting my scraper under the fresh project and executing directly from
scrapy rather than the django manage.py works.
I realize this is the scrapy user group, but would anyone have an idea
why inspect_response doesn't work under django management commands? I
have also noticed some problems with getting pdb.set_trace() to work in the
same situation.
Here is how I'm setting up and calling the scraper from django (maybe
something has changed in 0.22? This worked on an older version):
settings_module = importlib.import_module('scrapers_2014.scrapers_2014.settings')
settings = CrawlerSettings(settings_module)
settings.overrides['ITEM_PIPELINES'] = self.select_pipeline(options)
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(self._spider)
crawler.start()
log.start()
reactor.run()
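
One possible explanation, offered as a guess rather than something confirmed in this thread: the crawler name in scrapy.project may only exist once the scrapy command line has set it up, so "from scrapy.project import crawler" would fail when the crawl is started some other way, such as from a django command. The stdlib-only sketch below reproduces that kind of failure with a made-up module called fake_project. The module name and the try_import helper are hypothetical, and none of this is Scrapy's actual code.

```python
import sys
import types

# Hypothetical stand-in for a module whose `crawler` attribute is only
# injected at runtime by a command-line entry point. This is an assumption
# about what might be happening, not Scrapy's real code.
fake_project = types.ModuleType("fake_project")
sys.modules["fake_project"] = fake_project


def try_import():
    """Return the ImportError message, or None if the import succeeds."""
    try:
        from fake_project import crawler  # noqa: F401
    except ImportError as exc:
        return str(exc)
    return None


# Case 1: started from some other entry point, nothing injected yet.
error_before = try_import()

# Case 2: an entry point injects the attribute before anyone imports it.
fake_project.crawler = object()
error_after = try_import()

print(error_before)
print(error_after)
```

If that guess is right, the import only works when whatever sets the attribute has run first, which would match it working from the scrapy command but not from manage.py.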
On Saturday, 22 February 2014 08:20:46 UTC+10, John wrote:
>
> Hmm, it does not fail in a fresh project... That is interesting. I am
> calling it from within my parse callback. The only difference I see is that
> I am using 'BaseSpider' as the parent class and not 'Spider'. I've tried
> changing this but that hasn't made a difference.
>
> Scrapy version info:
> Scrapy : 0.22.2
> lxml : 3.3.1.0
> libxml2 : 2.7.8
> Twisted : 13.2.0
> Python : 2.7.3 (default, Apr 10 2013, 06:20:15) - [GCC 4.6.3]
> Platform: Linux-3.2.0-57-generic-x86_64-with-Ubuntu-12.04-precise
>
> from scrapy.project import crawler is the line causing the import error.
>
> The other major difference between my fresh project and the project I'm
> working on is that my spider is called from inside a django command... I
> think that is an avenue that needs further investigation. Initially I had
> wanted my scraper to dump straight to the django db but now I'm using an
> intermediary JSON dump so that may no longer be necessary...
>
> On Saturday, 22 February 2014 00:24:06 UTC+10, Rolando Espinoza La fuente
> wrote:
>>
>> Does it fail in a fresh project? How/Where are you calling the function?
>> What's the output of "scrapy version -v"?
>>
>> Sometimes the crawler import error is due to a "from scrapy.project
>> import crawler".
>>
>> Alternatively, I like to use
>>
>> from IPython import embed; embed()
>>
>> instead of inspect_response because it gives me access to the current
>> variables. Although inspect_response gives you the handy shell shortcuts
>> like view(response).
>>
>> Rolando
>>
>>
>> On Fri, Feb 21, 2014 at 7:20 AM, John <[email protected]> wrote:
>>
>>> Hi Everyone,
>>>
>>> I'm trying to debug my scraper and have discovered the
>>> inspect_response() function which looks quite useful.
>>>
>>> However when importing it I get the following exception:
>>> exceptions.ImportError: cannot import name crawler
>>>
>>> I have also attempted using inspect_response from the python shell and
>>> get the same error.
>>>
>>> I'm using scrapy 0.22.2. Has anyone else encountered this error? What
>>> more information can I provide to investigate this?
>>>
>>> Cheers,
>>> John
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>