I am using dirbot from https://github.com/scrapy/dirbot and generating two
spiders without actually changing anything at all (i.e. no custom code yet).
Here is the file structure of the newly unzipped dirbot archive:
Wed Mar 05 - 02:12 PM > find .
.
./.gitignore
./dirbot
./dirbot/items.py
./dirbot/pipelines.py
./dirbot/settings.py
./dirbot/spiders
./dirbot/spiders/dmoz.py
./dirbot/spiders/__init__.py
./dirbot/__init__.py
./README.rst
./scrapy.cfg
./setup.py
It doesn't contain any reference to scrapybot yet:
Wed Mar 05 - 02:13 PM > find . -type f -print0 | xargs -0 grep -i scrapybot
Then I generate my first spider:
Wed Mar 05 - 02:14 PM > scrapy genspider confluenceChildPages confluence
Created spider 'confluenceChildPages' using template 'crawl' in module:
dirbot.spiders.confluenceChildPages
The new directory structure is:
Wed Mar 05 - 02:14 PM > find .
.
./.gitignore
./dirbot
./dirbot/items.py
./dirbot/items.pyc
./dirbot/pipelines.py
./dirbot/settings.py
./dirbot/settings.pyc
./dirbot/spiders
./dirbot/spiders/confluenceChildPages.py
./dirbot/spiders/dmoz.py
./dirbot/spiders/dmoz.pyc
./dirbot/spiders/__init__.py
./dirbot/spiders/__init__.pyc
./dirbot/__init__.py
./dirbot/__init__.pyc
./README.rst
./scrapy.cfg
./setup.py
And I find that the generated spider references scrapybot:
Wed Mar 05 - 02:17 PM > find . -type f -print0 | xargs -0 grep -i scrapybot
./dirbot/spiders/confluenceChildPages.py:from scrapybot.items import ScrapybotItem
./dirbot/spiders/confluenceChildPages.py: i = ScrapybotItem()
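In case it helps diagnose where that import comes from: my understanding is that genspider renders its spider templates with string substitution, filling in a project name derived from the BOT_NAME setting, so a BOT_NAME of 'scrapybot' in settings.py would yield exactly the import above. This is only an illustrative sketch of that mechanism (the template text and variable names here are mine, not Scrapy's actual template files):

```python
from string import Template

# Illustrative fragment resembling a genspider spider template;
# the real templates ship inside the Scrapy package, not here.
TEMPLATE = "from $project_name.items import ${ProjectName}Item\n"

# Assumption: dirbot's settings.py sets BOT_NAME = 'scrapybot',
# and genspider derives its substitution values from that setting.
bot_name = "scrapybot"
rendered = Template(TEMPLATE).substitute(
    project_name=bot_name,
    ProjectName=bot_name.capitalize(),
)
print(rendered)  # from scrapybot.items import ScrapybotItem
```

If that's right, the generated spider mirrors whatever BOT_NAME says, regardless of whether a matching items module exists.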
When I try to generate a second spider, it fails:
Thu Feb 27 - 01:59 PM > scrapy genspider xxx confluence
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.22.2', 'scrapy')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/commands/genspider.py", line 68, in run
    crawler = self.crawler_process.create_crawler()
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 87, in create_crawler
    self.crawlers[name] = Crawler(self.settings)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 25, in __init__
    self.spiders = spman_cls.from_crawler(self)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 22, in __init__
    for module in walk_modules(name):
  File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/utils/misc.py", line 68, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/d/some/dir/dirbot-master/dirbot/spiders/confluenceChildPages.py", line 4, in <module>
    from scrapybot.items import ScrapybotItem
ImportError: No module named scrapybot.items
So I understand that the second command is failing because the first spider references scrapybot.items, which doesn't exist.
1) Did I do something wrong with my first genspider command for it to create a reference to items that don't exist? Should I have done something differently?
2) Why exactly did the second genspider command fail? Why is it choking on a reference to something that doesn't exist in the *first* spider?
3) How do I fix this? Should I change the commands I use to generate the spiders, or modify the files created by the first command?
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.