I posted a related question to Stack Overflow at http://stackoverflow.com/questions/33084480/scrapy-error-can-t-find-callback, but so far it has no answers.
I am not able to get a spider to crawl past the first page of any site I have tried, despite many iterations and many re-reads of the docs. I decided to test it against the example code from the docs. The only change I made was to the name, so I could tell it apart.

    '''
    Copied from Scrapy 1.03 docs at pdf page 15, section 2.3, Scrapy Tutorial
    Run this, as is, on Dmoz.
    '''
    import scrapy
    from tutorial.items import DmozItem

    class DmozSpider(scrapy.Spider):
        name = "tutfollinks"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/",
        ]

        def parse(self, response):
            for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
                url = response.urljoin(href.extract())
                yield scrapy.Request(url, callback=self.parse_dir_contents)

        def parse_dir_contents(self, response):
            for sel in response.xpath('//ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('text()').extract()
                yield item

And here is what I got:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/usr/lib/pymodules/python2.7/scrapy/spiders/__init__.py", line 76, in parse
        raise NotImplementedError
    NotImplementedError
    2015-10-12 19:31:21 [scrapy] INFO: Closing spider (finished)

When I googled the error, my first hit was:

http://stackoverflow.com/questions/5264829/why-does-scrapy-throw-an-error-for-me-when-trying-to-spider-and-parse-a-site

The answer, according to the OP, was to change from BaseSpider to CrawlSpider. But, I repeat, this is copied verbatim from the example in the docs. Then how can it throw an error?
In fact, the whole point of the example in the docs is to show how to crawl a site WITHOUT CrawlSpider, which is introduced for the first time in a note at the end of section 2.3.4.

Another SO post had a similar issue, but in that case the original code was subclassed from CrawlSpider, and the OP was told he had accidentally overwritten parse(). But I see parse() being used in various examples in the docs, including this one. What, exactly, constitutes 'overwriting parse()'? Is it adding variables, as the example in the docs does? How can that be?

Furthermore, the callback in this case is explicitly not parse, but parse_dir_contents. What is going on here? Please, I'd like an explanation of why this happens as well as the hopefully simple answer.

Thanks.
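
P.S. To narrow this down, here is a minimal sketch of what I understand the traceback to be saying. It does not use Scrapy at all. The Spider class below is a hypothetical stand-in that models only the one behaviour visible in the traceback, namely that the base class's parse() raises NotImplementedError. GoodSpider, BadSpider and run() are names I made up for illustration. The misplaced indentation in BadSpider is just one guess at how a spider could end up without its own parse(); I have not confirmed that this is what is happening in my case.

```python
class Spider(object):
    """Hypothetical stand-in for scrapy.Spider; only the default parse() is modelled."""
    name = None

    def parse(self, response):
        # Mirrors the line shown in the traceback: the base class has no real parse().
        raise NotImplementedError


class GoodSpider(Spider):
    name = "good"

    # Defined inside the class body, so it replaces the base parse().
    def parse(self, response):
        return ["parsed " + response]


class BadSpider(Spider):
    name = "bad"


# Deliberately NOT indented under BadSpider: this is a plain module-level
# function, so BadSpider still inherits the base parse().
def parse(self, response):
    return ["parsed " + response]


def run(spider_cls, response):
    """Call the spider's parse() and report which behaviour we got."""
    spider = spider_cls()
    try:
        return spider.parse(response)
    except NotImplementedError:
        return "NotImplementedError"


print(run(GoodSpider, "page"))
print(run(BadSpider, "page"))
```

If this picture is right, the error would mean that the spider class actually being run has no parse() of its own, whatever the file appears to contain. I would appreciate it if someone could confirm or correct that reading.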