This is my third or fourth post in the last 24 hours. I freely admit that I don't know what I am doing, and that for the last several hours on this particular issue I have been guessing, because I didn't know what Scrapy wanted from me and I couldn't find an answer.
Here are just a few lines from my log from today; it runs to over 100 pages when pasted into my word processor. I was just trying to make this work with the pipeline. It started with this error:

    SavePipeline(item)
    TypeError: object() takes no parameters

and it never got better. I read on SO that this happens when a class has no __init__ method of its own, so Python falls back to the one on the parent, object, which takes no arguments (see the small example after the traceback below). I thought that made sense, so I put an __init__ in there, and hell ensued. It was the usual "how many arguments" problem, but when I tried giving it only self and leaving the body empty or with just pass, I got indentation errors. So I tried putting in something innocuous like self.name = name, and we were back to the how-many-arguments error. I tried giving it process_item as an attribute, and after many go-rounds and variations that worked, but then it wouldn't take my call to the process_item method; back to the number of arguments again. I imported my spider, and that helped, but the errors still kept coming. It has been about six hours. I have Googled all over the place. I give up. I don't get it. I need help.

Here is one full traceback, typical of most but hardly the only one:

Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks
    result = g.send(result)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
    class SavePipeline(object):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
    SavePipeline(process_item)
NameError: name 'SavePipeline' is not defined
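If I understood the SO answer right, the point is that a class with no __init__ of its own inherits object's, which takes no arguments, so calling the class with an argument fails until you define __init__ yourself. A tiny example of what I mean, with made-up names:

    class NoInit(object):
        pass

    NoInit('x')     # TypeError: object() takes no parameters

    class WithInit(object):
        def __init__(self, name):
            self.name = name

    WithInit('x')   # fine: __init__ now accepts one argument besides self

That much I can follow; it's everything after that where I get lost.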
And here is an abbreviated version of some of the others, including the last (each traceback appeared twice in the log; I kept one copy of each):

2017-05-28 02:43:30,386:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
    class SavePipeline(object):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
    SavePipeline(process_item)
NameError: name 'SavePipeline' is not defined

2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in <module>
    SavePipeline(process_item)
NameError: name 'process_item' is not defined

2017-05-28 03:10:29,174:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100
    return cls(name = =crawler.settings.get('ITEM_PIPELINES'),)
                      ^
SyntaxError: invalid syntax

2017-05-28 03:10:51,021:middleware.py:53:from_settings:INFO:Enabled downloader middlewares:
2017-05-28 03:10:51,024:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
2017-05-28 03:10:51,025:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100, in from_crawler
    return cls(name = crawler.settings.get('ITEM_PIPELINES'),)
NameError: name 'crawler' is not defined

PIPELINE.PY

from items import Acquire2Item
item = Acquire2Item()
from acquire2.spiders import testerapp2

class SavePipeline(object):
    def __init__(self, name):
        self.name = name

    def process_item(self, item, testerapp2):
        item.save()
        return
        process_item(self, item, testerapp2)

    @classmethod
    def from_crawler(cls, testerapp2):
        return cls(name = crawler.settings.get('ITEM_PIPELINES'),)

I notice there is something in there about crawler settings. Among many other things, I read http://mengyangyang.org/scrapy/topics/item-pipeline.html#from_crawler. Obviously I don't get it. Perhaps this is related to my other question about settings from earlier today? I just noticed that URL; it must be a Chinese mirror of the docs, but I don't think that makes a difference here. Any help at all will be appreciated.
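For what it's worth, here is my best guess at what that from_crawler example is driving at, adapted to my pipeline. I am not claiming this is right: the setting name BOT_NAME is just a placeholder I picked, and item.save() assumes an item class that actually has a save() method, like mine is supposed to.

    class SavePipeline(object):
        def __init__(self, name):
            self.name = name

        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy passes the crawler in here, so this is where its
            # settings are supposed to be read
            return cls(name=crawler.settings.get('BOT_NAME'))

        def process_item(self, item, spider):
            # called once for each item the spider yields
            item.save()    # assumes the item has a save() method
            return item

Is that even close to the shape it should have?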