This is my third or fourth post in the last 24 hours. I freely admit that I don't know what I am doing, and that for the last several hours on this particular issue I have been guessing, because I didn't know what Scrapy wanted from me and I couldn't find an answer.
Here are just a few lines from my log from today; it runs to over 100 pages when pasted into my word processor. I was just trying to make this work with the pipeline. It started with this error:

    SavePipeline(item)
    TypeError: object() takes no parameters

and it never got better. I read on SO that this happens when a class has no __init__ method of its own, so Python falls back to the one on the parent, object, which takes no arguments (see the small example after the traceback below). I thought that made sense, so I put an __init__ in there, and hell ensued. It was the usual "how many arguments" problem, but when I tried giving it only self and leaving the body empty or with just pass, I got indentation errors. So I tried putting in something innocuous like self.name = name, and we were back to the how-many-arguments error. I tried giving it process_item as an attribute, and after many go-rounds and variations that worked, but then it wouldn't take my call to the process_item method; back to the number of arguments again. I imported my spider, and that helped, but the errors still kept coming. It has been about six hours. I have Googled all over the place. I give up. I don't get it. I need help.

Here is one full traceback, typical of most but hardly the only one:

Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks
    result = g.send(result)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
    class SavePipeline(object):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
    SavePipeline(process_item)
NameError: name 'SavePipeline' is not defined
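If I understood the SO answer right, the point is that a class with no __init__ of its own inherits object's, which takes no arguments, so calling the class with an argument fails until you define __init__ yourself. A tiny example of what I mean, with made-up names:

    class NoInit(object):
        pass

    NoInit('x')     # TypeError: object() takes no parameters

    class WithInit(object):
        def __init__(self, name):
            self.name = name

    WithInit('x')   # fine: __init__ now accepts one argument besides self

That much I can follow; it's everything after that where I get lost.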
And here is an abbreviated version of some of the others, including the last (each traceback appeared twice in the log; I kept one copy of each):

2017-05-28 02:43:30,386:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
    class SavePipeline(object):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
    SavePipeline(process_item)
NameError: name 'SavePipeline' is not defined

2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in <module>
    SavePipeline(process_item)
NameError: name 'process_item' is not defined

2017-05-28 03:10:29,174:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100
    return cls(name = =crawler.settings.get('ITEM_PIPELINES'),)
                      ^
SyntaxError: invalid syntax

2017-05-28 03:10:51,021:middleware.py:53:from_settings:INFO:Enabled downloader middlewares:
2017-05-28 03:10:51,024:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
2017-05-28 03:10:51,025:_legacy.py:154:publishToNewObserver:CRITICAL:
Traceback (most recent call last):
  File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100, in from_crawler
    return cls(name = crawler.settings.get('ITEM_PIPELINES'),)
NameError: name 'crawler' is not defined

PIPELINE.PY

from items import Acquire2Item
item = Acquire2Item()
from acquire2.spiders import testerapp2

class SavePipeline(object):
    def __init__(self, name):
        self.name = name

    def process_item(self, item, testerapp2):
        item.save()
        return
        process_item(self, item, testerapp2)

    @classmethod
    def from_crawler(cls, testerapp2):
        return cls(name = crawler.settings.get('ITEM_PIPELINES'),)

I notice there is something in there about crawler settings. Among many other things, I read http://mengyangyang.org/scrapy/topics/item-pipeline.html#from_crawler. Obviously I don't get it. Perhaps this is related to my other question about settings from earlier today? I just noticed that URL; it must be a Chinese mirror of the docs, but I don't think that makes a difference here. Any help at all will be appreciated.
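For what it's worth, here is my best guess at what that from_crawler example is driving at, adapted to my pipeline. I am not claiming this is right: the setting name BOT_NAME is just a placeholder I picked, and item.save() assumes an item class that actually has a save() method, like mine is supposed to.

    class SavePipeline(object):
        def __init__(self, name):
            self.name = name

        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy passes the crawler in here, so this is where its
            # settings are supposed to be read
            return cls(name=crawler.settings.get('BOT_NAME'))

        def process_item(self, item, spider):
            # called once for each item the spider yields
            item.save()    # assumes the item has a save() method
            return item

Is that even close to the shape it should have?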