Hi,

Just have a quick question about item pipelines and nulls / missing values.

I have a field in my spider to extract a model number. Now in some cases 
this is null.

I had a pipeline set up like the following, but I always got an error. I 
essentially wanted to either add a default value or skip over the 
record.

This example pipeline below would always fail!

from scrapy.exceptions import DropItem

class ModelNumberPipeline(object):
    def process_item(self, item, spider):
        # Indexing the item directly, as in the traceback below
        if item['model_number']:
            return item
        else:
            raise DropItem("Missing Model Number in %s" % item)


The error was the following:

         'price': u'$29.95',
         'product_id': u'3231002',
         'site_id': u'1',
         'site_type': u'1'}
        Traceback (most recent call last):
          File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 65, in process_chain
            d.callback(input)
          File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 382, in callback
            self._startRunCallbacks(result)
          File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/home/brendanb/scrapy/cbs-crawler/cbs/pipelines.py", line 15, in process_item
            if item['model_number']:
          File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 50, in __getitem__
            return self._values[key]
        exceptions.KeyError: 'model_number'


If I change the pipeline to just set a default value instead, I can fudge 
it.

class ModelNumberPipeline(object):
    def process_item(self, item, spider):
        item.setdefault('model_number', '')
        return item
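
For comparison, here is a rough sketch of the two behaviours I was after (drop the record, or fill in a default), written so it runs without Scrapy. It assumes the item behaves like a dict, so a plain dict stands in for it, and DropItem here is only a local stand-in for scrapy.exceptions.DropItem. DefaultModelNumberPipeline is a name made up for this example. The drop variant looks the field up with .get() rather than indexing, which should return None instead of raising KeyError when the field was never set.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (assumption for this sketch)."""


class ModelNumberPipeline(object):
    """Drop any item whose model number is missing or empty."""

    def process_item(self, item, spider):
        # .get() returns None for an absent key instead of raising KeyError,
        # so an unset field and an empty field are handled the same way.
        if item.get('model_number'):
            return item
        raise DropItem("Missing Model Number in %s" % item)


class DefaultModelNumberPipeline(object):
    """Hypothetical alternative: keep the item and fill in an empty default."""

    def process_item(self, item, spider):
        # Only sets the value when the key is not already present.
        item.setdefault('model_number', '')
        return item
```

Only one of the two would be enabled at a time, depending on whether records without a model number should be kept.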


My question is: why would the first pipeline fail when checking for a null value?

thanks
Brendan

