Shouldn't ItemLoader return a list(array) of dicts according to the item definition?

Daniel Fernández Lestón Mon, 14 Mar 2016 19:38:45 -0700

Hi all, 

I am relatively new to the Scrapy framework and may have misunderstood 
how ItemLoaders are meant to be used.


Wouldn't it be more practical if ItemLoader generated an array of dicts, 
one per scraped record, following the item model definition?
As far as I can see, ItemLoader currently dumps every occurrence matched by 
the defined selectors into the fields of a single item.

This makes it difficult to reconstruct which values belonged together on the page.

Here is an example; I hope someone can shed some light on it.
Take a web page that contains recipes as follows:
 

*Recipe1*

*Ingredients*

*Ingredient 1*

*Ingredient 2*

*Instructions*

*Step11*

*Step12*

*Recipe2*

*Ingredients*

*Ingredient 3*

*Ingredient 4*

*Instructions*

*Step21*

*Step22*


With an out-of-the-box spider, ItemLoader builds a single dict such as output1:
*output1*

{'Recipe':[u'Recipe1',u'Recipe2'],
'Ingredients':[u'Ingredient1',u'Ingredient2',u'Ingredient3',u'Ingredient4'],
'Steps':[u'Step11',u'Step12',u'Step21',u'Step22']}


Wouldn't it be more practical if ItemLoader directly returned an array of 
dicts, one per recipe, as in *output2*:
*output2*

[{'Recipe':[u'Recipe1'],
'Ingredients':[u'Ingredient1',u'Ingredient2'],
'Steps':[u'Step11',u'Step12']},

{'Recipe':[u'Recipe2'],
'Ingredients':[u'Ingredient3',u'Ingredient4'],
'Steps':[u'Step21',u'Step22']}]

 
Am I misunderstanding how ItemLoader works? 
I hope someone can help me understand how to easily go from the output1 format 
to the output2 format.

Thanks

