Hi all,
I am relatively new to the Scrapy framework and may have misunderstood
how to work with ItemLoaders.
Wouldn't it be more practical if ItemLoader generated a list of dicts
loaded with the scraped data according to the item model definition?
As I see it working now, ItemLoader dumps every occurrence matched by
the defined selectors into the fields of a single item.
This makes it difficult to reconstruct the original data.
Here is an example; I hope someone can shed some light on it.
For a web page that contains recipes as follows:
*Recipe1*
*Ingredients*
*Ingredient 1*
*Ingredient 2*
*Instructions*
*Step11*
*Step12*
*Recipe2*
*Ingredients*
*Ingredient 3*
*Ingredient 4*
*Instructions*
*Step21*
*Step22*
With an out-of-the-box spider, ItemLoader builds a dict such as output1:
*output1*
{'Recipe':[u'Recipe1',u'Recipe2'],
'Ingredients':[u'Ingredient1',u'Ingredient2',u'Ingredient3',u'Ingredient4'],
'Steps':[u'Step11',u'Step12',u'Step21',u'Step22']}
Wouldn't it be more practical if ItemLoader directly returned a list of
dicts, one per recipe, as in *output2:*
*output2*
[{'Recipe':[u'Recipe1'],
'Ingredients':[u'Ingredient1',u'Ingredient2'],
'Steps':[u'Step11',u'Step12']},
{'Recipe':[u'Recipe2'],
'Ingredients':[u'Ingredient3',u'Ingredient4'],
'Steps':[u'Step21',u'Step22']}]
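To make the difference concrete, here is a minimal sketch of the two
behaviours. It uses only Python's standard library (xml.etree) as a
stand-in for Scrapy selectors, so it is not ItemLoader code. The markup
in PAGE, the class names, and the extract() helper are all made up for
illustration; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <div class="recipe"> block per recipe.
# This structure is an assumption, not the real page.
PAGE = """
<html><body>
  <div class="recipe">
    <h2>Recipe1</h2>
    <ul class="ingredients"><li>Ingredient1</li><li>Ingredient2</li></ul>
    <ol class="steps"><li>Step11</li><li>Step12</li></ol>
  </div>
  <div class="recipe">
    <h2>Recipe2</h2>
    <ul class="ingredients"><li>Ingredient3</li><li>Ingredient4</li></ul>
    <ol class="steps"><li>Step21</li><li>Step22</li></ol>
  </div>
</body></html>
"""


def extract(node):
    """Collect every match found below `node` into one dict of lists."""
    return {
        "Recipe": [e.text for e in node.findall(".//h2")],
        "Ingredients": [
            e.text for e in node.findall(".//ul[@class='ingredients']/li")
        ],
        "Steps": [e.text for e in node.findall(".//ol[@class='steps']/li")],
    }


root = ET.fromstring(PAGE.strip())

# Querying the whole page mixes all recipes together (like output1).
output1 = extract(root)

# Querying relative to each recipe block keeps them apart (like output2).
output2 = [extract(div) for div in root.findall(".//div[@class='recipe']")]

print(output1)
print(output2)
```

The only difference between the two results is the node the queries run
against: the whole page versus one recipe block at a time. I assume the
Scrapy equivalent would be to build one ItemLoader per recipe block,
since ItemLoader accepts a selector argument, but I have not confirmed
that this is the intended usage.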
Am I misunderstanding how ItemLoader is meant to be used?
I hope someone can help me understand how to easily go from output1's
format to output2's.
Thanks