Your problem is that you are using spider attributes to save the data. The spider instance is shared by every request, so values stored on self get overwritten as responses come in. If you check the Request documentation <http://doc.scrapy.org/en/latest/topics/request-response.html#request-objects>, you will notice that there is a meta attribute for passing information from one request to its specific callback.
I usually do something like this: https://gist.github.com/nramirezuy/1bf4a6d635d98a1e4df0

On Monday, 11 August 2014 13:19:00 UTC-3, Gaurang shah wrote:
>
> Hi Guys,
>
> I have just started the scrapy and trying to scrap a website which
> requires to crawl pages at multiple level.
>
> From this following website I require to map brand with category and
> category with product. i.e like this
>
> Nutripe,Dog Products,prod1
> Nutripe,Dog Products,prod2
> Nutripe,Dog Products,prod3
> Nutripe,Cat Products,prod1
> Nutripe,Cat Products,prod2
>
> However scrapy is really pissing me off. it looks easy however it's really
> messy. I am not even able to map product with category.
>
> I am getting something like this
>
> Nutripe Dog Products,Cat Products
>
> I would really appreciate is someone would help me understand what's wrong
> I am doing.
>
>     def get_url(self,string):
>         """Return complete url"""
>         return "http://link2linkco.com/" + string
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         brands = hxs.select("//div[@id='contentFull']/div/p/a/@href")
>         # self.item = Link2LinkItem()
>         for brand in brands:
>             brand_page = brand.extract()
>             # print self.complete_url(brand_page)
>             yield Request(self.get_url(brand_page), callback=self.parse_brands)
>
>     def parse_brands(self, response):
>         index = 1
>         hxs = HtmlXPathSelector(response)
>         item = Link2LinkItem()
>         self.brand_name = hxs.select("//*[@id='contentFull']/h1/text()").extract()
>         brands = hxs.select("//div[@id='contentFull']/fieldset[2]/div/p/a/@href")
>         for brand in brands:
>             brand_link = brand.extract()
>             self.products_category = hxs.select("//*[@id='contentFull']/fieldset[2]/div/p[2]/a/text()").extract()
>             print self.get_url(brand_link)
>             # yield Request(self.complete_url(brand_name), callback=self.parse_catatories)
>             item['Brand'] = self.brand_name
>             item['Products_Category'] = self.products_category
>         return item