Scrapy multilevel web scraping

Gaurang shah Mon, 11 Aug 2014 09:19:31 -0700

Hi Guys, 

I have just started with Scrapy and am trying to scrape a website that requires
crawling pages at multiple levels.


From the following website, I need to map each brand to its categories and
each category to its products, like this:
Nutripe,Dog Products,prod1
Nutripe,Dog Products,prod2
Nutripe,Dog Products,prod3
Nutripe,Cat Products,prod1
Nutripe,Cat Products,prod2
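
Put differently, the target is one row per product, with the brand and category repeated on every row. A minimal plain-Python sketch of that flattening, assuming the scraped data has already been gathered into a nested dict; the `catalog` value and the `to_rows` helper are hypothetical, built only from the sample rows above:

```python
import csv
import io


def to_rows(catalog):
    """Flatten {brand: {category: [products]}} into (brand, category, product) rows."""
    for brand, categories in catalog.items():
        for category, products in categories.items():
            for product in products:
                yield (brand, category, product)


# Hypothetical data mirroring the sample output in the post.
catalog = {
    "Nutripe": {
        "Dog Products": ["prod1", "prod2", "prod3"],
        "Cat Products": ["prod1", "prod2"],
    }
}

# Write the rows as CSV into an in-memory buffer and print them.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(to_rows(catalog))
print(buf.getvalue(), end="")
```

The printed lines match the five sample rows; in a spider, the same shape falls out naturally if one item is emitted per product.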

However, Scrapy is really pissing me off. It looks easy, but it's really
messy. I am not even able to map products to categories.

Instead, I am getting something like this:

Nutripe     Dog Products,Cat Products

I would really appreciate it if someone could help me understand what I am
doing wrong.




    def get_url(self, string):
        """Return complete url"""
        return "http://link2linkco.com/" + string


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        brands = hxs.select("//div[@id='contentFull']/div/p/a/@href")
        # self.item = Link2LinkItem()
        for brand in brands:
            brand_page = brand.extract()
            # print self.complete_url(brand_page)
            yield Request(self.get_url(brand_page), callback=self.parse_brands)



    def parse_brands(self, response):

        index = 1
        hxs = HtmlXPathSelector(response)
        item = Link2LinkItem()
        self.brand_name = hxs.select("//*[@id='contentFull']/h1/text()").extract()
        brands = hxs.select("//div[@id='contentFull']/fieldset[2]/div/p/a/@href")
        for brand in brands:

            brand_link = brand.extract()

            self.products_category = hxs.select("//*[@id='contentFull']/fieldset[2]/div/p[2]/a/text()").extract()
            print self.get_url(brand_link)
            # yield Request(self.complete_url(brand_name), callback=self.parse_catatories)
            item['Brand'] = self.brand_name
            item['Products_Category'] = self.products_category
        return item


-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
