I'm trying to make a spider that crawls a website (eventually multiple websites) and tells me whether its CSS includes any "@media" queries. If none appear in the internal styling, I'd like it to load the external stylesheets so I can loop through their sources and search those too. Right now I'm saving the responses in a list and looping through them all at once, but I'm starting to think that's a bad approach. Would anyone mind steering me in the right direction? Here's what I have so far:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request


class SsSpider(scrapy.Spider):
    name = "ss"
    allowed_domains = ["scrapy.org"]  # example domain
    start_urls = (
        'http://scrapy.org/',
    )

    cssResponses = []
    cssResponseCount = 0

    def parse(self, response):
        # Collect the href of every external stylesheet on the page.
        cssPaths = response.xpath("//link/@href[contains(., '.css')]").extract()
        cssRequestCount = len(cssPaths)

        for cssPath in cssPaths:
            yield Request(cssPath, callback=self.saveCssResponse)

        # Busy-wait until every stylesheet response has been saved.
        while cssRequestCount != self.cssResponseCount:
            continue

        # When all responses are received, loop through and determine
        # if the CSS is responsive.

    def saveCssResponse(self, response):
        self.cssResponses.append(response.body)
        self.cssResponseCount += 1
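For what it's worth, here's a rough sketch of the direction I'm leaning instead: drop the busy-wait, check the inline <style> blocks first, and run the @media check from the last stylesheet callback. The remainingCss counter and the yielded dicts are just names/shapes I made up for illustration:

import scrapy
from scrapy.http import Request


class SsSpider(scrapy.Spider):
    name = "ss"
    allowed_domains = ["scrapy.org"]  # example domain
    start_urls = ('http://scrapy.org/',)

    def parse(self, response):
        # If an inline <style> block already contains a media query,
        # report the page as responsive without fetching anything.
        inlineCss = " ".join(response.xpath("//style/text()").extract())
        if "@media" in inlineCss:
            yield {'url': response.url, 'responsive': True}
            return

        cssPaths = response.xpath("//link/@href[contains(., '.css')]").extract()
        if not cssPaths:
            yield {'url': response.url, 'responsive': False}
            return

        # Track how many stylesheet responses are still outstanding,
        # then fetch each one (urljoin handles relative hrefs).
        self.remainingCss = len(cssPaths)
        self.cssResponses = []
        for cssPath in cssPaths:
            yield Request(response.urljoin(cssPath),
                          callback=self.saveCssResponse,
                          meta={'page': response.url})

    def saveCssResponse(self, response):
        self.cssResponses.append(response.text)
        self.remainingCss -= 1
        # Do the check from the last callback instead of busy-waiting,
        # so the reactor is never blocked.
        if self.remainingCss == 0:
            responsive = any("@media" in css for css in self.cssResponses)
            yield {'url': response.meta['page'], 'responsive': responsive}

One thing I'm unsure about: this still keeps shared state on the spider, which seems like it would break once the crawl spans multiple pages concurrently. Should the per-page count travel in meta instead?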