I'm trying to make a spider that crawls a website (eventually multiple websites) and tells me whether its CSS includes any "@media" queries. If none appear in the internal styling, I'd like it to load the external stylesheets so I can loop through their sources and search those too. Right now I'm saving the responses in a list and looping through them all at once, but I'm starting to think that's a bad approach. Would anyone mind steering me in the right direction? Here's what I have so far:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request


class SsSpider(scrapy.Spider):
    name = "ss"
    allowed_domains = ["scrapy.org"]  # example domain
    start_urls = (
        'http://scrapy.org/',
    )

    cssResponses = []
    cssResponseCount = 0

    def parse(self, response):
        # Collect the href of every external stylesheet on the page.
        cssPaths = response.xpath("//link/@href[contains(., '.css')]").extract()
        cssRequestCount = len(cssPaths)

        for cssPath in cssPaths:
            yield Request(cssPath, callback=self.saveCssResponse)

        # Busy-wait until every stylesheet response has been saved.
        while cssRequestCount != self.cssResponseCount:
            continue

        # When all responses are received, loop through and determine
        # if the CSS is responsive.

    def saveCssResponse(self, response):
        self.cssResponses.append(response.body)
        self.cssResponseCount += 1
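For what it's worth, here's a rough sketch of the direction I'm leaning instead: drop the busy-wait, check the inline <style> blocks first, and run the @media check from the last stylesheet callback. The remainingCss counter and the yielded dicts are just names/shapes I made up for illustration:

import scrapy
from scrapy.http import Request


class SsSpider(scrapy.Spider):
    name = "ss"
    allowed_domains = ["scrapy.org"]  # example domain
    start_urls = ('http://scrapy.org/',)

    def parse(self, response):
        # If an inline <style> block already contains a media query,
        # report the page as responsive without fetching anything.
        inlineCss = " ".join(response.xpath("//style/text()").extract())
        if "@media" in inlineCss:
            yield {'url': response.url, 'responsive': True}
            return

        cssPaths = response.xpath("//link/@href[contains(., '.css')]").extract()
        if not cssPaths:
            yield {'url': response.url, 'responsive': False}
            return

        # Track how many stylesheet responses are still outstanding,
        # then fetch each one (urljoin handles relative hrefs).
        self.remainingCss = len(cssPaths)
        self.cssResponses = []
        for cssPath in cssPaths:
            yield Request(response.urljoin(cssPath),
                          callback=self.saveCssResponse,
                          meta={'page': response.url})

    def saveCssResponse(self, response):
        self.cssResponses.append(response.text)
        self.remainingCss -= 1
        # Do the check from the last callback instead of busy-waiting,
        # so the reactor is never blocked.
        if self.remainingCss == 0:
            responsive = any("@media" in css for css in self.cssResponses)
            yield {'url': response.meta['page'], 'responsive': responsive}

One thing I'm unsure about: this still keeps shared state on the spider, which seems like it would break once the crawl spans multiple pages concurrently. Should the per-page count travel in meta instead?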