If I don't find "No Information" in response.body, I want to write the good
URLs to a file.
I am struggling to build the filter.
Also, maybe there is a better way of storing the good URLs and then
crawling back through them once the raw_urls have been 'filtered'?
 
   
    def start_requests(self):
        raw_urls = generate_result_urls(self.YEAR, self.YEARS)
        for url in raw_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # go to each page; if it is not a "No Information." page,
        # write the (good) url to a file
        f_ = 'goodurls.txt'

        if b"No Information." not in response.body:
            # good url -- append it to the file
            print(response.url)
            with open(f_, 'a') as f:
                f.write(response.url + '\n')
