If I don't find "No Information" in response.body, I want to write the good
URLs to a file. I am struggling to build the filter.
Also, is there maybe a better way of storing the good URLs and then
crawling back through them once the raw_urls have been filtered?
raw_urls = generate_result_urls(self.YEAR, self.YEARS)
for url in raw_urls:
    yield scrapy.Request(url=url, callback=self.parse)

def parse(self, response):
    # generate good urls:
    # goes to the pages; if "No Information." is not there, write the url to file
    f_ = 'goodurls.txt'
    if b"No Information." not in response.body:
        # write response.url to file here
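
I think what I'm after is something like the rough sketch below (assuming
generate_result_urls and the YEAR/YEARS attributes exist as in my snippet; a
hard-coded URL stands in for them here), i.e. filter in parse, collect the
matching URLs on the spider, and write them out once when the spider closes:

import scrapy


class ResultsSpider(scrapy.Spider):
    name = "results"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # URLs whose pages do not contain "No Information."
        self.good_urls = []

    def start_requests(self):
        # generate_result_urls(self.YEAR, self.YEARS) from the snippet above
        # would go here; a hard-coded list stands in for it in this sketch.
        raw_urls = ["https://example.com/results/2019"]
        for url in raw_urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Keep the URL only if the page does not say "No Information."
        if b"No Information." not in response.body:
            self.good_urls.append(response.url)

    def closed(self, reason):
        # Called once when the spider finishes: write all good URLs in one go
        # instead of reopening the file for every response.
        with open("goodurls.txt", "w") as f:
            for url in self.good_urls:
                f.write(url + "\n")

Or is the more Scrapy-ish way to yield each good URL as an item and let the
feed export write goodurls.txt for me, and then crawl back through the good
pages with a second spider (or just yield the follow-up Request straight from
parse and skip the file altogether)?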