That does seem to work. Using deferToThread, though, I run into the same problem: it doesn't get into the parse() method until the program closes. I'm open to other ideas for organically getting a URL for Scrapy to crawl that isn't through a message queue, though this seems like the most sensible option if I can get it to work.
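(For reference, my current understanding of deferToThread, as an untested sketch — handle_url is just a placeholder name: it returns a Deferred immediately and delivers queue()'s return value to a callback later, so the result is never available inline on the next line.)

    from twisted.internet.threads import deferToThread

    def poll_queue(self):
        # Returns immediately; self.queue() runs in Twisted's thread pool.
        d = deferToThread(self.queue)
        # handle_url() fires later, with queue()'s return value.
        d.addCallback(self.handle_url)
        d.addErrback(lambda failure: self.logger.error(failure))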
This is pretty messy, but here's what I have (I've never used deferToThread, or much threading in general for that matter, so I may be doing this wrong). Full pastebin here (*exactly* what I have, minus AWS creds): http://pastebin.com/4cebXyTc

    def start_requests(self):
        self.logger.error("STARTING QUEUE")
        while True:
            queue = deferToThread(self.queue)
            self.logger.error(self.cpuz_url)
            if self.cpuz_url is None:
                time.sleep(10)
                continue
            yield Request(self.cpuz_url, self.parse)

I've then changed my queue() function to have a try/except after it gets the message:

    try:
        message = message[0]
        message_body = message.get_body()
        self.logger.error(message_body)
        message_body = str(message_body).split(',')
        message.delete()
        self.cpuz_url = message_body[0]
        self.uid = message_body[1]
    except:
        self.logger.error(message)
        self.logger.error(self.cpuz_url)
        self.cpuz_url = None

On Thu, Jun 16, 2016 at 8:23 PM, Neverlast N <neverla...@hotmail.com> wrote:

> Thanks for bringing this up. I answered on SO. As a methodology, I would
> say: try to make the simplest working thing possible and then build up
> towards the more complex code you have. See at which point it breaks. Is
> it when you add an API call? Is it when you return something? What I did
> was to replace your queue() with this, and it seems to work:
>
>     def queue(self):
>         return 'http://www.example.com/?{}'.format(random.randint(0, 100000))
>
> What can we infer from this?
>
> ------------------------------
> From: jdavis....@gmail.com
> Date: Thu, 16 Jun 2016 13:43:28 -0400
> Subject: Trying to read from message queue, not parsing response in
> make_requests_from_url loop
> To: scrapy-users@googlegroups.com
>
> I have this question on SO, but no answers unfortunately. Figured I'd
> try my luck here.
>
> https://stackoverflow.com/questions/37770678/scrapy-not-parsing-response-in-make-requests-from-url-loop
>
> I'm trying to get Scrapy to grab a URL from a message queue and then
> scrape that URL. I have the loop going just fine, grabbing the URL from
> the queue, but it never enters the parse() method once it has a URL; it
> just continues to loop (and sometimes the URL comes back around even
> though I've deleted it from the queue...).
>
> While it's running in the terminal, if I CTRL+C and force it to end, it
> enters the parse() method and crawls the page, then ends. I'm not sure
> what's wrong here. Scrapy needs to be running at all times to catch a
> URL as it enters the queue. Anyone have ideas, or done something like
> this?
>
>     class my_Spider(Spider):
>         name = "my_spider"
>         allowed_domains = ['domain.com']
>
>         def __init__(self):
>             super(my_Spider, self).__init__()
>             self.url = None
>
>         def start_requests(self):
>             while True:
>                 # Crawl the url from queue
>                 yield self.make_requests_from_url(self._pop_queue())
>
>         def _pop_queue(self):
>             # Grab the url from queue
>             return self.queue()
>
>         def queue(self):
>             url = None
>             while url is None:
>                 conf = {
>                     "sqs-access-key": "",
>                     "sqs-secret-key": "",
>                     "sqs-queue-name": "crawler",
>                     "sqs-region": "us-east-1",
>                     "sqs-path": "sqssend"
>                 }
>                 # Connect to AWS
>                 conn = boto.sqs.connect_to_region(
>                     conf.get('sqs-region'),
>                     aws_access_key_id=conf.get('sqs-access-key'),
>                     aws_secret_access_key=conf.get('sqs-secret-key')
>                 )
>                 q = conn.get_queue(conf.get('sqs-queue-name'))
>                 message = conn.receive_message(q)
>                 # Didn't get a message back, wait.
>                 if not message:
>                     time.sleep(10)
>                     url = None
>                 else:
>                     url = message
>             if url is not None:
>                 message = url[0]
>                 message_body = str(message.get_body())
>                 message.delete()
>                 self.url = message_body
>             return self.url
>
>         def parse(self, response):
>             ...
>             yield item
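A likely explanation for the symptom above: start_requests() is driven by Twisted's single reactor thread, so the time.sleep(10) calls inside the while loops stall the whole engine. Yielded Requests get scheduled but never downloaded until the generator is interrupted, which is why CTRL+C suddenly makes parse() fire. Below is a minimal, untested sketch of a non-blocking shape for the same idea; it assumes Scrapy ~1.x (where engine.crawl() takes a spider argument) and a queue() like the one above, except that it should return promptly (None when empty) instead of looping internally.

    from scrapy import Request, Spider, signals
    from scrapy.exceptions import DontCloseSpider
    from twisted.internet import task
    from twisted.internet.threads import deferToThread

    class QueueSpider(Spider):
        name = "queue_spider"

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super(QueueSpider, cls).from_crawler(crawler, *args, **kwargs)
            # With no pending requests the crawler would close itself;
            # raising DontCloseSpider on spider_idle keeps it alive.
            crawler.signals.connect(spider.keep_alive, signal=signals.spider_idle)
            return spider

        def keep_alive(self, spider):
            raise DontCloseSpider

        def start_requests(self):
            # Poll every 10 seconds without ever blocking the reactor.
            self._poller = task.LoopingCall(self._poll)
            self._poller.start(10)
            return []

        def _poll(self):
            # The blocking boto/SQS call runs in Twisted's thread pool;
            # its return value arrives via the callback.
            deferToThread(self.queue).addCallback(self._schedule)

        def _schedule(self, url):
            if url:
                self.crawler.engine.crawl(Request(url, callback=self.parse), self)

This is just one possible shape; the spider_idle trick and the engine.crawl() call are the parts most likely to vary across Scrapy versions.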