On May 9, 8:36 am, Piet van Oostrum <p...@cs.uu.nl> wrote:
> >>>>> grocery_stocker <cdal...@gmail.com> (gs) wrote:
>
> >gs> The following code gets data from 5 different websites at the "same
> >gs> time".
> >gs>
> >gs> #!/usr/bin/python
> >gs>
> >gs> import Queue
> >gs> import threading
> >gs> import urllib2
> >gs> import time
> >gs>
> >gs> hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
> >gs>          "http://ibm.com", "http://apple.com"]
> >gs>
> >gs> queue = Queue.Queue()
> >gs>
> >gs> class MyUrl(threading.Thread):
> >gs>     def __init__(self, queue):
> >gs>         threading.Thread.__init__(self)
> >gs>         self.queue = queue
> >gs>
> >gs>     def run(self):
> >gs>         while True:
> >gs>             host = self.queue.get()
> >gs>             if host is None:
> >gs>                 break
> >gs>             url = urllib2.urlopen(host)
> >gs>             print url.read(1024)
> >gs>             #self.queue.task_done()
> >gs>
> >gs> start = time.time()
> >gs>
> >gs> def main():
> >gs>     for i in range(5):
> >gs>         t = MyUrl(queue)
> >gs>         t.setDaemon(True)
> >gs>         t.start()
> >gs>
> >gs>     for host in hosts:
> >gs>         print "pushing", host
> >gs>         queue.put(host)
> >gs>
> >gs>     for i in range(5):
> >gs>         queue.put(None)
> >gs>
> >gs>     t.join()
> >gs>
> >gs> if __name__ == "__main__":
> >gs>     main()
> >gs>     print "Elapsed Time: %s" % (time.time() - start)
> >gs>
> >gs> How does the parallel download work if each thread has a lock? When
> >gs> the program opens www.yahoo.com, it places a lock on the thread,
> >gs> right? If so, then doesn't that mean the other 4 sites have to wait
> >gs> for the thread to release the lock?
>
> No. Where does it set a lock? There is only a short lock period in the queue
> when an item is put in the queue or got from the queue. And of course we
> have the GIL, but this is released as soon as a long during operation is
> started - in this case when the Internet communication is done.
> --

Maybe I'm being a bit daft, but what prevents the data from www.yahoo.com from being mixed up with the data from www.google.com? Doesn't using queue() prevent the data from being mixed up?
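
To make the question concrete, here is a minimal sketch of the same worker pattern. It rests on several assumptions that are not in the original post: it is written for Python 3 (so the module is `queue` rather than `Queue`), the download is replaced by a hypothetical `fake_fetch()` stand-in so that no network is needed, and the names `worker`, `run`, `results` and `results_lock` are invented for illustration. The point it tries to show is that each `get()` hands a given host to exactly one thread, and that `host` and `data` are local variables of that thread's call, so one thread's data cannot end up in another thread's variables. What can still interleave in the quoted program is the printed output, since all threads write to the same stdout.

```python
import queue
import threading


def fake_fetch(host):
    # Hypothetical stand-in for urllib2.urlopen(host).read(1024); no network.
    return "data from " + host


def worker(q, results, results_lock):
    while True:
        host = q.get()            # each queued item goes to exactly one thread
        if host is None:          # sentinel: no more work for this thread
            break
        data = fake_fetch(host)   # 'host' and 'data' are local to this call
        with results_lock:        # the dict is shared, so guard the write
            results[host] = data


def run(hosts, num_threads=5):
    q = queue.Queue()
    results = {}
    results_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, results, results_lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for host in hosts:
        q.put(host)
    for _ in threads:             # one sentinel per worker
        q.put(None)
    for t in threads:             # wait for every worker, not only the last
        t.join()
    return results


if __name__ == "__main__":
    for host, data in sorted(run(["http://yahoo.com", "http://google.com"]).items()):
        print(host, "->", data)
```

In this sketch the queue only distributes the work; it is the per-thread local variables that keep each response separate, and the explicit lock is there only because the results dictionary is shared between threads.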