On Friday, September 2, 2016 at 6:05:05 AM UTC-7, Peter Otten wrote:
> Sumeet Sandhu wrote:
>
> > Hi,
> >
> > I use urllib2 to grab google.com webpages on my Mac over my Comcast
> > home network.
> >
> > I see about 1 error for every 50 pages grabbed. Most exceptions are
> > ssl.SSLError; very few are socket.error and urllib2.URLError.
> >
> > The problem is - after a first exception, urllib2 occasionally stalls
> > for up to an hour (!), at either the urllib2.urlopen or response.read
> > stage.
> >
> > Apparently the urllib2 and socket timeouts are not effective here -
> > how do I fix this?
> >
> > ----------------
> > import urllib2
> > import socket
> > from sys import exc_info as sysExc_info
> > timeout = 2
> > socket.setdefaulttimeout(timeout)
> >
> > try :
> >     req = urllib2.Request(url,None,headers)
> >     response = urllib2.urlopen(req,timeout=timeout)
> >     html = response.read()
> > except :
> >     e = sysExc_info()[0]
> >     open(logfile,'a').write('Exception: %s \n' % e)
> > < code that follows this : after the first exception, I try again for
> > a few tries >
>
> I'd use separate try...except-s for response = urlopen() and
> response.read(). If the problem originates with read() you could try
> to replace it with select.select([response.fileno()], [], [], timeout)
> calls in a loop.
Thanks Peter, I will try this; my reading of your suggestion is sketched
below. However, I suspect Comcast is rate-limiting my home use. Is there
a workaround for that? What I really need is a way to put a hard
wall-clock deadline on the whole url-open-and-read loop and break out of
it if Comcast stalls me for too long... a second sketch of that idea
follows the first.
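For the archives, here is an untested sketch of how I understand the
suggestion: separate try...excepts so I can tell which stage fails, and a
select() loop around the read. The fetch() helper name is mine; url,
headers, and logfile are from my earlier snippet.

----------------
import select
import socket
import urllib2

timeout = 2

def fetch(url, headers, logfile, timeout=timeout):
    # Stage 1: open the connection, with its own try/except so a
    # failure here is distinguishable from a failure while reading.
    try:
        req = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(req, timeout=timeout)
    except (urllib2.URLError, socket.error) as e:
        open(logfile, 'a').write('open failed: %r\n' % e)
        raise

    # Stage 2: read in chunks, waiting on the socket with select()
    # so a stalled peer cannot block read() forever.
    chunks = []
    try:
        while True:
            ready, _, _ = select.select([response.fileno()], [], [], timeout)
            if not ready:
                raise socket.timeout('no data for %s seconds' % timeout)
            data = response.read(8192)   # one chunk; '' means EOF
            if not data:
                break
            chunks.append(data)
    except (socket.error, select.error) as e:
        open(logfile, 'a').write('read failed: %r\n' % e)
        raise
    return ''.join(chunks)

One caveat: since these are HTTPS pages, the SSL layer buffers decrypted
data internally, so select() on the raw file descriptor is not a perfect
readiness signal; I am treating this as a starting point, not a fix.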
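And here is the kind of whole-loop watchdog I have in mind, assuming a
SIGALRM-based deadline is acceptable (Unix-only, so fine on my Mac, and
main thread only). fetch_with_deadline and Stalled are names I made up,
untested:

----------------
import signal
import urllib2

class Stalled(Exception):
    pass

def _on_alarm(signum, frame):
    # Raised out of whatever blocking call urlopen()/read() is stuck in.
    raise Stalled()

def fetch_with_deadline(url, headers, deadline=60):
    # Arrange for SIGALRM after `deadline` seconds of wall-clock time,
    # covering the entire open-and-read, not just individual socket ops.
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(deadline)
    try:
        req = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(req, timeout=2)
        return response.read()
    finally:
        signal.alarm(0)                            # cancel pending alarm
        signal.signal(signal.SIGALRM, old_handler) # restore old handler

The caller would then do something like:

    try:
        html = fetch_with_deadline(url, headers, deadline=60)
    except Stalled:
        open(logfile, 'a').write('gave up on %s after 60s\n' % url)

which at least guarantees the loop can never hang for an hour again,
whatever Comcast is doing. Does that seem reasonable, or is there a
cleaner way?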