n00b confusion re: local variable referenced before assignment error

2009-06-19 Thread Wells Oliver
Writing a class which essentially spiders a site and saves the files
locally. On a URLError exception, it sleeps for a second and tries again (on
404 it just moves on). The relevant bit of code, including the offending
method:

class Handler(threading.Thread):
def __init__(self, url):
threading.Thread.__init__(self)
self.url = url

def save(self, uri, location):
try:
handler = urllib2.urlopen(uri)
except urllib2.HTTPError, e:
if e.code == 404:
return
else:
print retrying %s (HTTPError) % uri
time.sleep(1)
self.save(uri, location)
except urllib2.URLError, e:
print retrying %s % uri
time.sleep(1)
self.save(uri, location)

if not os.path.exists(os.path.dirname(location)):
os.makedirs(os.path.dirname(location))

file = open(location, w)
file.write(handler.read())
file.close()

...

But what I am seeing is that after a retry (on catching a URLError
exception), I see bunches of UnboundLocalError: local variable 'handler'
referenced before assignment errors on line 38, which is the
file.write(handler.read()) line..

What gives?

-- 
Wells Oliver
we...@submute.net
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: n00b confusion re: local variable referenced before assignment error

2009-06-19 Thread Mike Kazantsev
On Fri, 19 Jun 2009 11:16:38 -0500
Wells Oliver we...@submute.net wrote:

 def save(self, uri, location):
 try:
 handler = urllib2.urlopen(uri)
 except urllib2.HTTPError, e:
 if e.code == 404:
 return
 else:
 print retrying %s (HTTPError) % uri
 time.sleep(1)
 self.save(uri, location)
 except urllib2.URLError, e:
 print retrying %s % uri
 time.sleep(1)
 self.save(uri, location)
 
 if not os.path.exists(os.path.dirname(location)):
 os.makedirs(os.path.dirname(location))
 
 file = open(location, w)
 file.write(handler.read())
 file.close()


 But what I am seeing is that after a retry (on catching a URLError
 exception), I see bunches of UnboundLocalError: local variable 'handler'
 referenced before assignment errors on line 38, which is the
 file.write(handler.read()) line..
 
 What gives?

Why not?
Try fails, except calls retry and after the retry code execution
continues to the undefined handler, since the try has failed here.
You might want to insert return or avoid (possibly endless) recursion
altogether - just wrap it into while loop with some counter (aka
max_tries).

Also, you can get rid of code duplication by catching some basic
urllib2 exception, then checking if it's urllib2.HTTPError and it's code
is 404, retrying (continue for the loop case) otherwise.

-- 
Mike Kazantsev // fraggod.net


signature.asc
Description: PGP signature
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: n00b confusion re: local variable referenced before assignment error

2009-06-19 Thread Falcolas
On Jun 19, 10:16 am, Wells Oliver we...@submute.net wrote:
 Writing a class which essentially spiders a site and saves the files
 locally. On a URLError exception, it sleeps for a second and tries again (on
 404 it just moves on). The relevant bit of code, including the offending
 method:

 [snip]

 But what I am seeing is that after a retry (on catching a URLError
 exception), I see bunches of UnboundLocalError: local variable 'handler'
 referenced before assignment errors on line 38, which is the
 file.write(handler.read()) line..

 What gives?

'Handler' is only assigned in the try statement, so if you error into
your exception clause, nothing will have been bound to the name
'handler', causing the exception you're seeing.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: n00b confusion re: local variable referenced before assignment error

2009-06-19 Thread Diez B. Roggisch

Wells Oliver schrieb:
Writing a class which essentially spiders a site and saves the files 
locally. On a URLError exception, it sleeps for a second and tries again 
(on 404 it just moves on). The relevant bit of code, including the 
offending method:


class Handler(threading.Thread):
def __init__(self, url):
threading.Thread.__init__(self)
self.url = url

def save(self, uri, location):
try:
handler = urllib2.urlopen(uri)
except urllib2.HTTPError, e:
if e.code == 404:
return
else:
print retrying %s (HTTPError) % uri
time.sleep(1)
self.save(uri, location)
except urllib2.URLError, e:
print retrying %s % uri
time.sleep(1)
self.save(uri, location)

if not os.path.exists(os.path.dirname(location)):
os.makedirs(os.path.dirname(location))

file = open(location, w)
file.write(handler.read())
file.close()

...

But what I am seeing is that after a retry (on catching a URLError 
exception), I see bunches of UnboundLocalError: local variable 
'handler' referenced before assignment errors on line 38, which is the 
file.write(handler.read()) line..


Your code defines the name handler only if the urllib2.urlopen is 
successful. But you try later to access it uncoditionally, and of course 
that fails.


You need to put the file-stuff after the urlopen, inside the try-except.

Also note that python has no tail-recursion-optimization, so your method 
will recurse and at some point exhaust the stack if there are many errors.


You should consider writing it rather as while-loop, with breaking out 
of it when the page could be fetched.


Diez
--
http://mail.python.org/mailman/listinfo/python-list


Re: n00b confusion re: local variable referenced before assignment error

2009-06-19 Thread Dave Angel

Wells Oliver wrote:

Writing a class which essentially spiders a site and saves the files
locally. On a URLError exception, it sleeps for a second and tries again (on
404 it just moves on). The relevant bit of code, including the offending
method:

class Handler(threading.Thread):
def __init__(self, url):
threading.Thread.__init__(self)
self.url = url

def save(self, uri, location):
try:
handler = urllib2.urlopen(uri)
except urllib2.HTTPError, e:
if e.code == 404:
return
else:
print retrying %s (HTTPError) % uri
time.sleep(1)
self.save(uri, location)
except urllib2.URLError, e:
print retrying %s % uri
time.sleep(1)
self.save(uri, location)

if not os.path.exists(os.path.dirname(location)):
os.makedirs(os.path.dirname(location))

file = open(location, w)
file.write(handler.read())
file.close()

...

But what I am seeing is that after a retry (on catching a URLError
exception), I see bunches of UnboundLocalError: local variable 'handler'
referenced before assignment errors on line 38, which is the
file.write(handler.read()) line..

What gives?

  
Your save() method is recursive in the case of that error, which is a 
poor excuse for what should have been a loop.  You should have some 
retry depth check, just in case you get hundreds of such errors on the 
same site.


But ignoring that issue, the specific symptom is caused when returning 
from the recursive call.  You still fall through to the end of your 
outer method call, and try to write stuff from that handler (also wiping 
out the file location in the process).  Without a handler, that causes 
an exception.  So you should follow those calls to self.save() with 
return's.


Much better would be to write a loop in the first place, and break out 
of the loop when you succeed.  Then the flow is much easier to follow, 
and you can easily avoid the risk of looping forever, nor of running out 
of stack space.


Something like (untested):

   def save(self, uri, location):
   for tries in xrange(10):#try up to 10 times
   try:
   handler = urllib2.urlopen(uri)
   except urllib2.HTTPError, e:
   if e.code == 404:
   break
   else:
   print retrying %s (HTTPError) % uri
   time.sleep(1)
   continue
   except urllib2.URLError, e:
   print retrying %s % uri
   time.sleep(1)
   continue

   if not os.path.exists(os.path.dirname(location)):
   os.makedirs(os.path.dirname(location))

   file = open(location, w)
   file.write(handler.read())
   file.close()
   break

Other refinements are possible, of course.  For example, if you have any 
cleanup code at the end you may need an additional flag variable to tell 
whether you've succeeded or not.


--
http://mail.python.org/mailman/listinfo/python-list