Bugs item #1712522, was opened at 2007-05-04 06:11
Message generated for change (Comment added) made by nagle
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1712522&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.quote throws exception on Unicode URL

Initial Comment:
The code in urllib.quote fails on Unicode input, when
called by robotparser with a Unicode URL.

Traceback (most recent call last):
  File "./sitetruth/InfoSitePage.py", line 415, in run
    pagetree = self.httpfetch() # fetch page
  File "./sitetruth/InfoSitePage.py", line 368, in httpfetch
    if not self.owner().checkrobotaccess(self.requestedurl) : # if access disallowed by robots.txt file
  File "./sitetruth/InfoSiteContent.py", line 446, in checkrobotaccess
    return(self.robotcheck.can_fetch(config.kuseragent, url)) # return can fetch
  File "/usr/local/lib/python2.5/robotparser.py", line 159, in can_fetch
    url = urllib.quote(urlparse.urlparse(urllib.unquote(url))[2]) or "/"
  File "/usr/local/lib/python2.5/urllib.py", line 1197, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe2'

That bit of code needs some attention.
- It still assumes that character codes stop at 255, which hasn't been true
in Python for a while now.
- The initialization may not be thread-safe; a table is being initialized on
first use.

"robotparser" was trying to check whether a URL with a Unicode character in
it was allowed.  Note the "KeyError: u'\xe2'" at the end of the traceback.

----------------------------------------------------------------------

>Comment By: John Nagle (nagle)
Date: 2007-06-06 16:49

Message:
Logged In: YES 
user_id=5571
Originator: YES

As a workaround, you can surround calls to "can_fetch" with a try block
and catch KeyError exceptions.  That's what I'm doing.
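
A minimal sketch of that workaround follows. It is written for Python 3, where
the module is urllib.robotparser (the report concerns the Python 2.5
"robotparser" module). The helper name safe_can_fetch and its on_error
parameter are hypothetical; what to return when the check fails is a policy
decision that the comment above does not specify.

```python
from urllib.robotparser import RobotFileParser


def safe_can_fetch(parser, useragent, url, on_error=False):
    """Call parser.can_fetch(), treating a KeyError as "could not decide".

    on_error is a hypothetical knob: the value returned when the lookup
    fails. False (do not fetch) is the conservative choice assumed here.
    """
    try:
        return parser.can_fetch(useragent, url)
    except KeyError:
        # Raised by the quoting code on non-ASCII input in the reported case.
        return on_error


# Example setup: feed the parser robots.txt lines directly.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
allowed = safe_can_fetch(rp, "examplebot", "http://example.com/caf\u00e2")
```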

----------------------------------------------------------------------

Comment By: Collin Winter (collinwinter)
Date: 2007-06-05 23:39

Message:
Logged In: YES 
user_id=1344176
Originator: NO

Could you possibly provide a patch to fix this?

----------------------------------------------------------------------
