Bugs item #1712522, was opened at 2007-05-04 06:11
Message generated for change (Comment added) made by nagle
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1712522&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.quote throws exception on Unicode URL

Initial Comment:
The code in urllib.quote fails on Unicode input when
called by robotparser with a Unicode URL.

Traceback (most recent call last):
  File "./sitetruth/InfoSitePage.py", line 415, in run
    pagetree = self.httpfetch() # fetch page
  File "./sitetruth/InfoSitePage.py", line 368, in httpfetch
    if not self.owner().checkrobotaccess(self.requestedurl) : # if access disallowed by robots.txt file
  File "./sitetruth/InfoSiteContent.py", line 446, in checkrobotaccess
    return(self.robotcheck.can_fetch(config.kuseragent, url)) # return can fetch
  File "/usr/local/lib/python2.5/robotparser.py", line 159, in can_fetch
    url = urllib.quote(urlparse.urlparse(urllib.unquote(url))[2]) or "/"
  File "/usr/local/lib/python2.5/urllib.py", line 1197, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe2'

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python
  for a while now.
- The initialization may not be thread-safe; a table is being initialized
  on first use.

"robotparser" was trying to check if a URL with a Unicode character in it
was allowed. Note the "KeyError: u'\xe2'".

----------------------------------------------------------------------

>Comment By: John Nagle (nagle)
Date: 2007-06-06 16:49

Message:
Logged In: YES
user_id=5571
Originator: YES

As a workaround, you can surround calls to "can_fetch" with a try block
and catch KeyError exceptions. That's what I'm doing.
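
The workaround described above might look roughly like the sketch below.
This is only an illustration, not the code actually used in the report:
the helper name "can_fetch_safely" and its "default" parameter are
hypothetical, and treating an unparseable URL as disallowed is an assumed
policy that a caller may well want to change. The parser argument is
expected to be a robotparser.RobotFileParser (or anything else with a
can_fetch(useragent, url) method).

```python
def can_fetch_safely(parser, useragent, url, default=False):
    """Call parser.can_fetch(), returning `default` if it raises KeyError.

    Hypothetical helper for illustration only. `default=False` is an
    assumption: it treats a URL that cannot be checked as disallowed.
    """
    try:
        return parser.can_fetch(useragent, url)
    except KeyError:
        # In the traceback above, urllib.quote raises KeyError when its
        # lookup table has no entry for a character such as u'\xe2'.
        return default
```

Note that catching KeyError this broadly can also hide unrelated lookup
errors raised inside can_fetch, so it is best treated as a stopgap until
urllib.quote itself is fixed.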
----------------------------------------------------------------------

Comment By: Collin Winter (collinwinter)
Date: 2007-06-05 23:39

Message:
Logged In: YES
user_id=1344176
Originator: NO

Could you possibly provide a patch to fix this?