> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of Jeroen Ruigrok van der Werven
> Sent: Wednesday, May 07, 2008 05:20
> To: Tom Pinckney
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] urllib unicode handling
>
> -On [20080507 04:06], Tom Pinckney ([EMAIL PROTECTED]) wrote:
> >While in theory UTF-8 is not a standard, sites like Last.fm, Facebook and
> >Wikipedia seem to have embraced it (as have pretty much all other major web
> >sites). As with HTML, there is what the standard says and what the actual
> >browsers have to accept in order to work in the real world.
>
FYI, here is how we have patched urllib for use in EVE:

--- C:\p4\sdk\stackless25\Lib\urllib.py 2008-03-21 14:47:23.000000000 -0000
+++ C:\p4\eve\KALI\common\stdlib\urllib.py 2007-11-06 11:18:01.000000000 -0000
@@ -1158,12 +1158,29 @@
         except KeyError:
             res[i] = '%' + item
         except UnicodeDecodeError:
             res[i] = unichr(int(item[:2], 16)) + item[2:]
     return "".join(res)

+unquote_inner = unquote
+def unquote(s):
+    """CCP attempt at making sensible choices in unicode quoting / unquoting"""
+    s = unquote_inner(s)
+    try:
+        u = s.decode("utf-8")
+        try:
+            s2 = s.decode("ascii")
+        except UnicodeDecodeError:
+            s = u  # yes, s was definitely utf8, which isn't pure ascii
+        else:
+            if u != s:
+                s = u
+    except UnicodeDecodeError:
+        pass  # can't have been utf8
+    return s
+
 def unquote_plus(s):
     """unquote('%7e/abc+def') -> '~/abc def'"""
     s = s.replace('+', ' ')
     return unquote(s)

 always_safe = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
@@ -1201,12 +1218,20 @@
         for i in range(256):
             c = chr(i)
             safe_map[c] = (c in safe) and c or ('%%%02X' % i)
         _safemaps[cachekey] = safe_map
     res = map(safe_map.__getitem__, s)
     return ''.join(res)
+
+quote_inner = quote
+def quote(s, safe = '/'):
+    """CCP addition, to try to sensibly support / circumvent issues with unicode in urls"""
+    try:
+        return quote_inner(s, safe)
+    except KeyError:
+        return quote_inner(s.encode("utf-8"), safe)

 def quote_plus(s, safe = ''):
     """Quote the query fragment of a URL; replacing ' ' with '+'"""
     if ' ' in s:
         s = quote(s, safe + ' ')
         return s.replace(' ', '+')
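
For anyone who wants to try the same idea outside our tree, here is a rough standalone sketch of the heuristic, written for Python 3's urllib.parse rather than the 2.5 urllib patched above. The names unquote_guess and quote_utf8 are made up for illustration, and the Latin-1 fallback is my own assumption (the patch above simply leaves the byte string untouched when it is not valid UTF-8), so treat it as a starting point and not as what we ship.

```python
from urllib.parse import quote, unquote_to_bytes


def unquote_guess(s):
    """Percent-decode s, preferring UTF-8 and falling back to Latin-1.

    Hypothetical helper: mirrors the "try UTF-8 first" idea of the patch.
    """
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8. Latin-1 maps every byte to a code point,
        # so this fallback (an assumption, not part of the patch) cannot fail.
        return raw.decode("latin-1")


def quote_utf8(s, safe="/"):
    """Percent-encode s as UTF-8 bytes, passing safe through to quote()."""
    return quote(s.encode("utf-8"), safe=safe)


if __name__ == "__main__":
    print(unquote_guess("caf%C3%A9"))  # UTF-8 encoded e-acute
    print(unquote_guess("caf%E9"))     # Latin-1 encoded e-acute
    print(quote_utf8("caf\u00e9"))
```

Both input spellings of the accented character come back as the same text string, which is the behaviour browsers effectively force on us; quoting always emits UTF-8.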