On Tue, 13 Nov 2007 13:14:18 -0300, koara <[EMAIL PROTECTED]> wrote:

> I am using urllib.unquote_plus to unquote a string. Sometimes I get a
> strange string to unquote, for example "spolu%u017E%E1ci.cz". Here
> the problem is that some application decided to quote a non-ASCII
> character as %uxxxx directly, instead of using an encoding and quoting
> byte by byte.
>
> Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
> not what the application meant.
>
> My question is, is this %u quoting a standard (i.e., urllib is in the
> wrong),

Not that I know of (and that doesn't prove anything).

> is it not (i.e., the application is in the wrong and urllib
> silently ignores the '%u0' - why?), and most importantly, is there a
> simple workaround to get it working as expected?

Try this (untested):

    from urllib import unquote_plus

    def unquote_plus_u(source):
        result = unquote_plus(source)
        if '%u' in result:
            # Turn each %uXXXX into \uXXXX and let the unicode_escape
            # codec decode it; the result is a unicode object.
            result = result.replace('%u', '\\u').decode('unicode_escape')
        return result

--
Gabriel Genellina
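
P.S. A possible refinement, sketched for Python 3 (where unquote_plus lives in urllib.parse and str has no .decode method): decode everything in a single regex pass, so that a literal "%u" produced by unquoting "%25u" is not mistaken for an escape, and backslashes in the input are left alone. It assumes that plain %XX escapes are Latin-1 code points, as in the example above; the helper name and the regex are my own, not part of urllib.

```python
import re

# One pattern for all three escape forms: %uXXXX, %XX and '+'.
_ESCAPE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})|\+')


def unquote_plus_u(source):
    """Unquote a string that mixes %uXXXX and %XX escapes.

    Assumption: each %XX is a Latin-1 code point, not a UTF-8 byte.
    """
    def _decode(match):
        if match.group(1) is not None:   # %uXXXX: 16-bit code unit
            return chr(int(match.group(1), 16))
        if match.group(2) is not None:   # %XX: treated as Latin-1
            return chr(int(match.group(2), 16))
        return ' '                       # '+' means a space

    return _ESCAPE.sub(_decode, source)
```

With this, unquote_plus_u("spolu%u017E%E1ci.cz") gives "spolužáci.cz". Escapes in the surrogate range (%uD800 to %uDFFF) come out as lone surrogates; pairing them up is not handled here.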