2009/8/22 AK <a...@nothere.com>:
> Vlastimil Brom wrote:
>>
>> 2009/8/22 AK <a...@nothere.com>:
>>>
>>> Steven D'Aprano wrote:
>>>>
>>>> On Sat, 22 Aug 2009 04:20:23 -0400, AK wrote:
>>>>
>>>>> Hi, if I have a string '\\303\\266', how can I convert it to '\303\266'
>>>>> in a general way?
>>>>
>>>> It's not clear what you mean.
>>>>
>>>> Do you mean you have a string '\\303\\266', that is:
>>>>
>>>> backslash backslash three zero three backslash backslash two six six
>>>>
>>>> If so, then the simplest way is:
>>>>
>>>> >>> s = r'\\303\\266' # note the raw string
>>>> >>> len(s)
>>>> 10
>>>> >>> print s
>>>> \\303\\266
>>>> >>> print s.replace('\\\\', '\\')
>>>> \303\266
>>>>
>>>> Another possibility:
>>>>
>>>> >>> s = '\\303\\266' # this is not a raw string
>>>> >>> len(s)
>>>> 8
>>>> >>> print s
>>>> \303\266
>>>>
>>>> So s is:
>>>> backslash three zero three backslash two six six
>>>>
>>>> and you don't need to do any more.
>>>
>>> Well, I need the string itself to become '\303\266', not to print
>>> that way. In other words, when I do 'print s', it should display
>>> unicode characters if my term is set to show them, instead of
>>> showing \303\266.
>>>
>>>>> The problem I'm running into is that I'm connecting with pygresql to a
>>>>> postgres database and when I get fields that are of 'char' type, I get
>>>>> them in unicode, but when I get fields of 'byte' type, I get the text
>>>>> with quoted slashes, e.g. '\303' becomes '\\303' and so on.
>>>>
>>>> Is pygresql quoting the backslash, or do you just think it is quoting
>>>> the backslashes? How do you know? E.g. if you have '\\303', what is the
>>>> length of that? 4 or 5?
>>>
>>> Length is 4, and I need it to be length of 1. E.g.:
>>>
>>> >>> s = '\303'
>>> >>> s
>>> '\xc3'
>>> >>> x = '\\303'
>>> >>> x
>>> '\\303'
>>> >>> len(x)
>>> 4
>>> >>> len(s)
>>> 1
>>>
>>> What I get from pygresql is x, what I need is s. Either by asking
>>> pygresql to do this or convert it afterwards. I can't do
>>> replace('\\303', '\303') because it can be any unicode character.
>>>
>>> --
>>> AK
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>
>> Hi,
>> do you mean something like
>>
>> >>> u"\u0303"
>> u'\u0303'
>> >>> print u"\u0303"
>> ̃
>> ̃ (dec.: 771) (hex.: 0x303) ̃ COMBINING TILDE (Mark, Nonspacing)
>> ?
>>
>> vbr
>
> Yes, something like that except that it starts out as '\\303\\266', and it's
> good enough for me if it turns into '\303\266', in fact that's rendered as
> one unicode char. In other words, when you do:
>
> >>> print "\\303\\266"
> '\303\266'
>
> I need that result to become a python string, i.e. the slashes need to
> be converted from literal slashes to escape slashes.
>
> --
> AK
>
Not sure whether this is the right way to handle such text data, but maybe:

>>> decoded = '\\303\\266'.decode("string_escape")
>>> decoded
'\xc3\xb6'
>>> print decoded
ö
>>> print '\303\266'
ö
>>>

It might be an IDLE issue, but it still isn't one unicode glyph. I guess you have to ensure that the input data is valid and that the right encoding is used.

hth
vbr
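P.S. For what it's worth, here is a minimal sketch of the same idea for Python 3, where the "string_escape" codec is no longer available. It assumes the driver hands back the escaped text as ASCII bytes (or str) containing only octal or \x escapes, and that the underlying data is UTF-8, which '\xc3\xb6' suggests, since those two bytes are the UTF-8 encoding of U+00F6 (ö). The helper name unescape_octal is made up for illustration and is not part of pygresql.

```python
def unescape_octal(raw, encoding="utf-8"):
    """Turn literal backslash escapes (e.g. b'\\303\\266') into text.

    Hypothetical helper: assumes only \\ooo / \\xhh style escapes, so that
    every decoded code point fits into a single byte.
    """
    if isinstance(raw, str):
        raw = raw.encode("ascii")  # assumes the escaped text is pure ASCII
    # "unicode_escape" maps \303 to code point U+00C3; encoding the result
    # as latin-1 turns each such code point back into the byte 0xC3.
    data = raw.decode("unicode_escape").encode("latin-1")
    # Finally interpret the recovered bytes in the assumed real encoding.
    return data.decode(encoding)


escaped = b"\\303\\266"           # backslash 3 0 3 backslash 2 6 6
text = unescape_octal(escaped)

print(len(escaped))               # 8
print(len(text))                  # 1
print(text == "\u00f6")           # True
```

The decode to text at the end is the step that turns the two bytes into one glyph; without it you get the two-character display shown in the session above. If the data is not UTF-8, pass a different encoding.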