On Mon, 2008-11-17 at 15:55 +0000, Darren Mansell wrote:
> On Mon, 2008-11-17 at 15:24 +0000, Tim Golden wrote:
> > Darren Mansell wrote:
> > > Hi.
> > >
> > > I'm relatively new to python so please be gentle :)
> > >
> > > I'm trying to write a £ symbol to an MS SQL server using pymssql. This
> > > works but when selecting the data back (e.g. using SQL management
> > > studio) the £ symbol is replaced with Â£ (latin capital letter A with
> > > circumflex).
> >
> >
> > This is a bit of a non-answer but... use pyodbc[*],
> > use NVARCHAR cols, and use unicode values on insert:
> >
>
> Thanks for the help. Unfortunately pyodbc seems to only work on Windows.
> I need to connect to the SQL server from a Linux box.
>
> The db schema is very set in stone, I can't do anything with it. I'm
> currently opening autogenerated SQL scripts, decoding them from utf-16
> and then back into utf-8 for pymssql to run them.
>
> It's been working great for ages until someone noticed the £ symbols had
> this extra character in there..
>
As I was trying to explain in my other email, the £ does *not* have an
"extra symbol" attached to it. It is being encoded as UTF-8 and then
decoded as Latin-1 (ISO-8859-1).

If you had other higher-order (> ASCII) characters in your text, they
would also be mis-decoded, but would probably not show the original
character in the output. With £ that was just a coincidence: £ is
U+00A3, which UTF-8 encodes as '\xc2\xa3', and the second byte, 0xa3,
happens to be £ in Latin-1 as well.

For example, if you had the character u'\xe6' (æ) in your input, which
has the binary representation 1110 0110, it would be encoded in UTF-8
as follows (unused x positions are filled with zeros):

mask:       110x xxxx  10xx xxxx
byte:              11    10 0110
encoding:   1100 0011  1010 0110
hex:           c    3     a    6
bytestring: '\xc3\xa6'

If you decode it as UTF-8, you get u'æ', but if you decode it as
latin-1, you get u'Ã¦'. Note that the latin-1 decoding here does not
include æ.

So what you are seeing is best thought of as two garbage characters, one
of which happens (by coincidence only) to be the same as your original
character.

If you decode the returned bytes properly (as UTF-8), you will get back
the characters you put in, for all characters.

Cheers,
Cliff
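
P.S. In case it helps to see this run, here is a minimal sketch of the
encoding steps described above. It assumes Python 3 (the thread itself
is Python 2 era, where the string and bytes literals would be spelled
differently), and the helper names utf8_two_byte and mojibake are made
up for illustration; they are not part of pymssql or any other library.

```python
def utf8_two_byte(code_point):
    """Hand-encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 only covers U+0080..U+07FF")
    # Lead byte: mask 110x xxxx carries the bits above the low six.
    lead = 0b11000000 | (code_point >> 6)
    # Trail byte: mask 10xx xxxx carries the low six bits.
    trail = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, trail])


def mojibake(text):
    """Reproduce the bug: encode as UTF-8, then wrongly decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


ae = "\xe6"      # the ae ligature, U+00E6
pound = "\xa3"   # the pound sign, U+00A3

# The hand-rolled encoder agrees with the built-in codec.
ae_bytes = utf8_two_byte(ord(ae))        # b'\xc3\xa6'
pound_bytes = utf8_two_byte(ord(pound))  # b'\xc2\xa3'

# Wrong decoding: two characters come back for each one that went in.
ae_garbled = mojibake(ae)        # '\xc3\xa6': A-tilde, broken bar
pound_garbled = mojibake(pound)  # '\xc2\xa3': A-circumflex, pound sign

# Right decoding: the original character comes back unchanged.
ae_roundtrip = ae_bytes.decode("utf-8")
pound_roundtrip = pound_bytes.decode("utf-8")
```

Only pound_garbled still contains the original character, which is the
coincidence described above; ae_garbled does not.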