Mark Davis wrote:

> > Hello all,
> >
> > I have been trying to input Unicode from a browser and store it in a database.
> > The problem is the different encodings used to represent Unicode.
> >
> > The input text is in UTF-8 format. I have read on the Microsoft support site
> > that SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does not
> > recognize UTF-8 as valid character data. Of the solutions offered, only two
> > were of any use:
> >
> > 1) Convert between the two on input and output
> > 2) Store as raw data in binary form
> >
> > I have been unable to get the raw data into the database correctly, so I decided
> > to try the first option. However, although I keep reading that round-trip
> > conversion between the two formats is quick, easy and reliable, I have been
> > unable to accomplish this. I am using JSPs, so the Session.Codepage command
> > doesn't work, and anyway I would prefer a less platform-specific solution.
> > Does anyone know of a way of converting a Java string in UTF-8 to UTF-16 format?
>
> I talk about it a bit in an older paper of mine, at
> http://www.ibm.com/java/education/globalapps/Converting.html
>
> You can either use the String API or the Stream API. For Strings use:
>
>   String utf16chars = new String(utf8bytes, "UTF8");    // UTF-8 bytes -> String
>   byte[] utf16bytes = utf16chars.getBytes("UTF-16LE");  // String -> UTF-16 bytes
>   byte[] utf8again = utf16chars.getBytes("UTF8");       // String -> UTF-8 bytes
>
> For Streams, use InputStreamReader
> (http://java.sun.com/j2se/1.3/docs/api/java/io/InputStreamReader.html)
> or OutputStreamWriter.
>
> > Also I was wondering if anyone knows why the UTF-8 can't be treated as a regular
> > Latin-1 string. My database is set to use the Cp1252 code page, so should this not
>
> Whenever you mark bytes with the wrong codepage, you are likely to get
> errors; any software that interprets or converts those bytes will get
> the wrong answer. Using Cp1252 when what you are storing is either
> UTF-8 or UTF-16 will give you problems.
>
> > recognise the characters input to it? E.g. a Japanese character in UTF-8 was
> > broken down to ??? and these three characters are in the Windows character set.
> > However, by the time it reaches the database it is changed to ?. Does this mean
> > that somewhere along the way the string is being changed into a different form
> > where the character set doesn't support certain characters? Does the fact that
> > Java internally uses UTF-16 (I think) cause any problems?
>
> Java's char type holds UCS-2 code units. UTF-16 is simply an extension of
> UCS-2 that uses pairs of those units (surrogates) for characters beyond
> U+FFFF, so it shares the same 16-bit storage. The difference is not
> relevant to you here.
>
> > Thanks for any suggestions,
> >
> > Stephen
> >
> > (If you have just gotten this message already I apologise, but I was having
> > difficulty with registration)
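A minimal sketch of Mark's String-API suggestion, written against Java 7+ (`StandardCharsets` did not exist in the Java 1.3 era the thread refers to). It assumes the database side wants little-endian UTF-16/UCS-2 with no byte-order mark, which is how SQL Server lays out its Unicode column data; Java's plain "UTF-16" encoder would instead write big-endian bytes preceded by a BOM, so the sketch asks for UTF_16LE explicitly. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: UTF-8 bytes in, UTF-16LE bytes out.
public class Utf8ToUtf16 {
    // Decode UTF-8 bytes into a Java String (held internally as 16-bit chars).
    public static String decodeUtf8(byte[] utf8bytes) {
        return new String(utf8bytes, StandardCharsets.UTF_8);
    }

    // Encode the String as little-endian UTF-16 with no byte-order mark
    // (assumption: this is the byte layout the target database expects).
    public static byte[] encodeUtf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // U+65E5, a Japanese character, is three bytes in UTF-8: E6 97 A5.
        byte[] utf8bytes = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5};
        String utf16chars = decodeUtf8(utf8bytes);
        byte[] utf16bytes = encodeUtf16le(utf16chars);
        System.out.println(utf16chars.length() + " char, " + utf16bytes.length + " bytes");
    }
}
```

Running `main` should print `1 char, 2 bytes`: the three UTF-8 bytes decode to one char, which UTF-16LE stores as the two bytes E5 65.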

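The Stream-API route Mark mentions can be sketched the same way: wrap the incoming bytes in an InputStreamReader that decodes UTF-8, and the outgoing bytes in an OutputStreamWriter that encodes UTF-16LE. This assumes Java 7+, uses in-memory byte streams as stand-ins for the browser request and the database side, and again assumes UTF-16LE as the target; `StreamTranscode` and `transcode` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copies a UTF-8 byte stream to a UTF-16LE byte stream.
public class StreamTranscode {
    public static void transcode(InputStream in, OutputStream out) throws IOException {
        // The Reader decodes UTF-8 bytes into chars; the Writer re-encodes them,
        // so no step ever treats the raw bytes as being in the wrong charset.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_16LE);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        // Push any chars still buffered in the encoder out to the byte stream.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "A\u65E5".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), out);
        System.out.println(utf8.length + " UTF-8 bytes -> " + out.size() + " UTF-16LE bytes");
    }
}
```

The method flushes but deliberately does not close the writer, since closing it would also close the caller's output stream; the caller stays responsible for that.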

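To illustrate Mark's point about marking bytes with the wrong codepage, the sketch below assumes the Japanese character was U+65E5 (the thread does not say which one it was). Its three UTF-8 bytes, read as Cp1252, become three unrelated characters (æ, an em dash and ¥), and encoding the real character into Cp1252 yields a single `?`, because `String.getBytes` substitutes the charset's replacement byte for anything the code page cannot represent. Whether this is exactly what happened in Stephen's setup is an assumption; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what goes wrong when UTF-8 data meets a Cp1252 code page.
public class WrongCodepage {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Misread UTF-8 bytes as Cp1252: each byte turns into its own character.
    public static String misreadAsCp1252(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Encode text into Cp1252; characters it cannot represent become '?'.
    public static byte[] forceIntoCp1252(String text) {
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5";
        // One character becomes three when its UTF-8 bytes are read as Cp1252.
        System.out.println(misreadAsCp1252(japanese).length());
        // The real character has no Cp1252 mapping, so it is replaced.
        System.out.println(new String(forceIntoCp1252(japanese), CP1252));
    }
}
```

Running `main` prints `3` and then `?`.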