Nooo - Java's old UTF functions do not process UTF-8! They are there for String serialization, a
Java-internal format.
Use the Java Reader/Writer classes instead of these old ones!
See the Java tutorials on Internationalization:
http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
Jain, Pankaj (MED, TCS) wrote:
I modified my program as per your suggestion (modified to
byChunk127),
Sorry, I was much too hasty with my reply. First of all, I should
have written byChunk255. And secondly, solutions like the one
Markus proposes are much better thought out.
My apologies.
Pim
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 11, 2003 6:09 PM
To: Jain, Pankaj (MED, TCS)
Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]'
Subject: Re: Unicode character transformation through XSLT
Because the following code got apply to
Pim Blokland wrote:
As I understand it, char is a signed 16 bits type in Java; any of
the others may be unsigned. Hence the problem.
Char is *unsigned*, all the others are always signed.
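To make the signedness point concrete, here is a minimal sketch of what happens when a Java byte with its top bit set is widened to a char. The class and method names (SignExtensionDemo, widenNaively, widenUnsigned) are made up for illustration and do not come from the poster's program. Note that masking only prevents the sign extension; it does not decode UTF-8.

```java
public class SignExtensionDemo {
    // byte is signed 8-bit, char is unsigned 16-bit.
    // A plain cast first sign-extends the byte, so 0xE2 becomes 0xFFE2.
    public static char widenNaively(byte b) {
        return (char) b;
    }

    // Masking with 0xFF keeps only the low 8 bits, so 0xE2 stays 0x00E2.
    public static char widenUnsigned(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        // The three UTF-8 bytes discussed in this thread.
        byte[] utf8 = { (byte) 0xE2, (byte) 0x80, (byte) 0x93 };
        for (byte b : utf8) {
            System.out.printf("byte %d -> naive U+%04X, masked U+%04X%n",
                    b, (int) widenNaively(b), (int) widenUnsigned(b));
        }
    }
}
```

Run as written, the naive column shows FFE2, FF80 and FF93, which are exactly the values reported in the original question.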
--
May the hair on your toes never fall out! John Cowan
--Thorin Oakenshield (to Bilbo)
Generally, try instantiating an InputStreamReader or similar from your input, with an explicit
encoding=UTF8. That will perform the conversion from UTF-8 to the internal 16-bit Unicode that
Java processes.
Always use XYZReader classes for text input and XYZWriter classes for text output.
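As a rough sketch of that advice, the following decodes a byte stream through an InputStreamReader with an explicit UTF-8 charset. The class and method names (Utf8ReaderDemo, decodeUtf8) are hypothetical, and the in-memory stream merely stands in for whatever input the real program reads. StandardCharsets needs Java 7 or later; on older JVMs the charset name "UTF-8" can be passed as a String instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ReaderDemo {
    // Reads the whole stream as UTF-8 text and returns it as a Java String.
    public static String decodeUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // E2 80 93 is the UTF-8 encoding of U+2013 (en dash).
        byte[] utf8 = { (byte) 0xE2, (byte) 0x80, (byte) 0x93 };
        String text = decodeUtf8(new ByteArrayInputStream(utf8));
        System.out.printf("length=%d, first char=U+%04X%n",
                text.length(), (int) text.charAt(0));
    }
}
```

The three input bytes come out as a single char, U+2013, rather than three separate characters.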
Kenneth Whistler wrote:
Unicode character (\uFFE2\uFF80\uFF93)
...
What you are actually looking for is the UTF-8 sequence:
0xE2 0x80 0x93
The 8-bit UTF-8 bytes E2 80 93 (all with the most significant bit set) get *sign-extended* to 16
bits, producing FFE2 FF80 FF93. It should suffice in a
7:59 PM
To: Jain, Pankaj (MED, TCS)
Cc: '[EMAIL PROTECTED]'
Subject: Re: Unicode character transformation through XSLT
.
Pankaj Jain wrote,
My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
from resource bundle property file which is equivalent to ndash(-) and
its
U
Jain, Pankaj (MED, TCS) wrote:
But I still don't understand why \uFFE2\uFF80\uFF93 gives an
ndash in
HTML.
In HTML? No way! HTML can't interpret a series of hex bytes. Try
&ndash; or &#8211;.
Pim Blokland
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 10, 2003 7:59 PM
To: Jain, Pankaj (MED, TCS)
Cc: '[EMAIL PROTECTED]'
Subject: Re: Unicode character transformation through XSLT
.
Pankaj Jain wrote,
My problem is that, I am getting Unicode charac
Hi
My problem is that I am getting the Unicode characters (\uFFE2\uFF80\uFF93) from a resource bundle property file. They are
equivalent to an ndash (-) and work fine in HTML and XML, but during transformation through XSLT they cannot be
interpreted, and hence I am getting ??? instead of an ndash.
.
Pankaj Jain wrote,
My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
from resource bundle property file which is equivalent to ndash(-) and
its
U+2013 is the ndash (–). It is represented in UTF-8 by three
hex bytes: E2 80 93.
But, \uFFE2 is the fullwidth not sign
Well, I can't diagnose exactly what is going wrong, but
Unicode character (\uFFE2\uFF80\uFF93)
is a sequence of a full-width not sign, followed by a
half-width katakana ta and a half-width katakana mo.
What you are actually looking for is the UTF-8 sequence:
0xE2 0x80 0x93
which is the UTF-8