Re: Unicode character transformation through XSLT

2003-03-14 Thread Markus Scherer
Nooo - Java's old UTF functions do not process UTF-8! They are there for String serialization, a Java-internal format. Use the Java Reader/Writer classes instead of these old ones! See the Java tutorials on Internationalization: http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
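
A minimal sketch of the difference described above, assuming the "old UTF functions" are DataOutputStream.writeUTF and its counterpart readUTF (the post does not name them). The class and method names are made up for illustration, and StandardCharsets assumes a newer Java than the one available in 2003. writeUTF emits a two-byte length prefix followed by Java's "modified UTF-8", in which U+0000 becomes the two bytes C0 80, so its output is not a plain UTF-8 stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original thread.
public class ModifiedUtf8Demo {
    // Bytes produced by DataOutputStream.writeUTF: length prefix + modified UTF-8.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Bytes produced by a standard UTF-8 encoder.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0"; // a single U+0000 character
        System.out.println(hex(writeUtfBytes(nul))); // prints "00 02 C0 80"
        System.out.println(hex(utf8Bytes(nul)));     // prints "00"
    }
}
```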

Re: Unicode character transformation through XSLT

2003-03-13 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote: "I modified my program as per your suggestion (modified to byChunk127)," Sorry, I was much too hasty with my reply. First of all, I should have written byChunk255. And secondly, solutions like the one Markus proposes are much better thought out. My apologies. Pim

Re: Unicode character transformation through XSLT

2003-03-13 Thread Yung-Fong Tang
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, March 11, 2003 6:09 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT Because the following code got apply to

Re: Unicode character transformation through XSLT

2003-03-12 Thread John Cowan
Pim Blokland wrote: "As I understand it, char is a signed 16-bit type in Java; any of the others may be unsigned. Hence the problem." Char is *unsigned*; all the others are always signed. John Cowan

Re: Unicode character transformation through XSLT

2003-03-12 Thread Markus Scherer
Generally, try instantiating an InputStreamReader or similar from your input, with an explicit encoding=UTF8. That will perform the conversion from UTF-8 to the internal 16-bit Unicode that Java processes. Always use XYZReader classes for text input and XYZWriter classes for text output.
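
As a sketch of that advice (the class and helper names here are illustrative, and StandardCharsets assumes a newer Java than was current in 2003; the string name "UTF-8" works as well), an InputStreamReader with an explicit charset turns the three bytes E2 80 93 into the single character U+2013:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original thread.
public class Utf8ReaderDemo {
    // Reads the whole stream as text, decoding it explicitly as UTF-8.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+2013 (en dash), as discussed in the thread.
        byte[] utf8 = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};
        String text = readAll(new ByteArrayInputStream(utf8));
        System.out.println(text.length());                     // prints 1
        System.out.println(Integer.toHexString(text.charAt(0))); // prints 2013
    }
}
```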

Re: Unicode character transformation through XSLT

2003-03-11 Thread Markus Scherer
Kenneth Whistler wrote: "Unicode character (\uFFE2\uFF80\uFF93) ... What you are actually looking for is the UTF-8 sequence: 0xE2 0x80 0x93" The 8-bit UTF-8 bytes E2 80 93 (all with the most significant bit set) get *sign-extended* to 16 bits, producing FFE2 FF80 FF93. It should suffice in a
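
The sign extension can be reproduced in a few lines. This is only a sketch under assumptions: the poster's actual code is not shown in the thread, so the class and method names are invented, and the repair step is one possible approach, not necessarily what was proposed. Casting a negative byte directly to char widens it with its sign, while narrowing each garbled char back to a byte recovers the original UTF-8 sequence, which can then be decoded properly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original thread.
public class SignExtensionDemo {
    // Buggy widening: (char) b sign-extends any byte with the top bit set,
    // so 0xE2 becomes 0xFFE2 and so on.
    static String widenWithSignExtension(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // One possible repair (an assumption): keep the low 8 bits of every char
    // to recover the original bytes, then decode those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] bytes = new byte[garbled.length()];
        for (int i = 0; i < garbled.length(); i++) {
            bytes[i] = (byte) garbled.charAt(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};
        String garbled = widenWithSignExtension(utf8);
        for (char c : garbled.toCharArray()) {
            System.out.print(Integer.toHexString(c) + " "); // prints "ffe2 ff80 ff93 "
        }
        System.out.println();
        System.out.println(Integer.toHexString(repair(garbled).charAt(0))); // prints 2013
    }
}
```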

RE: Unicode character transformation through XSLT

2003-03-11 Thread Jain, Pankaj (MED, TCS)
7:59 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT . Pankaj Jain wrote, My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93) from resource bundle property file which is equivalent to ndash(-) and its U

Re: Unicode character transformation through XSLT

2003-03-11 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote: "But still I have a doubt that why \uFFE2\uFF80\uFF93 is giving ndash in html." In HTML? No way! HTML can't interpret a series of hex bytes. Try &ndash; or &#8211;. Pim Blokland

Re: Unicode character transformation through XSLT

2003-03-11 Thread Yung-Fong Tang
--Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Monday, March 10, 2003 7:59 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT . Pankaj Jain wrote, My problem is that, I am getting Unicode charac

Unicode character transformation through XSLT

2003-03-10 Thread Jain, Pankaj (MED, TCS)
Hi, my problem is that I am getting the Unicode characters (\uFFE2\uFF80\uFF93) from a resource bundle property file, which is equivalent to ndash(-), and it works fine in HTML and XML, but during transformation through XSLT it is unable to interpret it, and hence I am getting ??? instead of ndash.

Re: Unicode character transformation through XSLT

2003-03-10 Thread jameskass
Pankaj Jain wrote: "My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93) from resource bundle property file which is equivalent to ndash(-) and its ..." U+2013 is the ndash (–). It is represented in UTF-8 by three hex bytes: E2 80 93. But \uFFE2 is the fullwidth not sign

Re: Unicode character transformation through XSLT

2003-03-10 Thread Kenneth Whistler
Well, I can't diagnose exactly what is going wrong, but Unicode character (\uFFE2\uFF80\uFF93) is a sequence of a full-width not sign, followed by a half-width katakana ta and a half-width katakana mo. What you are actually looking for is the UTF-8 sequence: 0xE2 0x80 0x93 which is the UTF-8
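
For reference, a small sketch (class and method names invented for illustration, assuming Java 7 or later for Character.getName and StandardCharsets) that encodes U+2013 as UTF-8 and looks up the names of the three characters that appeared instead:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original thread.
public class EnDashBytes {
    // Returns the UTF-8 encoding of s as space-separated upper-case hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2013 is the en dash the poster wanted.
        System.out.println(utf8Hex("\u2013")); // prints "E2 80 93"
        // Names of the characters that showed up in the property file instead.
        for (char c : "\uFFE2\uFF80\uFF93".toCharArray()) {
            System.out.println(Integer.toHexString(c) + " " + Character.getName(c));
        }
    }
}
```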