> From: André Warnier [mailto:a...@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
>
> Suppose I do this :
>
> String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
> InputStreamReader fromApp;
> fromApp = new InputStreamReader(socket.getInputStream(),
> Charset.forName(knownEncoding));
> int ic = 0;
> StringBuffer buf = new StringBuffer(2000);
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>            buf.append((char)ic);
>
> .. then I'm still appending the same char (really, byte) to my
> buffer, right ?

No, it's not the same.  What you append is the proper Unicode equivalent of
the input byte (or bytes, for multi-byte character sets), not the original
8-bit value.  You're responsible for specifying the appropriate character set
in the InputStreamReader constructor to ensure that the correct conversion
takes place.
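
As a rough illustration, here is a minimal sketch that decodes the same input
byte (0xB5) through an InputStreamReader with two different character sets.
The class and method names (DecodeDemo, firstChar) are made up for this
example, and it assumes your JRE provides ISO-8859-2, which only ISO-8859-1
is strictly guaranteed to be.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class DecodeDemo {
    // Hypothetical helper: decodes the first character of the input bytes
    // with the named charset and returns it as an int, as Reader.read() does.
    static int firstChar(byte[] input, String charsetName) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(input), Charset.forName(charsetName))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xB5 };
        // Same byte, different charset, different Unicode character.
        System.out.printf("ISO-8859-1: U+%04X%n", firstChar(input, "ISO-8859-1")); // U+00B5
        System.out.printf("ISO-8859-2: U+%04X%n", firstChar(input, "ISO-8859-2")); // U+013E
    }
}
```

The value returned by read() already differs between the two runs, before any
cast to char or append() happens.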

> But by doing
>         buf.append((char) ic)
> I am still interpreting ic as being, by platform default, ISO-8859-1,
> thus I am still appending the Unicode codepoint U00B5.

That's not correct.  The interpretation occurs in the read() operation on the
InputStreamReader, not in the cast to a char.  The read() has already
converted the byte according to the specified Charset; if your input is
8859-2, you must specify that in the InputStreamReader constructor.

> Or, can I / do I have to now also say :
> char ic = 0;
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>            buf.append(ic);

That can't ever work.  You will get a compilation error, since the result of
read() is an int and cannot be assigned to a char without a cast.  Even with
a cast, a char is unsigned, so it can never have a value of -1 and the
end-of-stream test would never succeed.
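
For comparison, here is a sketch of the loop with the variable left as an
int, which is the form that does work.  It takes any Reader, so a
StringReader stands in for your socket here; ReadLoopDemo and readUntilSub
are names invented for this example, not anything from your webapp.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLoopDemo {
    static final int SUB = 0x1A; // hex 1A (SUB), the terminator used in the loop

    // Hypothetical helper: reads characters until SUB or end of stream.
    static String readUntilSub(Reader in) {
        StringBuffer buf = new StringBuffer(2000);
        try {
            int ic; // int, not char, so -1 (end of stream) stays distinguishable
            while ((ic = in.read()) != SUB && ic != -1) {
                buf.append((char) ic); // safe here: ic is a real character value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String input = "abc" + (char) SUB + "def";
        System.out.println(readUntilSub(new StringReader(input))); // prints "abc"
        System.out.println(readUntilSub(new StringReader("xyz"))); // prints "xyz"
    }
}
```

The cast to char happens only after both tests have passed, so the -1
sentinel is never narrowed.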

> In other words, in order to keep my changes and post-festivities
> headaches to a minimum, I would like to keep buf being a StringBuffer.

Which is exactly why you should use an InputStreamReader, not an InputStream, 
and not change anything else.

> So what I was really looking for was the correct alternative to
>            buf.append((char) ic);

You're looking in the wrong place; the conversion should occur as the input is 
being read, not during the append().

> A cursory examination of the webapp code seems to show that
> the byte in question is only ever compared to either -1 or
> integers below 127, or characters in the lower ASCII range
> "A-Za-z".

Excellent; then wrapping the InputStream with an InputStreamReader set to the
appropriate character set is *exactly* what you need.

> But is
> if (char == some-integer)
> always valid as a replacement for
> if (int == some-integer)

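No, not when the integer can be -1.  The comparison compiles, but a char is
unsigned and is widened to an int in the range 0 to 65535 before the test, so
it can never equal -1.  That is why all read() methods return an int, not a
byte or a char.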
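
A small sketch of what happens to the comparison, in case it helps.
CharCompareDemo and sameValue are hypothetical names used only for this
illustration.

```java
class CharCompareDemo {
    // The char operand is promoted to an int in the range 0..65535
    // before the comparison is made.
    static boolean sameValue(char c, int value) {
        return c == value;
    }

    public static void main(String[] args) {
        char fromEof = (char) -1; // narrowing keeps the low 16 bits: 0xFFFF
        System.out.println((int) fromEof);            // prints 65535
        System.out.println(sameValue(fromEof, -1));   // prints false
        System.out.println(sameValue('A', 65));       // prints true
        System.out.println(sameValue((char) 26, 26)); // prints true
    }
}
```

So comparisons against 26 or the "A-Za-z" range behave the same with a char
as with an int; only the test against -1 silently stops matching.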

 - Chuck

