> From: André Warnier [mailto:a...@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
>
> Suppose I do this :
>
> String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
> InputStreamReader fromApp;
> fromApp = new InputStreamReader(socket.getInputStream(),
>     Charset.forName(knownEncoding));
> int ic = 0;
> StringBuffer buf = new StringBuffer(2000);
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>     buf.append((char)ic);
>
> .. then I'm still appending the same char (really, byte) to my
> buffer, right ?

No, it's not the same. It's the proper Unicode equivalent of the input byte (or bytes, for multi-byte character sets), not the original 8-bit value. You're responsible for passing the appropriate character set to the InputStreamReader constructor to ensure that the conversion takes place.

> But by doing
> buf.append((char) ic)
> I am still interpreting ic as being, by platform default, ISO-8859-1,
> thus I am still appending the Unicode codepoint U00B5.

That's not correct. The interpretation occurs in the read() operation on the InputStreamReader, not in the cast to a char. By the time read() returns, it has already converted the byte according to the specified Charset; if your input is 8859-2, you must specify that on the InputStreamReader constructor.

> Or, can I / do I have to now also say :
> char ic = 0;
> while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
> buf.append(ic);

That can't ever work. You will get a compilation error, since the result of read() is an int and an int cannot be assigned to a char without a cast. Even with a cast, a char is unsigned, so it can never have a value of -1.

> In other words, in order to keep my changes and post-festivities
> headaches to a minimum, I would like to keep buf being a StringBuffer.

Which is exactly why you should use an InputStreamReader, not an InputStream, and not change anything else.

> So what I was really looking for was the correct alternative to
> buf.append((char) ic);

You're looking in the wrong place; the conversion should occur as the input is being read, not during the append().

> A cursory examination of the webapp code seems to show that
> the byte in question is only ever compared to either -1 or
> integers below 127, or characters in the lower ASCII range
> "A-Za-z".

Excellent; then wrapping the InputStream in an InputStreamReader set to the appropriate character set is *exactly* what you need.
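
To make the point about where the conversion happens concrete, here is a minimal sketch, not the webapp's actual code. The class name CharsetReadDemo, the helper readUntilSub, and the sample bytes are all made up for illustration, and a ByteArrayInputStream stands in for socket.getInputStream(). It also assumes the JRE ships the ISO-8859-2 charset, which is usual but not guaranteed by the spec. The same byte 0xB5 comes out of read() as a different character depending only on the Charset given to the InputStreamReader; the (char) cast is identical in both cases.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetReadDemo {
    static final int SUB = 26; // hex 1A, the terminator used in the original loop

    // Hypothetical helper: reads characters up to SUB or end of stream,
    // decoding the incoming bytes with the named charset.
    static String readUntilSub(InputStream in, String encoding) {
        try {
            Reader fromApp = new InputStreamReader(in, Charset.forName(encoding));
            StringBuffer buf = new StringBuffer(2000);
            int ic; // must stay an int so that -1 (end of stream) is representable
            while ((ic = fromApp.read()) != SUB && ic != -1) {
                buf.append((char) ic); // ic is already a decoded character here
            }
            return buf.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up wire data: 'A', the byte 0xB5, the SUB terminator, then a byte
        // that must not be read.
        byte[] wire = { 'A', (byte) 0xB5, SUB, 'Z' };
        String latin1 = readUntilSub(new ByteArrayInputStream(wire), "ISO-8859-1");
        String latin2 = readUntilSub(new ByteArrayInputStream(wire), "ISO-8859-2");
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(1)); // U+00B5
        System.out.printf("ISO-8859-2: U+%04X%n", (int) latin2.charAt(1)); // U+013E
    }
}
```

Because the terminator (26) and the other values the webapp compares against are all in the ASCII range, they decode to the same code points under either charset, so the loop condition does not need to change.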

> But is
> if (char == some-integer)
> always valid as a replacement for
> if (int == some-integer)

No; a char is unsigned, so it can never compare equal to -1. That is why all read() methods return an int, not a byte or a char.

 - Chuck
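
As a small illustration of the char-versus-int comparison, the sketch below uses a made-up class CharVsIntDemo with a made-up helper narrow(); nothing here is from the webapp. It shows what happens to the end-of-stream value once it is squeezed into a char: the comparison still compiles, because the char is promoted to int, but it can no longer match -1.

```java
public class CharVsIntDemo {
    // Hypothetical helper: narrows an int (such as a read() result) to a char.
    static char narrow(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        char eofAsChar = narrow(-1);          // -1 becomes 0xFFFF
        System.out.println((int) eofAsChar);  // 65535
        System.out.println(eofAsChar == -1);  // false: the end-of-stream test is lost

        char letter = narrow(65);
        System.out.println(letter == 65);     // true: non-negative values still compare fine
        System.out.println(letter == 'A');    // true
    }
}
```

So comparisons against 26 or against letters in the ASCII range behave the same with a char as with an int; only the test for -1 breaks, which is reason enough to keep the loop variable an int.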