Re: [OT] Basic int/char conversion question
Hi.

Christopher Schultz wrote:
> André,
>
> André Warnier wrote:
>> an existing webapp reads from a socket connected to an external program.
>> The input stream is created as follows :
>>     fromApp = socket.getInputStream();
>> The read is as follows :
>>     StringBuffer buf = new StringBuffer(2000);
>>     int ic;
>>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>>         buf.append((char)ic);
>> This is wrong, because it assumes that the input stream is always in an 8-bit default platform encoding, which it isn't.
>
> Does it? The only assumption I see here is that the byte code 0x1a has a special meaning. Since ASCII is usually the lowest common denominator for character encodings, is this a bad assumption?

Considering the often devious ways in which character encoding questions can come back to bite one, I am not so sure. By doing a read(), the app currently consumes one byte, whether it matches 0x1A or not. If the input stream was UTF-8 for instance, that byte might be the 2nd or 3rd byte of a multi-byte UTF-8 character sequence, which might happen to have the integer value 0x1A, although its meaning would be totally different. (I have not re-checked the UTF-8 encoding to verify if that is a possible value for a 2nd or 3rd byte, but I think it is).

>> How do I do this correctly, assuming that I do know that the incoming stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is being used (such as iso-8859-1 or iso-8859-2) ?
>> I cannot change the InputStream into something else, because there are a zillion other places where this webapp tests on the read byte's value, numerically.

and there are other places where the byte is being tested against other values than 0x1A.

> I like Chuck's suggestion to use an InputStreamReader because the interfaces are (at least accidentally) the same, at least for the method in question.

Me too. It is the most logical, and the one which I would apply if I were to rewrite this app from scratch.
I would also have the other app (the one which sends this stream to the webapp) send some kind of prefix to the stream, indicating the encoding used. (Or at least have both that app and the webapp have some external parameter telling them respectively what to send and what to expect).

> I'm not sure how you would modify an entire application to fix this code everywhere, though.

Right. I was trying to find a magic shortcut. At first I was hoping that I could just do some kind of string replace patch with Notepad, directly on the compiled classes. Unfortunately, considering these byte tests in several places, I can't.

Thanks again for all the suggestions though.
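On the parenthetical question about UTF-8 above: in UTF-8, every byte of a multi-byte sequence has its high bit set (lead bytes are 0xC2 and above, continuation bytes are 0x80 to 0xBF), so 0x1A can only ever be the single-byte encoding of U+001A itself. The following is a minimal sketch that checks this by brute force; the class and method names are made up for illustration, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: scans all Unicode code points and looks for the byte 0x1A
// in their UTF-8 encodings.
class Utf8ByteCheck {

    // True if the UTF-8 encoding of the given code point contains the byte 0x1A.
    static boolean encodingContainsSub(int codePoint) {
        byte[] encoded = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        for (byte b : encoded) {
            if ((b & 0xFF) == 0x1A) {
                return true;
            }
        }
        return false;
    }

    // Counts the code points (surrogates excluded) whose encoding contains 0x1A.
    static int countCodePointsContainingSub() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // lone surrogates are not encodable characters
            }
            if (encodingContainsSub(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Only U+001A itself is expected to match.
        System.out.println(countCodePointsContainingSub());
    }
}
```

This only covers UTF-8. Other multi-byte encodings (UTF-16, for example) can contain 0x1A as part of a longer sequence, so the general caution about byte-level tests still applies.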
Re: [OT] Basic int/char conversion question
André,

André Warnier wrote:
> an existing webapp reads from a socket connected to an external program.
> The input stream is created as follows :
>     fromApp = socket.getInputStream();
> The read is as follows :
>     StringBuffer buf = new StringBuffer(2000);
>     int ic;
>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>         buf.append((char)ic);
> This is wrong, because it assumes that the input stream is always in an 8-bit default platform encoding, which it isn't.

Does it? The only assumption I see here is that the byte code 0x1a has a special meaning. Since ASCII is usually the lowest common denominator for character encodings, is this a bad assumption?

> How do I do this correctly, assuming that I do know that the incoming stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is being used (such as iso-8859-1 or iso-8859-2) ?
> I cannot change the InputStream into something else, because there are a zillion other places where this webapp tests on the read byte's value, numerically.

I like Chuck's suggestion to use an InputStreamReader because the interfaces are (at least accidentally) the same, at least for the method in question. I'm not sure how you would modify an entire application to fix this code everywhere, though.

-chris
Re: [OT] Basic int/char conversion question
Caldarale, Charles R wrote:
>> From: Konstantin Kolinko [mailto:knst.koli...@gmail.com]
>> Subject: Re: [OT] Basic int/char conversion question
>> reset() is not implemented in InputStreamReader
>
> Quite correct; sorry - the revised code would be this:
>
>     import java.io.ByteArrayInputStream;
>     import java.io.InputStreamReader;
>     import java.io.IOException;
>     import java.nio.charset.Charset;
>
>     public class Converter {
>       byte[] ba = new byte[1];
>       ByteArrayInputStream bais = new ByteArrayInputStream(ba);
>       InputStreamReader isr;
>
>       public Converter(String csName) {
>         isr = new InputStreamReader(bais, Charset.forName(csName));
>       }
>
>       public char convert(byte b) {
>         bais.reset();
>         ba[0] = b;
>         try {
>           return (char)isr.read();
>         } catch (IOException ioe) {
>           // This can't happen in our situation, but...
>           return '\0';
>         }
>       }
>     }

Once again, thank you all for all the information provided.

The real solution would be to rewrite that whole application (external feeder program included), to make it take charset variations into account. Better yet, it should all be rewritten to use Unicode and UTF-8, and be done with it. I don't have that option now, so I will use one of the techniques provided, depending on what I find is easiest to implement here for the specific limited problem at hand.

But I will preserve all your code snippets and tips, because this is all extremely difficult to find in the Java documentation, at least from this angle. Once again, as I believe Chuck once wrote, when one knows how to phrase the question, one probably has already 90% of the answer.
Re: [OT] Basic int/char conversion question
On Jan 2, 2009, at 7:39 AM, André Warnier wrote:
> Once again, as I believe Chuck once wrote, when one knows how to phrase the question, one probably has already 90% of the answer.

Sometimes, posing the question solves the problem. More than once, thinking I was stuck, I set out to compose a complete, well-thought-out post to this or some other list. At some point, it occurs to me, "Gee, I don't think so, but I'd better check that point too," and bingo! -- that was the issue.
RE: [OT] Basic int/char conversion question
> From: André Warnier [mailto:a...@ice-sa.com]
> Subject: [OT] Basic int/char conversion question
> I cannot change the InputStream into something else

Actually, I think you can. If you wrap the InputStream in an InputStreamReader specifying the desired character set, the rest of the code can continue to use the read() method and check for specific character (rather than byte) values. For example:

    fromApp = new InputStreamReader(socket.getInputStream(), Charset.forName("ISO-8859-2"));

As long as the code is checking for values in the ASCII range (0 - 127) or -1, I believe this will work for you.

 - Chuck
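A minimal sketch of the wrapping approach described above, applied to the read loop from the original question. The class name, the method name and the sample bytes are made up for illustration; a `ByteArrayInputStream` stands in for the socket stream, and the JVM is assumed to provide the ISO-8859-2 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: reads characters up to 0x1A (SUB) or end of stream,
// decoding the bytes with the named charset as they are read.
class SubTerminatedReader {

    static String readUntilSub(InputStream in, String charsetName) {
        Reader fromApp = new InputStreamReader(in, Charset.forName(charsetName));
        StringBuilder buf = new StringBuilder(2000);
        try {
            int ic;
            // read() now returns a decoded character (0..65535) or -1,
            // so the numeric tests on ASCII values keep working unchanged.
            while ((ic = fromApp.read()) != 26 && ic != -1) {
                buf.append((char) ic);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        byte[] sample = { 'a', (byte) 0xB5, 'b', 0x1A, 'c' };
        String s = readUntilSub(new ByteArrayInputStream(sample), "ISO-8859-2");
        System.out.println(Integer.toHexString(s.charAt(1))); // prints 13e
    }
}
```

Two things to keep in mind: the variable holding the stream has to be declared as a `Reader` (or `InputStreamReader`) rather than an `InputStream`, and an `InputStreamReader` may read ahead from the underlying stream, so all later reads should go through the same reader rather than back to the raw socket stream.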
Re: [OT] Basic int/char conversion question
On Thu, Jan 1, 2009 at 11:13, André Warnier a...@ice-sa.com wrote:
> Hi.
> This has nothing specific to Tomcat, it's just a problem I'm having as a non-java expert in modifying an existing webapp. I hope someone on this list can answer quickly, or send me to the appropriate place to find out. I have tried to find, but get somewhat lost in the Java docs.
>
> Problem : an existing webapp reads from a socket connected to an external program.
> The input stream is created as follows :
>     fromApp = socket.getInputStream();
> The read is as follows :
>     StringBuffer buf = new StringBuffer(2000);
>     int ic;
>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>         buf.append((char)ic);
> This is wrong, because it assumes that the input stream is always in an 8-bit default platform encoding, which it isn't.
>
> How do I do this correctly, assuming that I do know that the incoming stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is being used (such as iso-8859-1 or iso-8859-2) ?
> I cannot change the InputStream into something else, because there are a zillion other places where this webapp tests on the read byte's value, numerically.
> I mean, to append correctly to buf what was read in the int, knowing that the proper encoding (charset) of fromApp is X, how do I write this ?
> Thanks.

Another option: Read the bytes into a ByteBuffer, then convert the bytes into a string. You can tell the String constructor which charset to use.

-- 
Len
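A rough sketch of the "collect the bytes, then decode once" idea. It swaps the ByteBuffer mentioned above for a `ByteArrayOutputStream`, which grows as needed; the class and method names are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: gathers raw bytes up to 0x1A (SUB) or end of stream,
// then lets the String constructor decode them in one step.
class BulkDecode {

    static String readUntilSub(InputStream fromApp, String charsetName) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream(2000);
        try {
            int ic;
            // The stream stays a plain InputStream, so ic is still a byte value here.
            while ((ic = fromApp.read()) != 26 && ic != -1) {
                raw.write(ic);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(raw.toByteArray(), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] sample = { 'a', (byte) 0xB5, 'b', 0x1A, 'c' };
        String s = readUntilSub(new ByteArrayInputStream(sample), "ISO-8859-2");
        System.out.println(Integer.toHexString(s.charAt(1))); // prints 13e
    }
}
```

Because the terminator test runs on raw bytes, this variant is only safe for encodings in which the byte 0x1A cannot appear inside a longer sequence, which holds for the ISO-8859 family discussed here.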
RE: [OT] Basic int/char conversion question
> From: Len Popp [mailto:len.p...@gmail.com]
> Subject: Re: [OT] Basic int/char conversion question
> Another option: Read the bytes into a ByteBuffer, then convert the bytes into a string. You can tell the String constructor which charset to use.

That would seem to violate one of the specified constraints:

> I cannot change the InputStream into something else, because there are a zillion other places where this webapp tests on the read byte's value, numerically.

whereas using an InputStreamReader would not, since its read() method is compatible with that of a plain InputStream.

 - Chuck
Re: [OT] Basic int/char conversion question
Caldarale, Charles R wrote:
> From: Len Popp [mailto:len.p...@gmail.com]
> Subject: Re: [OT] Basic int/char conversion question

I note with satisfaction that I'm not the only one laboring away on this day-after, but you're just all going a bit too fast for me and my growing but still limited Java knowledge.

I like the idea of wrapping this in an InputStreamReader, but how is this solving my problem ? Suppose I do this :

    String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
    InputStreamReader fromApp;
    fromApp = new InputStreamReader(socket.getInputStream(), Charset.forName(knownEncoding));
    int ic = 0;
    StringBuffer buf = new StringBuffer(2000);
    while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
        buf.append((char)ic);

.. then I'm still appending the same char (really, byte) to my buffer, right ?

For example, suppose the byte in question is \xB5 on the fromApp stream. Interpreted as iso-8859-1, this would be the character micro, with Unicode codepoint U+00B5, which I would thus append to the StringBuffer (which is, unless I am mistaken, composed of Unicode characters). However, if the fromApp stream really was iso-8859-2, this same byte \xB5 should be interpreted as the Unicode codepoint U+013E (LATIN SMALL LETTER L WITH CARON) ( ľ ). But by doing buf.append((char) ic) I am still interpreting ic as being, by platform default, ISO-8859-1, thus I am still appending the Unicode codepoint U+00B5. Not so ?

Or, can I / do I have to now also say :

    char ic = 0;
    while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
        buf.append(ic);

??

In other words, in order to keep my changes and post-festivities headaches to a minimum, I would like to keep buf being a StringBuffer. So what I was really looking for was the correct alternative to

    buf.append((char) ic);

which would convert ic from an integer to the appropriate Unicode character, taking into account the knownEncoding which I know. Does that not exist ?
A cursory examination of the webapp code seems to show that the byte in question is only ever compared to either -1 or integers below 127, or characters in the lower ASCII range A-Za-z. But is

    if (char == some-integer)

always valid as a replacement for

    if (int == some-integer)

?
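The \xB5 example from this message can be checked directly. This is only an illustrative sketch (the class and method names are invented, and the JVM is assumed to provide ISO-8859-2): it decodes one byte value under a named charset and prints the resulting code point.

```java
import java.nio.charset.Charset;

// Hypothetical helper: decodes a single byte value with a named single-byte charset.
class SameByteTwoCharsets {

    static char decode(int byteValue, String charsetName) {
        byte[] one = { (byte) byteValue };
        return new String(one, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        System.out.printf("U+%04X%n", (int) decode(0xB5, "ISO-8859-1")); // prints U+00B5
        System.out.printf("U+%04X%n", (int) decode(0xB5, "ISO-8859-2")); // prints U+013E
        // A bare cast applies no charset at all: the int value simply becomes the char value.
        System.out.printf("U+%04X%n", (int) (char) 0xB5);                // prints U+00B5
    }
}
```

The last line shows why the cast in `buf.append((char) ic)` looks like an ISO-8859-1 interpretation when `ic` is a raw byte: the first 256 Unicode code points coincide with ISO-8859-1.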
Re: [OT] Basic int/char conversion question
On Thu, Jan 1, 2009 at 14:39, André Warnier a...@ice-sa.com wrote:
> I note with satisfaction that I'm not the only one laboring away on this day-after, but you're just all going a bit too fast for me and my growing but still limited Java knowledge.

No hang-over here. :-)

> In other words, in order to keep my changes and post-festivities headaches to a minimum, I would like to keep buf being a StringBuffer. So what I was really looking for was the correct alternative to buf.append((char) ic); which would convert ic from an integer, to the appropriate Unicode character, taking into account the knownEncoding which I know. Does that not exist ?

(I'll leave the InputStreamReader explanation to Chuck.)

I was guessing that the StringBuffer would soon be converted to a String (which is the usual case). If not … I don't see a simple one-line way to convert one byte to a character in a given charset. It looks like String and CharsetDecoder are the classes you're supposed to use. If there's an easy way to convert a single character, someone please point it out.

How about this: Read the bytes as bytes, convert them to a String in the correct charset, and create a StringBuffer from that. Like so:

    String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
    InputStream fromApp = socket.getInputStream();
    int ic = 0;
    ByteBuffer inbuf = ByteBuffer.allocate(2000);
    while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
        inbuf.put((byte)ic);
    inbuf.flip(); // switch from writing to reading; limit = number of bytes stored
    byte[] inbytes = new byte[inbuf.limit()];
    inbuf.get(inbytes);
    String s = new String(inbytes, knownEncoding);
    StringBuffer buf = new StringBuffer(s);

(I haven't tested this so it might not be correct.)
It's not very efficient but it keeps the changes in one place.

-- 
Len
RE: [OT] Basic int/char conversion question
Andre/Len,

In case the earlier responses did not answer how to read a charset-encoded InputStream through a Reader, I suggest using a Reader which accommodates a charset (such as InputStreamReader):
http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html

    InputStreamReader
    public InputStreamReader(InputStream in, Charset cs)
    Create an InputStreamReader that uses the given charset.
    Parameters:
        in - An InputStream
        cs - A charset
    Since: 1.4

BTW, int to String conversion:

    String new_string = Integer.toString(int_input);

BTW, String to int conversion:

    int inta = Integer.parseInt(str_input);

HTH
Martin
Re: [OT] Basic int/char conversion question
2009/1/1 André Warnier a...@ice-sa.com:
> Hi.
> This has nothing specific to Tomcat, it's just a problem I'm having as a non-java expert in modifying an existing webapp. I hope someone on this list can answer quickly, or send me to the appropriate place to find out. I have tried to find, but get somewhat lost in the Java docs.
>
> Problem : an existing webapp reads from a socket connected to an external program.
> The input stream is created as follows :
>     fromApp = socket.getInputStream();
> The read is as follows :
>     StringBuffer buf = new StringBuffer(2000);
>     int ic;
>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>         buf.append((char)ic);
> This is wrong, because it assumes that the input stream is always in an 8-bit default platform encoding, which it isn't.
>
> How do I do this correctly, assuming that I do know that the incoming stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is being used (such as iso-8859-1 or iso-8859-2) ?
> I cannot change the InputStream into something else, because there are a zillion other places where this webapp tests on the read byte's value, numerically.
> I mean, to append correctly to buf what was read in the int, knowing that the proper encoding (charset) of fromApp is X, how do I write this ?

1. Using iso-8859-1 does not lose any information. That is, if you later print this out to an iso-8859-1 stream, you will get exactly the same 8-bit iso-8859-2 bytes as were in the input. If you need correct Unicode, though, you can convert them by calling String.getBytes(encoding) and new String(bytes, encoding):

    new String(str.getBytes("ISO-8859-1"), "ISO-8859-2")

2. Well, the above, and all the others' tips I have read in this thread so far, are the right ones. Those are what you should do when you are engineering and writing a well-made application. That is, you have to go with the InputStreamReader, String, and CharsetDecoder APIs, and that will take care of various encodings, including multi-byte ones.
In your case, when you are tailoring some oddly (badly) written specific application to your specific environment, and do not expect much, there is a simple approach: implement this conversion by using a lookup table. You will just need a static table of 256 chars and you are done. For example,

    package mypackage;

    import java.io.UnsupportedEncodingException;

    public class TranslationTable {
        private static char[] table;

        static { // static initialization block
            byte[] bytes = new byte[256];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) i;
            }
            try {
                table = new String(bytes, "ISO-8859-2").toCharArray();
            } catch (UnsupportedEncodingException ex) {
                ex.printStackTrace();
                //System.exit(1);
                throw new Error("Class initialization failed", ex);
            }
        }

        public static char lookup(int i) {
            // will throw ArrayIndexOutOfBoundsException if i is -1, but that should be OK
            return table[i];
        }
    }

and replace

    buf.append((char)ic);

with

    buf.append(TranslationTable.lookup(ic));

Also, I would replace StringBuffer with StringBuilder, if you are running on Java 5 or later, but that is another story.

Best regards,
Konstantin Kolinko
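Point 1 above (re-decoding a string that was built under the wrong 8-bit charset) can be sketched as follows. The class and method names are made up; the only library calls used are `String.getBytes(Charset)` and `new String(byte[], Charset)`.

```java
import java.nio.charset.Charset;

// Hypothetical helper: recovers the original bytes from a string that was decoded
// with the wrong single-byte charset, then decodes them again with the right one.
class Redecode {

    static String redecode(String wronglyDecoded, String assumedCharset, String actualCharset) {
        byte[] originalBytes = wronglyDecoded.getBytes(Charset.forName(assumedCharset));
        return new String(originalBytes, Charset.forName(actualCharset));
    }

    public static void main(String[] args) {
        // Byte 0xB5 was read as ISO-8859-1 (micro sign) but was really ISO-8859-2.
        String fixed = redecode("\u00B5", "ISO-8859-1", "ISO-8859-2");
        System.out.println(Integer.toHexString(fixed.charAt(0))); // prints 13e
    }
}
```

This relies on ISO-8859-1 mapping all 256 byte values one-to-one onto U+0000 to U+00FF. It should not be assumed to work for an arbitrary pair of charsets.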
Re: [OT] Basic int/char conversion question
To Konstantin and all the others who have responded, many thanks for all the tips, especially since this was quite a bit off-topic. I need some time to digest the tips though, and choose the best way according to the code that was dumped in my lap.

I must say that I find it a bit curious that Java does not have an easy out-of-the-box method to convert a byte to a char, with a character filter specifier. Something like

    char mychar = toChar(int,charset)

(or int.toChar(charset)). Oh well, maybe Java 7..

To Konstantin in particular : I know that I don't lose information by converting iso-8859-2 (thinking it is iso-8859-1) to Unicode one way, then re-converting this Unicode to iso-8859-2 (re-using the iso-8859-1 filter). I will get the same bytes in the end.

The problem is that this is a servlet writing the result to the response object. And if I tell it to use iso-8859-1 for the response, it automatically also sets the charset in the response Content-Type to iso-8859-1. Which in this case is wrong, because the browser then gets confused. And as I have found out, it is quite hard to change this Content-Type header after-the-fact. Even a servlet filter won't do it, because by that time the response is committed. Even the front-end Apache can't do it, because it won't let you change the Content-Type header..

So my problem is in reverse : The servlet must set the response output encoding to iso-8859-2, in order to produce the correct Content-Type for the browser. To produce correct iso-8859-2 from the internal Unicode string, this Unicode string must have the proper Unicode chars corresponding to the iso-8859-2 characters I want to output. But the servlet reads those bytes as ints, and does a bunch of internal tests and manipulations on them, without taking into account that they could be anything other than iso-8859-1.
For the same reason, I cannot just replace the InputStream by something that would translate these bytes on-the-fly to Unicode chars, because for high iso-8859-2 bytes, it would generate internal codes that no longer fall into values 0-255, and that may create a problem somewhere deep in code I haven't yet looked at.

I think I have to go back to examine that code, and see how often this StringBuffer is being used/manipulated. If not too often, I might replace it by a byte buffer, and do the conversion all at once each time it is being written out.
RE: [OT] Basic int/char conversion question
> From: André Warnier [mailto:a...@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
> I must say that I find it a bit curious that Java does not have an easy out-of-the-box method to convert a byte to a char, with a character filter specifier.

This would be possible only for 8-bit character sets. Since Java tries to be general, you must feed the converter a stream of bytes, rather than one at a time. If you already have an array of bytes, that can be wrapped in a ByteArrayInputStream and then further wrapped in an InputStreamReader, resulting in proper translation of the bytes to Unicode characters.

> I know that I don't lose information by converting iso-8859-2 (thinking it is iso-8859-1) to Unicode one way, then re-converting this Unicode to iso-8859-2 (re-using the iso-8859-1 filter). I will get the same bytes in the end.

That may be true for 8859-1 and 8859-2, but I suspect it's not true in general. The preferred mappings for a Unicode character in a given encoding may not necessarily be the exact bytes given on input, especially if they've been sent through the wrong converter to begin with.

> Even a servlet filter won't do it, because by that time the response is committed.

It will if you wrap the response object and do not commit the real one until you've set the desired header in the filter.

> For the same reason, I cannot just replace the InputStream by something that would translate these bytes on-the-fly to Unicode chars, because for high iso-8859-2 bytes, it would generate internal codes that do no longer fall into values 0-255, and that may create a problem somewhere deep in code I haven't yet looked at.

I suspect that won't be a problem, unless the code is looking for something in the upper ranges. The example you posted showed it looking at control codes, which are the same in Unicode and any ISO-8859 variant. If the code is looking at high-order bytes, it's seriously flawed already.
I still think the easiest thing for you to do is put in the InputStreamReader wrapper, and run your test cases. You should certainly examine the code for any erroneous tests, but those should be corrected rather than extending the existing kludge.

 - Chuck
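As a sketch of the byte-array route mentioned in this message (a ByteArrayInputStream wrapped in an InputStreamReader), together with a quick check that byte values 0 to 127 come out unchanged. The class and method names are invented, and only ISO-8859-1 and ISO-8859-2 are exercised here.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: decodes a byte array through an InputStreamReader.
class ArrayDecode {

    static char[] decode(byte[] bytes, String charsetName) {
        CharArrayWriter out = new CharArrayWriter(bytes.length);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes),
                                              Charset.forName(charsetName))) {
            int ic;
            while ((ic = r.read()) != -1) {
                out.write(ic); // writes the low 16 bits, i.e. the decoded char
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toCharArray();
    }

    // True if every byte value 0..127 decodes to the char with the same numeric value.
    static boolean asciiRangeUnchanged(String charsetName) {
        byte[] ascii = new byte[128];
        for (int i = 0; i < ascii.length; i++) {
            ascii[i] = (byte) i;
        }
        char[] decoded = decode(ascii, charsetName);
        if (decoded.length != ascii.length) {
            return false;
        }
        for (int i = 0; i < decoded.length; i++) {
            if (decoded[i] != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(asciiRangeUnchanged("ISO-8859-1")); // prints true
        System.out.println(asciiRangeUnchanged("ISO-8859-2")); // prints true
    }
}
```

If that check holds for the charset in use, existing tests against control codes and ASCII letters behave the same whether they see raw bytes or decoded characters.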
RE: [OT] Basic int/char conversion question
> From: André Warnier [mailto:a...@ice-sa.com]
> Subject: Re: [OT] Basic int/char conversion question
> Suppose I do this :
>     String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
>     InputStreamReader fromApp;
>     fromApp = new InputStreamReader(socket.getInputStream(), Charset.forName(knownEncoding));
>     int ic = 0;
>     StringBuffer buf = new StringBuffer(2000);
>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>         buf.append((char)ic);
> .. then I'm still appending the same char (really, byte) to my buffer, right ?

No, it's not the same. It's the proper Unicode equivalent of the input byte (or bytes, for multi-byte character sets), not the original 8-bit value. You're responsible for setting the appropriate character set on the InputStreamReader constructor to ensure that conversion takes place.

> But by doing buf.append((char) ic) I am still interpreting ic as being, by platform default, ISO-8859-1, thus I am still appending the Unicode codepoint U+00B5.

That's not correct. The interpretation occurs on the read() operation on the InputStreamReader, not the cast to a char. The read() already converted the byte according to the specified Charset; if your input is 8859-2, you must use that on the InputStreamReader constructor.

> Or, can I / do I have to now also say :
>     char ic = 0;
>     while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
>         buf.append(ic);

That can't ever work, since a char is unsigned, so can never have a value of -1; you will get a compilation error since the result of the read() is an int, not a char.

> In other words, in order to keep my changes and post-festivities headaches to a minimum, I would like to keep buf being a StringBuffer.

Which is exactly why you should use an InputStreamReader, not an InputStream, and not change anything else.

> So what I was really looking for was the correct alternative to buf.append((char) ic);

You're looking in the wrong place; the conversion should occur as the input is being read, not during the append().
> A cursory examination of the webapp code seems to show that the byte in question is only ever compared to either -1 or integers below 127, or characters in the lower ASCII range A-Za-z.

Excellent; then wrapping the InputStream with an InputStreamReader set to the appropriate character set is *exactly* what you need.

> But is if (char == some-integer) always valid as a replacement for if (int == some-integer)

No; a char is unsigned, which is why all read() methods return an int, not a byte or a char.

 - Chuck
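The point about char being unsigned can be seen in a few lines. This is a minimal illustration with invented names; it does not try to reproduce the compilation error mentioned above, since `char ic = fromApp.read();` is rejected by the compiler without an explicit cast.

```java
// Hypothetical demo class: what happens to the end-of-stream value -1
// once it has been squeezed into a char.
class CharVersusInt {

    // Compares a char against -1 the way the proposed loop would.
    static boolean equalsMinusOne(char c) {
        return c == -1; // c is promoted to an int in 0..65535, so this is never true
    }

    public static void main(String[] args) {
        int endOfStream = -1;
        char squeezed = (char) endOfStream;          // keeps only the low 16 bits
        System.out.println((int) squeezed);          // prints 65535
        System.out.println(equalsMinusOne(squeezed)); // prints false

        // Comparisons against non-negative values such as 26 are unaffected.
        char sub = (char) 26;
        System.out.println(sub == 26);               // prints true
    }
}
```

So the existing pattern of keeping the result of read() in an int, testing it, and casting to char only at the append is the one to keep.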
RE: [OT] Basic int/char conversion question
> From: Len Popp [mailto:len.p...@gmail.com]
> Subject: Re: [OT] Basic int/char conversion question
> If there's an easy way to convert a single character, someone please point it out.

Not particularly easy, but this should work:

    import java.io.ByteArrayInputStream;
    import java.io.InputStreamReader;
    import java.io.IOException;
    import java.nio.charset.Charset;

    public class Converter {
      byte[] ba = new byte[1];
      InputStreamReader isr;

      public Converter(String csName) {
        isr = new InputStreamReader(new ByteArrayInputStream(ba), Charset.forName(csName));
      }

      public char convert(byte b) {
        try {
          isr.reset();
          ba[0] = b;
          return (char)isr.read();
        } catch (IOException ioe) {
          // This can't happen in our situation, but...
          return '\0';
        }
      }
    }

The calling program merely has to instantiate a Converter once for the character set of interest, then call the convert method to translate the byte:

    cvt = new Converter("ISO-8859-2");
    ...
    myChar = cvt.convert(myByte);

This of course only works for 8-bit character sets.

 - Chuck
Re: [OT] Basic int/char conversion question
2009/1/2 Caldarale, Charles R chuck.caldar...@unisys.com:
>> From: Len Popp [mailto:len.p...@gmail.com]
>> Subject: Re: [OT] Basic int/char conversion question
>> If there's an easy way to convert a single character, someone please point it out.
>
> Not particularly easy, but this should work:
> (...)
>     isr = new InputStreamReader(new ByteArrayInputStream(ba), Charset.forName(csName));
> (...)
>     isr.reset();

reset() is not implemented in InputStreamReader, as of the Sun JDK 6u07 that I have installed, thus you have to make a direct call to ByteArrayInputStream.reset().

Well, it serves the same purpose as the TranslationTable class that I have provided earlier.

Best regards,
Konstantin Kolinko
RE: [OT] Basic int/char conversion question
> From: Konstantin Kolinko [mailto:knst.koli...@gmail.com]
> Subject: Re: [OT] Basic int/char conversion question
> reset() is not implemented in InputStreamReader

Quite correct; sorry - the revised code would be this:

    import java.io.ByteArrayInputStream;
    import java.io.InputStreamReader;
    import java.io.IOException;
    import java.nio.charset.Charset;

    public class Converter {
      byte[] ba = new byte[1];
      ByteArrayInputStream bais = new ByteArrayInputStream(ba);
      InputStreamReader isr;

      public Converter(String csName) {
        isr = new InputStreamReader(bais, Charset.forName(csName));
      }

      public char convert(byte b) {
        bais.reset();
        ba[0] = b;
        try {
          return (char)isr.read();
        } catch (IOException ioe) {
          // This can't happen in our situation, but...
          return '\0';
        }
      }
    }

> Well, it serves the same purpose as TranslationTable class that I have provided earlier.

True, and yours should be more efficient, and could be easily modified to create an instance for any given character set rather than using a static table. I think the Converter class above is more easily adaptable to multi-byte character sets should that ever be of interest.

 - Chuck
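The modification suggested here, a lookup table created per character set instead of one static table, might look roughly like this. It is a sketch under assumptions rather than anyone's posted code: the class name `ByteToCharTable` is invented, and it is only meaningful for single-byte charsets, which the sketch does not verify.

```java
import java.nio.charset.Charset;

// Hypothetical per-charset variant of the lookup-table idea: one instance per charset,
// built once, then used to translate byte values 0..255 to chars.
class ByteToCharTable {
    private final char[] table;

    ByteToCharTable(String charsetName) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        // Decode all 256 byte values in one go; for a single-byte charset
        // this yields exactly one char per byte.
        table = new String(bytes, Charset.forName(charsetName)).toCharArray();
    }

    // byteValue is the int returned by InputStream.read(); callers must test
    // for -1 first, since -1 throws ArrayIndexOutOfBoundsException here.
    char lookup(int byteValue) {
        return table[byteValue];
    }

    public static void main(String[] args) {
        ByteToCharTable latin2 = new ByteToCharTable("ISO-8859-2");
        System.out.println(Integer.toHexString(latin2.lookup(0xB5))); // prints 13e
    }
}
```

In the original loop this would be used as `buf.append(table.lookup(ic));`, with the table instance created once for the known encoding.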