Re: [OT] Basic int/char conversion question

2009-01-13 Thread André Warnier

Hi.

Christopher Schultz wrote:


André,

André Warnier wrote:

an existing webapp reads from a socket connected to an external program.
The input stream is created as follows :
fromApp = socket.getInputStream();
The read is as follows :
StringBuffer buf = new StringBuffer(2000);
int ic;
while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
   buf.append((char)ic);

This is wrong, because it assumes that the input stream is always in an
8-bit default platform encoding, which it isn't.


Does it?

The only assumption I see here is that the byte code 0x1a has a special
meaning. Since ASCII is usually the lowest common denominator for
character encodings, is this a bad assumption?


Considering the often devious ways in which character encoding questions 
can come back to bite one, I am not so sure.
By doing a read(), the app currently consumes one byte, whether it 
matches 0x1A or not. If the input stream was UTF-8 for instance, that 
byte might be the 2nd or 3rd byte of a multi-byte UTF-8 character 
sequence, which might happen to have the integer value 0x1A, although 
its meaning would be totally different.
(I have not re-checked the UTF-8 encoding to verify if that is a 
possible value for a 2nd or 3rd byte, but I think it is).
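
The parenthetical above can be checked mechanically. The following is a minimal sketch (the class and method names are made up for illustration) that encodes every Unicode code point to UTF-8 and looks for the byte 0x1A anywhere except in the encoding of U+001A itself. UTF-8 lead and continuation bytes of multi-byte sequences all have the high bit set (0x80 and above), so the search should find nothing, which would mean a raw-byte test for 0x1A is safe on UTF-8 input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the webapp under discussion.
class Utf8SubCheck {

    // Returns true if the byte 0x1A occurs in the UTF-8 encoding of any
    // code point other than U+001A (SUB) itself.
    static boolean subByteOccursElsewhere() {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip SUB itself and the surrogate range, which has no UTF-8 encoding.
            if (cp == 0x1A || (cp >= 0xD800 && cp <= 0xDFFF)) {
                continue;
            }
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                if (b == 0x1A) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("0x1A inside another character: " + subByteOccursElsewhere());
    }
}
```

The same does not hold for every multi-byte encoding (UTF-16, for instance, uses bytes below 0x80 inside its code units), so the check is specific to UTF-8.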





How do I do this correctly, assuming that I do know that the incoming
stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit
encoding is being used (such as iso-8859-1 or iso-8859-2) ?
I cannot change the InputStream into something else, because there are a
zillion other places where this webapp tests on the read byte's value,
numerically.


and there are other places where the byte is being tested against 
other values than 0x1A.




I like Chuck's suggestion to use an InputStreamReader because the
interfaces are (at least accidentally) the same, at least for the method
in question. 


Me too. It is the most logical, and the one which I would apply if I 
were to rewrite this app from scratch.  I would also have the other app 
(the one which sends this stream to the webapp) send some kind of prefix 
to the stream, indicating the encoding used. (Or at least have both that 
app and the webapp have some external parameter telling them 
respectively what to send and what to expect).


I'm not sure how you would modify an entire application to
fix this code everywhere, though.


Right. I was trying to find a magic shortcut. At first I was hoping that 
I could just do some kind of string replace patch with Notepad, 
directly on the compiled classes.  Unfortunately, considering these byte 
tests in several places, I can't.


Thanks again for all the suggestions though.


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: [OT] Basic int/char conversion question

2009-01-12 Thread Christopher Schultz

André,

André Warnier wrote:
 an existing webapp reads from a socket connected to an external program.
 The input stream is created as follows :
 fromApp = socket.getInputStream();
 The read is as follows :
 StringBuffer buf = new StringBuffer(2000);
 int ic;
 while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
    buf.append((char)ic);
 
 This is wrong, because it assumes that the input stream is always in an
 8-bit default platform encoding, which it isn't.

Does it?

The only assumption I see here is that the byte code 0x1a has a special
meaning. Since ASCII is usually the lowest common denominator for
character encodings, is this a bad assumption?

 How do I do this correctly, assuming that I do know that the incoming
 stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit
 encoding is being used (such as iso-8859-1 or iso-8859-2) ?
 I cannot change the InputStream into something else, because there are a
 zillion other places where this webapp tests on the read byte's value,
 numerically.

I like Chuck's suggestion to use an InputStreamReader because the
interfaces are (at least accidentally) the same, at least for the method
in question. I'm not sure how you would modify an entire application to
fix this code everywhere, though.

-chris



Re: [OT] Basic int/char conversion question

2009-01-02 Thread André Warnier

Caldarale, Charles R wrote:

From: Konstantin Kolinko [mailto:knst.koli...@gmail.com]
Subject: Re: [OT] Basic int/char conversion question

reset() is not implemented in InputStreamReader


Quite correct; sorry - the revised code would be this:

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.nio.charset.Charset;

public class Converter {
  byte[] ba = new byte[1];
  ByteArrayInputStream bais = new ByteArrayInputStream(ba);
  InputStreamReader isr;

  public Converter(String csName) {
    isr = new InputStreamReader(bais, Charset.forName(csName));
  }

  public char convert(byte b) {
    bais.reset();
    ba[0] = b;
    try {
      return (char)isr.read();
    } catch (IOException ioe) {
      // This can't happen in our situation, but...
      return '\0';
    }
  }
}


Once again, thank you all for all the information provided.
The real solution would be to rewrite that whole application (external 
feeder program included), to make it take charset variations into 
account.  Better yet, it should all be rewritten to use Unicode and 
UTF-8, and be done with it.
I don't have that option now, so I will use one of the techniques 
provided, depending on what I find is easiest to implement here for the 
specific limited problem at hand.
But I will preserve all your code snippets and tips, because this is all 
extremely difficult to find in the Java documentation, at least from 
this angle.
Once again, as I believe Chuck once wrote, when one knows how to phrase 
the question, one probably has already 90% of the answer.






Re: [OT] Basic int/char conversion question

2009-01-02 Thread Ken Bowen


On Jan 2, 2009, at 7:39 AM, André Warnier wrote:

Once again, as I believe Chuck once wrote, when one knows how to  
phrase the question, one probably has already 90% of the answer.



Sometimes, posing the question solves the problem.  More than once,  
thinking I was stuck, I set out to compose a complete well thought  
out post to this or some other list.  At some point, it occurs to me,  
Gee, I don't think so, but I'd better check that point too, and  
bingo! -- that was the issue.




RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: [OT] Basic int/char conversion question

 I cannot change the InputStream into something else

Actually, I think you can.  If you wrapper the InputStream with an 
InputStreamReader specifying the desired character set, the rest of the code 
can continue to use the read() method and check for specific character (rather 
than byte) values.  For example:

fromApp = new InputStreamReader(socket.getInputStream(),
    Charset.forName("ISO-8859-2"));

As long as the code is checking for values in the ASCII range (0 - 127) or -1, 
I believe this will work for you.
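
To make the suggestion concrete, here is a minimal sketch under a few assumptions: a ByteArrayInputStream stands in for socket.getInputStream(), and the class and method names are invented for the example. The loop is the one from the original post, unchanged apart from the type of fromApp, and the byte 0xB5 should come out as U+013E when the charset is ISO-8859-2.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class; the InputStream parameter stands in for the socket stream.
class ReaderWrapDemo {

    // Reads characters up to SUB (26) or end of stream, decoding with the named charset.
    static String readUntilSub(InputStream in, String csName) {
        try {
            Reader fromApp = new InputStreamReader(in, Charset.forName(csName));
            StringBuffer buf = new StringBuffer(2000);
            int ic;
            while ((ic = fromApp.read()) != 26 && ic != -1) { // hex 1A (SUB)
                buf.append((char) ic); // ic is already a decoded Unicode value here
            }
            return buf.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = { 'a', (byte) 0xB5, 'b', 26, 'x' };
        String s = readUntilSub(new ByteArrayInputStream(wire), "ISO-8859-2");
        System.out.println(Integer.toHexString(s.charAt(1)));
    }
}
```

One caveat worth keeping in mind: an InputStreamReader may read ahead in the underlying stream, so code that continues to read raw bytes from the same socket after the SUB marker could miss data. Reading everything through the one Reader avoids that.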

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.




Re: [OT] Basic int/char conversion question

2009-01-01 Thread Len Popp
On Thu, Jan 1, 2009 at 11:13, André Warnier a...@ice-sa.com wrote:
 Hi.

 This has nothing specific to Tomcat, it's just a problem I'm having as a
 non-java expert in modifying an existing webapp.
 I hope someone on this list can answer quickly, or send me to the
 appropriate place to find out.  I have tried to find, but get somewhat lost
 in the Java docs.

 Problem :
 an existing webapp reads from a socket connected to an external program.
 The input stream is created as follows :
 fromApp = socket.getInputStream();
 The read is as follows :
 StringBuffer buf = new StringBuffer(2000);
 int ic;
 while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
   buf.append((char)ic);

 This is wrong, because it assumes that the input stream is always in an
 8-bit default platform encoding, which it isn't.

 How do I do this correctly, assuming that I do know that the incoming stream
 is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is
 being used (such as iso-8859-1 or iso-8859-2) ?
 I cannot change the InputStream into something else, because there are a
 zillion other places where this webapp tests on the read byte's value,
 numerically.

 I mean, to append correctly to buf what was read in the int, knowing
 that the proper encoding (charset) of fromApp is X, how do I write this
 ?

 Thanks.


Another option: Read the bytes into a ByteBuffer, then convert the
bytes into a string. You can tell the String constructor which charset
to use.
-- 
Len


RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: Len Popp [mailto:len.p...@gmail.com]
 Subject: Re: [OT] Basic int/char conversion question

 Another option: Read the bytes into a ByteBuffer, then convert
 the bytes into a string. You can tell the String constructor
 which charset to use.

That would seem to violate one of the specified constraints:

  I cannot change the InputStream into something else,
  because there are a zillion other places where this
  webapp tests on the read byte's value, numerically.

whereas using an InputStreamReader would not since its read() method is 
compatible with that of a plain InputStream.

 - Chuck





Re: [OT] Basic int/char conversion question

2009-01-01 Thread André Warnier

Caldarale, Charles R wrote:

From: Len Popp [mailto:len.p...@gmail.com]
Subject: Re: [OT] Basic int/char conversion question


I note with satisfaction that I'm not the only one laboring away on this 
day-after, but you're just all going a bit too fast for me and my 
growing but still limited Java knowledge.


I like the idea of wrapping this in an InputStreamReader, but how is 
this solving my problem ?

Suppose I do this :

String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
InputStreamReader fromApp;
fromApp = new InputStreamReader(socket.getInputStream(),
    Charset.forName(knownEncoding));

int ic = 0;
StringBuffer buf = new StringBuffer(2000);
while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
  buf.append((char)ic);

.. then I'm still appending the same char (really, byte) to my buffer, 
right ?


For example, suppose the byte in question is \xB5 on the fromApp stream.
Interpreted as iso-8859-1, this would be the character micro, with 
Unicode codepoint U+00B5, which I would thus append to the StringBuffer 
(which is, unless I am mistaken, composed of Unicode characters).
However, if the fromApp stream really was iso-8859-2, this same byte 
\xB5 should be interpreted as the Unicode codepoint U+013E (LATIN SMALL 
LETTER L WITH CARON). ( ľ )

But by doing
buf.append((char) ic)
I am still interpreting ic as being, by platform default, ISO-8859-1, 
thus I am still appending the Unicode codepoint U+00B5.

Not so ?

Or, can I / do I have to now also say :
char ic = 0;
while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
  buf.append(ic);

??

In other words, in order to keep my changes and post-festivities 
headaches to a minimum, I would like to keep buf being a StringBuffer. 
So what I was really looking for was the correct alternative to

  buf.append((char) ic);
which would convert ic from an integer, to the appropriate Unicode 
character, taking into account the knownEncoding which I know.


Does that not exist ?

A cursory examination of the webapp code seems to show that the byte in 
question is only ever compared to either -1 or integers below 127, or 
characters in the lower ASCII range A-Za-z.

But is
if (char == some-integer)
always valid as a replacement for
if (int == some-integer)
?






Re: [OT] Basic int/char conversion question

2009-01-01 Thread Len Popp
On Thu, Jan 1, 2009 at 14:39, André Warnier a...@ice-sa.com wrote:
 I note with satisfaction that I'm not the only one laboring away on this
 day-after, but you're just all going a bit too fast for me and my growing
 but still limited Java knowledge.

No hang-over here. :-)

 In other words, in order to keep my changes and post-festivities headaches
 to a minimum, I would like to keep buf being a StringBuffer. So what I was
 really looking for was the correct alternative to
  buf.append((char) ic);
 which would convert ic from an integer, to the appropriate Unicode
 character, taking into account the knownEncoding which I know.

 Does that not exist ?

(I'll leave the InputStreamReader explanation to Chuck.)

I was guessing that the StringBuffer would soon be converted to a
String (which is the usual case). If not …

I don't see a simple one-line way to convert one byte to a character
in a given charset. It looks like String and CharsetDecoder are the
classes you're supposed to use. If there's an easy way to convert a
single character, someone please point it out.
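
For what it's worth, one fairly short route goes through Charset.decode(). This is only a sketch: the class and method names are invented, and it is only meaningful for single-byte charsets, where one input byte always yields exactly one char.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper; assumes a single-byte charset such as ISO-8859-2.
class SingleByteDecode {

    // Decodes one byte value (0-255) in the named charset and returns the char.
    static char toChar(int b, String csName) {
        byte[] one = { (byte) b };
        return Charset.forName(csName).decode(ByteBuffer.wrap(one)).charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toChar(0xB5, "ISO-8859-2")));
    }
}
```

Looking up the Charset on every call is wasteful; in a tight read loop it would be better to keep the Charset (or a precomputed table) around.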

How about this: Read the bytes as bytes, convert them to a String in
the correct charset, and create a StringBuffer from that. Like so:

  String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
  InputStream fromApp = socket.getInputStream();
  int ic = 0;
  ByteBuffer inbuf = ByteBuffer.allocate(2000);
  while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
    inbuf.put((byte)ic);
  inbuf.flip(); // limit = number of bytes written, position = 0
  byte[] inbytes = new byte[inbuf.limit()];
  inbuf.get(inbytes);
  String s = new String(inbytes, knownEncoding);
  StringBuffer buf = new StringBuffer(s);

(I haven't tested this so it might not be correct.)
It's not very efficient but it keeps the changes in one place.
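
A variant of the same idea, sketched with a ByteArrayOutputStream instead of a fixed-size ByteBuffer so that messages longer than 2000 bytes do not overflow the buffer. The class and method names are invented for the example, and the InputStream parameter stands in for the socket stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: collect raw bytes first, decode them in one go afterwards.
class BytesThenDecode {

    static StringBuffer readUntilSub(InputStream fromApp, String knownEncoding) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream(2000); // grows as needed
        try {
            int ic;
            while ((ic = fromApp.read()) != 26 && ic != -1) { // hex 1A (SUB)
                raw.write(ic); // still a plain byte value, so numeric tests keep working
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String s = new String(raw.toByteArray(), Charset.forName(knownEncoding));
        return new StringBuffer(s);
    }
}
```

Because the bytes stay bytes until the end, any existing numeric comparisons inside the loop are unaffected; only the final conversion knows about the charset.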
-- 
Len


RE: [OT] Basic int/char conversion question

2009-01-01 Thread Martin Gainty

Andre/Len

In case the earlier responses did not answer how to read a charset-encoded 
InputStream through a Reader:
I suggest using a Reader which will accommodate a charset (such as 
InputStreamReader)
http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html

InputStreamReader
public InputStreamReader(InputStream in,
 Charset cs)
Create an InputStreamReader that uses the given charset. 


Parameters:
    in - An InputStream
    cs - A charset
Since:
    1.4

BTW: int to String conversion
String new_string = Integer.toString(int_input);

BTW: String to int conversion
int inta = Integer.parseInt(str_input);

HTH
Martin 

__ 
Disclaimer and confidentiality note 
Everything in this e-mail and any attachments relates to the official business 
of Sender. This transmission is of a confidential nature and Sender does not 
endorse distribution to any party other than intended recipient. Sender does 
not necessarily endorse content contained within this transmission. 






Re: [OT] Basic int/char conversion question

2009-01-01 Thread Konstantin Kolinko
2009/1/1 André Warnier a...@ice-sa.com:
 Hi.

 This has nothing specific to Tomcat, it's just a problem I'm having as a
 non-java expert in modifying an existing webapp.
 I hope someone on this list can answer quickly, or send me to the
 appropriate place to find out.  I have tried to find, but get somewhat lost
 in the Java docs.

 Problem :
 an existing webapp reads from a socket connected to an external program.
 The input stream is created as follows :
 fromApp = socket.getInputStream();
 The read is as follows :
 StringBuffer buf = new StringBuffer(2000);
 int ic;
 while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
   buf.append((char)ic);

 This is wrong, because it assumes that the input stream is always in an
 8-bit default platform encoding, which it isn't.

 How do I do this correctly, assuming that I do know that the incoming stream
 is an 8-bit stream (like iso-8859-x), and I do know which 8-bit encoding is
 being used (such as iso-8859-1 or iso-8859-2) ?
 I cannot change the InputStream into something else, because there are a
 zillion other places where this webapp tests on the read byte's value,
 numerically.

 I mean, to append correctly to buf what was read in the int, knowing
 that the proper encoding (charset) of fromApp is X, how do I write this
 ?


1. Using iso-8859-1 does not lose any information. That is, if you later
print this out to an iso-8859-1 stream, you will get exactly the same 8-bit
bytes of iso-8859-2 that were in the input.

If you need correct Unicode, though, you can convert the characters by calling
String.getBytes(encoding) and new String(bytes, encoding):

new String(str.getBytes("ISO-8859-1"), "ISO-8859-2")
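
As a small sketch of that one-liner in context (the class and method names are invented for the example): a string that was built by casting ISO-8859-2 bytes to char is turned back into the original bytes and then decoded properly.

```java
import java.nio.charset.Charset;

// Hypothetical helper: repairs a String that was decoded as ISO-8859-1
// although the original bytes were ISO-8859-2.
class Redecode {

    static String latin1ToLatin2(String str) {
        // Step 1: recover the original bytes (lossless only for chars 0-255).
        byte[] original = str.getBytes(Charset.forName("ISO-8859-1"));
        // Step 2: decode those bytes with the charset they were really written in.
        return new String(original, Charset.forName("ISO-8859-2"));
    }

    public static void main(String[] args) {
        String fixed = latin1ToLatin2("\u00B5");
        System.out.println(Integer.toHexString(fixed.charAt(0)));
    }
}
```

This only works while every char in the string is in the range 0-255; anything else is replaced by '?' in step 1 and the original byte is gone.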

2. Well, the above, and all the others' tips I have read in this thread so far
are the right ones. Those are what you should do when you are engineering
and writing a well-made application. That is, you have to go with
InputStreamReader, String, CharsetDecoder APIs and that will take care of
various encodings, including multi-byte ones.

In your case, when you are tailoring some oddly (badly) written specific
application to your specific environment, and do not expect much, there is
a simple approach: implement this conversion by using a lookup table.

You will just need some static table of 256 chars and you are done.

For example,

package mypackage;
import java.io.UnsupportedEncodingException;

public class TranslationTable {
  private static char[] table;

  static {
    // static initialization block

    byte[] bytes = new byte[256];
    for (int i = 0; i < bytes.length; i++) {
      bytes[i] = (byte) i;
    }

    try {
      table = new String(bytes, "ISO-8859-2").toCharArray();
    } catch (UnsupportedEncodingException ex) {
      ex.printStackTrace();
      //System.exit(1);
      throw new Error("Class initialization failed", ex);
    }
  }

  public static char lookup(int i) {
    // will throw ArrayIndexOutOfBoundsException if i is -1, but that should be OK
    return table[i];
  }
}

and replace

   buf.append((char)ic);

with

  buf.append(TranslationTable.lookup(ic));

Also, I would replace StringBuffer with StringBuilder, if you are
running in Java 5 or later, but that is another story.

Best regards,
Konstantin Kolinko




Re: [OT] Basic int/char conversion question

2009-01-01 Thread André Warnier

To Konstantin and all the others who have responded,
many thanks for all the tips, especially since this was quite a bit 
off-topic.
I need some time to digest the tips though, and choose the best way 
according to the code that was dumped in my lap.


I must say that I find it a bit curious that Java does not have an easy 
out-of-the-box method to convert a byte to a char, with a character 
filter specifier. Something like

char mychar = toChar(int,charset) (or int.toChar(charset))
Oh well, maybe Java 7..

To Konstantin in particular :
I know that I don't lose information by converting iso-8859-2 (thinking 
it is iso-8859-1) to Unicode one way, then re-converting this Unicode to 
iso-8859-2 (re-using the iso-8859-1 filter).  I will get the same bytes 
in the end.
The problem is that this is a servlet writing the result to the response 
object.  And if I tell it to use iso-8859-1 for the response, it 
automatically also sets the response Content-Type to iso-8859-1.

Which in this case is wrong, because the browser then gets confused.
And as I have found out, it is quite hard to change this Content-Type 
header after-the-fact.
Even a servlet filter won't do it, because by that time the response is 
committed.
Even the front-end Apache can't do it, because it won't let you change 
the Content-Type header..


So my problem is in reverse :
The servlet must set the response output encoding to iso-8859-2, in 
order to produce the correct Content-Type for the browser. To produce 
correct iso-8859-2 from the internal Unicode string, this Unicode string 
must have the proper Unicode chars corresponding to the iso-8859-2 
characters I want to output.
But the servlet reads those bytes as ints, and does a bunch of internal 
tests and manipulations on them, without taking into account that they 
could be anything other than iso-8859-1.


For the same reason, I cannot just replace the InputStream by something 
that would translate these bytes on-the-fly to Unicode chars, because 
for high iso-8859-2 bytes, it would generate internal codes that no 
longer fall into values 0-255, and that may create a problem somewhere 
deep in code I haven't yet looked at.


I think I have to go back to examine that code, and see how often this 
StringBuffer is being used/manipulated.  If not too often, I might 
replace it by a byte buffer, and do the conversion all at once each time 
it is being written out.





RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: [OT] Basic int/char conversion question

 I must say that I find it a bit curious that Java does not
 have an easy out-of-the-box method to convert a byte to a
 char, with a character filter specifier.

This would be possible only for 8-bit character sets.  Since Java tries to be 
general, you must feed the converter a stream of bytes, rather than one at a 
time.  If you already have an array of bytes, that can be wrapped in a 
ByteArrayInputStream and then further wrapped in an InputStreamReader, 
resulting in proper translation of the bytes to Unicode characters.

 I know that I don't lose information by converting
 iso-8859-2 (thinking it is iso-8859-1) to Unicode
 one way, then re-converting this Unicode to iso-8859-2
 (re-using the iso-8859-1 filter).  I will get the
 same bytes in the end.

That may be true for 8859-1 and 8859-2, but I suspect it's not true in general. 
 The preferred mappings for a Unicode character in a given encoding may not 
necessarily be the exact bytes given on input, especially if they've been sent 
through the wrong converter to begin with.
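
A quick way to see the difference is to push all 256 byte values through a decode/encode round trip. This is a sketch with invented names; it relies only on the documented behaviour that String replaces malformed input and unmappable characters with the charset's default replacement.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical check: does decode-then-encode give back the same bytes?
class RoundTripCheck {

    // True if decoding the byte values 0..255 with the charset and
    // encoding the result again yields exactly the same 256 bytes.
    static boolean roundTrips(String csName) {
        Charset cs = Charset.forName(csName);
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, cs).getBytes(cs);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        for (String name : new String[] { "ISO-8859-1", "US-ASCII", "UTF-8" }) {
            System.out.println(name + " round-trips: " + roundTrips(name));
        }
    }
}
```

ISO-8859-1 survives because every byte value maps to the Unicode code point with the same number. US-ASCII and UTF-8 do not, since bytes that are invalid in those encodings are replaced during decoding and cannot be recovered.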

 Even a servlet filter won't do it, because by that time the
 response is committed.

It will if you wrapper the response object and not commit the real one until 
you've set the desired header in the filter.

 For the same reason, I cannot just replace the InputStream
 by something that would translate these bytes on-the-fly to
 Unicode chars, because for high iso-8859-2 bytes, it would
 generate internal codes that no longer fall into values
 0-255, and that may create a problem somewhere deep in code
 I haven't yet looked at.

I suspect that won't be a problem, unless the code is looking for something in 
the upper ranges.  The example you posted showed it looking at control codes, 
which are the same in Unicode and any ISO-8859 variant.  If the code is looking 
at high-order bytes, it's seriously flawed already.

I still think the easiest thing for you to do is put in the InputStreamReader 
wrapper, and run your test cases.  You should certainly examine the code for 
any erroneous tests, but those should be corrected rather than extending the 
existing kludge.

 - Chuck





RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: [OT] Basic int/char conversion question

 Suppose I do this :

 String knownEncoding = "ISO-8859-1"; // or "ISO-8859-2"
 InputStreamReader fromApp;
 fromApp = new InputStreamReader(socket.getInputStream(),
     Charset.forName(knownEncoding));
 int ic = 0;
 StringBuffer buf = new StringBuffer(2000);
 while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
   buf.append((char)ic);

 .. then I'm still appending the same char (really, byte) to my
 buffer, right ?

No, it's not the same.  It's the proper Unicode equivalent of the input byte 
(or bytes, for multi-byte character sets), not the original 8-bit value.  
You're responsible for setting the appropriate character set on the 
InputStreamReader constructor to ensure that the conversion takes place.

 But by doing
 buf.append((char) ic)
 I am still interpreting ic as being, by platform default, ISO-8859-1,
 thus I am still appending the Unicode codepoint U+00B5.

That's not correct.  The interpretation occurs on the read() operation on the 
InputStreamReader, not the cast to a char.  The read() already converted the 
byte according to the specified Charset; if your input is 8859-2, you must use 
that on the InputStreamReader constructor.

 Or, can I / do I have to now also say :
 char ic = 0;
 while((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
   buf.append(ic);

That can't ever work, since a char is unsigned, so can never have a value of 
-1; you will get a compilation error since the result of the read() is an int, 
not a char.

 In other words, in order to keep my changes and post-festivities
 headaches to a minimum, I would like to keep buf being a StringBuffer.

Which is exactly why you should use an InputStreamReader, not an InputStream, 
and not change anything else.

 So what I was really looking for was the correct alternative to
buf.append((char) ic);

You're looking in the wrong place; the conversion should occur as the input is 
being read, not during the append().

 A cursory examination of the webapp code seems to show that
 the byte in question is only ever compared to either -1 or
 integers below 127, or characters in the lower ASCII range
 A-Za-z.

Excellent; then wrappering the InputStream with an InputStreamReader set to the 
appropriate character set is *exactly* what you need.

 But is
 if (char == some-integer)
 always valid as a replacement for
 if (int == some-integer)

No; a char is unsigned, which is why all read() methods return an int, not a 
byte or a char.
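
A tiny sketch of why the end-of-stream test breaks once the value has been narrowed to a char (class and method names are invented for the example):

```java
// Hypothetical demo of the int-versus-char comparison.
class CharVsInt {

    // End-of-stream test after narrowing the read() result to char.
    static boolean eofDetectedAsChar(int readResult) {
        char c = (char) readResult; // -1 becomes '\uFFFF'
        return c == -1;             // c is promoted to the int 65535, so this is never true
    }

    // End-of-stream test on the int that read() actually returns.
    static boolean eofDetectedAsInt(int readResult) {
        return readResult == -1;
    }

    // For values in the ASCII range the two comparisons agree.
    static boolean subDetectedAsChar(int readResult) {
        char c = (char) readResult;
        return c == 26;
    }
}
```

So comparisons against 26 or against letters keep working on a char, but the -1 sentinel is lost, which is the reason to keep ic declared as an int.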

 - Chuck





RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: Len Popp [mailto:len.p...@gmail.com]
 Subject: Re: [OT] Basic int/char conversion question

 If there's an easy way to convert a single character,
 someone please point it out.

Not particularly easy, but this should work:

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.nio.charset.Charset;

public class Converter {
  byte[] ba = new byte[1];
  InputStreamReader isr;

  public Converter(String csName) {
    isr = new InputStreamReader(new ByteArrayInputStream(ba),
        Charset.forName(csName));
  }

  public char convert(byte b) {
    try {
      isr.reset();
      ba[0] = b;
      return (char)isr.read();
    } catch (IOException ioe) {
      // This can't happen in our situation, but...
      return '\0';
    }
  }
}

The calling program merely has to instantiate a Converter once for the 
character set of interest, then call the convert method to translate the byte:

cvt = new Converter("ISO-8859-2");
...
myChar = cvt.convert(myByte);

This of course only works for 8-bit character sets.

 - Chuck





Re: [OT] Basic int/char conversion question

2009-01-01 Thread Konstantin Kolinko
2009/1/2 Caldarale, Charles R chuck.caldar...@unisys.com:
 From: Len Popp [mailto:len.p...@gmail.com]
 Subject: Re: [OT] Basic int/char conversion question

 If there's an easy way to convert a single character,
 someone please point it out.

 Not particularly easy, but this should work:

 (...)
isr = new InputStreamReader(new ByteArrayInputStream(ba), 
 Charset.forName(csName));
 (...)
  isr.reset();


reset() is not implemented in InputStreamReader, as of Sun JDK 6u07 that
I have installed, thus you have to make a direct call to
ByteArrayInputStream.reset().

Well, it serves the same purpose as TranslationTable class that I have
provided earlier.

Best regards,
Konstantin Kolinko




RE: [OT] Basic int/char conversion question

2009-01-01 Thread Caldarale, Charles R
 From: Konstantin Kolinko [mailto:knst.koli...@gmail.com]
 Subject: Re: [OT] Basic int/char conversion question

 reset() is not implemented in InputStreamReader

Quite correct; sorry - the revised code would be this:

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.nio.charset.Charset;

public class Converter {
  byte[] ba = new byte[1];
  ByteArrayInputStream bais = new ByteArrayInputStream(ba);
  InputStreamReader isr;

  public Converter(String csName) {
    isr = new InputStreamReader(bais, Charset.forName(csName));
  }

  public char convert(byte b) {
    bais.reset();
    ba[0] = b;
    try {
      return (char)isr.read();
    } catch (IOException ioe) {
      // This can't happen in our situation, but...
      return '\0';
    }
  }
}

 Well, it serves the same purpose as TranslationTable class that
 I have provided earlier.

True, and yours should be more efficient, and could be easily modified to 
create an instance for any given character set rather than using a static 
table.  I think the Converter class above is more easily adaptable to 
multi-byte character sets should that ever be of interest.
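
The per-instance variant of the lookup table might look roughly like this. It is a sketch only: the class name ByteTable is invented here, and it is meaningful only for single-byte character sets.

```java
import java.nio.charset.Charset;

// Hypothetical per-charset version of the static TranslationTable idea.
class ByteTable {
    private final char[] table;

    // Builds the 256-entry table once for the named single-byte charset.
    ByteTable(String csName) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        table = new String(bytes, Charset.forName(csName)).toCharArray();
    }

    // Maps a byte value 0..255 (as returned by InputStream.read()) to its char.
    // Throws ArrayIndexOutOfBoundsException for -1, so test for end of stream first.
    char lookup(int i) {
        return table[i];
    }

    public static void main(String[] args) {
        ByteTable latin2 = new ByteTable("ISO-8859-2");
        System.out.println(Integer.toHexString(latin2.lookup(0xB5)));
    }
}
```

Usage would then be one instance per encoding, created once, with buf.append(latin2.lookup(ic)) in place of buf.append((char)ic).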

 - Chuck

