Re: Unicode question

Patrick Lee Tue, 16 Jul 2002 03:35:45 -0700

Hi Sergei


>In the StringUtil class, the method that writes Unicode is:
>   public static void putUncompressedUnicode(final String input,
>            final byte[] output,
>            final int offset) {
>        int strlen = input.length();
>
>        for (int k = 0; k < strlen; k++) {
>            char c = input.charAt(k);
>
>            output[offset + (2 * k)] = (byte) c;
>            output[offset + (2 * k) + 1] = (byte) (c >> 8);
>        }
>    }
>
>For Latin characters the result is:
>abc -> a_b_c_  (each underscore stands for a 0x00 byte)
>
>
>In getFromUnicode:
>         byte[] bstring = new byte[len];
>         int index = offset + 1;
>         // start with low bits.
>
>         for (int k = 0; k < len; k++) {
>             bstring[k] = string[index];
>             index += 2;
>         }
>it is assumed that
>_a_b_c -> abc
>
>Could you make this clear?
>Is Unicode laid out as _a_b_c (the first byte is the high byte and the
>second byte is the low byte)?
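
To see the mismatch concretely, here is a minimal sketch that pairs the quoted putUncompressedUnicode with a reconstruction of getFromUnicode. Only the loop of getFromUnicode is quoted above, so its signature (byte[] string, int offset, int len) and the final ISO-8859-1 String conversion are assumptions; the class name UnicodeMismatchDemo is hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnicodeMismatchDemo {

    // Same logic as the quoted putUncompressedUnicode: low byte first, then high byte.
    public static void putUncompressedUnicode(final String input,
                                              final byte[] output,
                                              final int offset) {
        int strlen = input.length();
        for (int k = 0; k < strlen; k++) {
            char c = input.charAt(k);
            output[offset + (2 * k)] = (byte) c;            // low byte
            output[offset + (2 * k) + 1] = (byte) (c >> 8); // high byte
        }
    }

    // Reconstructed around the quoted getFromUnicode loop. The signature and the
    // String conversion at the end are assumptions; only the loop is quoted.
    public static String getFromUnicode(final byte[] string,
                                        final int offset,
                                        final int len) {
        byte[] bstring = new byte[len];
        // Reads the SECOND byte of each pair, which is the low byte only if the
        // data is big-endian (high byte first).
        int index = offset + 1;
        for (int k = 0; k < len; k++) {
            bstring[k] = string[index];
            index += 2;
        }
        return new String(bstring, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[6];
        putUncompressedUnicode("abc", buf, 0);
        System.out.println(Arrays.toString(buf));   // [97, 0, 98, 0, 99, 0]
        String back = getFromUnicode(buf, 0, 3);
        System.out.println(back.equals("\0\0\0"));  // true
    }
}
```

Writing "abc" and reading it straight back yields three NUL characters, because the reader picks up only the zero high bytes that the writer placed second in each pair.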

I think you are right about what the code does.  As for why the two functions
are not compatible, I will try to explain my view; correct me if I am
wrong.  It could be that the first function writes data destined for the
media, i.e. the Excel file, so it stores bytes in little-endian order: the
low-order byte comes first and the high-order byte follows.  The second
function, by contrast, compresses (trims) a Unicode string that is already
in memory, and it appears to assume big-endian order, reading the second
byte of each pair as the low byte.
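
One way to check the endianness explanation is to compare the bytes against the JDK's own UTF-16LE and UTF-16BE encoders (Java 7 or later for StandardCharsets). This is a sketch under assumptions: EndianCheck and getFromUnicodeLE are hypothetical names, not StringUtil methods, and like the quoted reader, getFromUnicodeLE keeps only the low byte of each character, so it round-trips only characters up to U+00FF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EndianCheck {

    // Hypothetical little-endian counterpart to getFromUnicode: it reads the FIRST
    // byte of each pair, which is the low byte in the a_b_c_ layout.
    public static String getFromUnicodeLE(final byte[] string,
                                          final int offset,
                                          final int len) {
        byte[] bstring = new byte[len];
        for (int k = 0; k < len; k++) {
            bstring[k] = string[offset + (2 * k)];
        }
        return new String(bstring, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The a_b_c_ layout that putUncompressedUnicode writes for "abc".
        byte[] written = {97, 0, 98, 0, 99, 0};

        // UTF-16LE (no byte-order mark) produces exactly the same bytes.
        System.out.println(Arrays.equals(written, "abc".getBytes(StandardCharsets.UTF_16LE))); // true

        // The _a_b_c layout that getFromUnicode expects is UTF-16BE.
        System.out.println(Arrays.toString("abc".getBytes(StandardCharsets.UTF_16BE))); // [0, 97, 0, 98, 0, 99]

        // Reading the little-endian data with a matching reader recovers the text.
        System.out.println(getFromUnicodeLE(written, 0, 3));                      // abc
        System.out.println(new String(written, 0, 6, StandardCharsets.UTF_16LE)); // abc
    }
}
```

Decoding with the full UTF-16LE charset is the safer choice for arbitrary text, since it keeps both bytes of every character instead of dropping the high byte.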

Cheers
Patrick Lee





