Thanks for pointing that out, Evans.
> The Java protocol buffer API encodes strings as UTF-8. Since C++ has
> no unicode support, what you get on the other end is the raw UTF-8
> encoded data.
I was under the impression that UTF-8 encodes each character using 8
bits, i.e. one byte, so I am not sure why the raw encoded data
represents the character using 2 bytes instead of one. Also, on the
Java end, if I change the stream writer call to
writer.write(new String(msg.getBytes(), "UTF8").getBytes()) instead of
simply writer.write(msg.getBytes()), I see the characters as expected
on the C++ client. However, I believe this corrupts the protobuf
headers, so on the C++ side I receive only a partial file, up to the
entry that contains one such character.
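
For reference, here is a minimal Java sketch of the byte counts involved.
It is an illustration only: the class name Utf8Lengths and its two helper
methods are made up for this example and are not part of the protobuf API.
It shows that UTF-8 is a variable-length encoding (1 byte for ASCII, 2 or
more bytes for other characters), whereas a single-byte charset such as
ISO-8859-1 uses one byte for the same accented character. If the platform
default charset used by the no-argument getBytes() is a single-byte one
(an assumption, not verified here), the re-encoding above would change the
payload length, which may be why the C++ side stops reading partway.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Lengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as ISO-8859-1,
    // a charset that always uses exactly one byte per character.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));        // ASCII letter: prints 1
        System.out.println(utf8Length("\u00e9"));   // e with acute accent: prints 2
        System.out.println(utf8Length("\u20ac"));   // euro sign: prints 3
        System.out.println(latin1Length("\u00e9")); // same accented e: prints 1
    }
}
```

So a 2-byte sequence for one accented character is what UTF-8 is expected
to produce, not a sign that the data was damaged in transit.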

> encoded data. You'll need to use some Unicode API to process it in
> whatever way your application requires. I suggest ICU:
Trying this out now. Will post an update shortly.

Thanks for the prompt response.
