Thanks for pointing that out, Evans.
> The Java protocol buffer API encodes strings as UTF-8. Since C++ has
> no unicode support, what you get on the other end is the raw UTF-8
> encoded data.
I was under the impression that UTF-8 encodes each character using 8
bits, i.e. one byte, so I'm not sure why the raw encoded data represents
the character using 2 bytes instead of one. Also, on the Java end, if I
change the stream writer call to something like
writer.write(new String(msg.getBytes(), "UTF8").getBytes()) instead of
simply writer.write(msg.getBytes()), I see the characters as expected
on the C++ client. However, I believe this corrupts the protobuf
headers, so on the C++ side I receive only a partial file, up to the
entry that contains one such character.
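
For reference, here is a minimal sketch of how the byte count can be
checked on the Java side. It assumes only the standard
java.nio.charset.StandardCharsets API; the class name Utf8Demo and its
helper methods are made up for illustration and are not part of my
actual code. It prints the UTF-8 byte length and the hex bytes for a few
sample characters:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf8Demo {
    // Number of bytes the text occupies once encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Space-separated lowercase hex dump of the UTF-8 bytes.
    public static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "A" (U+0041), e-acute (U+00E9), euro sign (U+20AC)
        String[] samples = {"A", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s): " + utf8Hex(s));
        }
    }
}
```

Running it should report one byte for "A", two for U+00E9 and three for
U+20AC, so a 2-byte sequence for an accented character is what UTF-8 is
expected to produce.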

> You'll need to use some Unicode API to process it in
> whatever way your application requires. I suggest ICU:
Trying this out now. Will post an update shortly.

Thanks for the prompt response.
