Thanks for pointing that out, Evans.

> The Java protocol buffer API encodes strings as UTF-8. Since C++ has
> no unicode support, what you get on the other end is the raw UTF-8
> encoded data.

I was under the impression that UTF-8 encodes each character using 8 bits, i.e. one byte, so I am not sure why the raw encoded data represents the character using 2 bytes instead of one.

Also, on the Java end, if in the stream writer I use something like writer.write(new String(msg.getBytes(), "UTF8").getBytes()) instead of simply writer.write(msg.getBytes()), I see the characters as expected on the C++ client. However, I believe this corrupts the protobuf headers, because on the C++ side I then receive only a partial file, up to the entry that contains one such character.
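For reference, here is a minimal Java sketch of the two effects described above. It is an illustration only, not the code from this thread: the class and method names (Utf8Demo, utf8Length, roundTrip) are hypothetical, and the three sample bytes are simply a small protobuf-style varint field chosen for the demonstration. It shows that UTF-8 is a variable-width encoding (1 to 4 bytes per character, with only ASCII fitting in one byte), and that decoding arbitrary binary data as text and re-encoding it is generally not lossless, which may be related to the truncated file seen on the C++ side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {
    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decode arbitrary bytes as UTF-8 text, then encode the text back to bytes.
    // Assumption: UTF-8 is used in both directions; the snippet in the thread
    // uses the platform default charset for the second step.
    static byte[] roundTrip(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));       // ASCII letter: 1 byte
        System.out.println(utf8Length("\u00e9"));  // e with acute accent: 2 bytes
        System.out.println(utf8Length("\u20ac"));  // euro sign: 3 bytes

        // Sample binary payload: field 1, varint value 150 (bytes 08 96 01).
        // 0x96 on its own is not valid UTF-8, so decoding replaces it.
        byte[] binary = {0x08, (byte) 0x96, 0x01};
        System.out.println(Arrays.equals(binary, roundTrip(binary))); // false
    }
}
```

If this is what is happening, the serialized message bytes should be written to the stream unchanged, and any charset handling should be applied only to the individual string fields after parsing.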
> encoded data. You'll need to use some Unicode API to process it in
> whatever way your application requires. I suggest ICU:

Trying this out now. Will post an update shortly.

Thanks for the prompt response.