> The reason this appears to work is because String.getBytes() encodes in
> ISO-8859-1 encoding by default.

Thanks a lot for the above. I just want to summarize my understanding:
the C++ side needs to explicitly decode the UTF-8-encoded string before
it can interpret the characters properly.
I can use the ICU library mentioned by Evans above; I also observed that
MultiByteToWideChar(CP_UTF8, ...) does this for me.
However, I cannot use wide strings or ICU data structures, because I
need to keep the data in char format: char is what our DB libraries use
to communicate with stored procedures.
When I then run WideCharToMultiByte(CP_ACP, ...) on the result, it
converts the decoded wide string to an ISO-8859-1 string that can be
stored in char.
For now I am fairly confident that the Java server will always return
characters that can be represented in ISO-8859-1 (this is a migration
project from a C++ server (no protobuf involved) to a Java server, and
this issue was never seen before).
Can we encode the protobuf data in ISO-8859-1 from the server end
itself?
(I understand that in the long run we need to migrate to DB libraries
that support Unicode and change the client code completely to work with
wide characters.)

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.
