> The reason this appears to work is because String.getBytes() encodes in
> ISO-8859-1 encoding by default.
Thanks a lot for the above. Just to summarize my understanding: C++ needs to explicitly decode the UTF-8 encoded string, and only then will it interpret the characters properly. I can use the ICU library mentioned by Evans above; I also observed that MultiByteToWideChar(CP_UTF8, ...) does this for me. However, I cannot use wide strings or ICU data structures, because the data has to stay in char format — char is what our DB libraries use to communicate with stored procedures. When I then run WideCharToMultiByte(CP_ACP, ...), it converts the wide (UTF-16) string back to an ISO-8859-1 string that can be stored in char.

For now I am fairly confident the Java server will only ever return characters representable in ISO-8859-1, since this is a migration project from a C++ server (no protobuf involved) to a Java server, and this issue was never faced before. Can we encode the protobuf data in ISO-8859-1 from the server end itself? (I understand that in the long run we need to migrate to DB libraries that support Unicode and change the client code completely to work with wide characters.)

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
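P.S. For anyone following along without Windows, the round trip described above (MultiByteToWideChar(CP_UTF8, ...) followed by WideCharToMultiByte(CP_ACP, ...)) can be sketched platform-independently. This is only an illustration under the assumption that the ANSI code page is Latin-1; `utf8_to_latin1` is a hypothetical helper, not a Windows or ICU API, and it deliberately rejects anything outside ISO-8859-1 rather than substituting a default character as WideCharToMultiByte would.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Sketch of the conversion described above, with no Windows dependency:
// decode UTF-8 into code points (roughly what MultiByteToWideChar(CP_UTF8, ...)
// produces), then narrow to ISO-8859-1 bytes (roughly what
// WideCharToMultiByte(CP_ACP, ...) does when the ANSI code page is Latin-1).
// ISO-8859-1 maps code points U+0000..U+00FF directly to byte values, so any
// Latin-1 character fits in a 1- or 2-byte UTF-8 sequence; anything larger
// throws, matching the assumption that the server only sends Latin-1 text.
std::string utf8_to_latin1(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ) {
        unsigned char b = utf8[i];
        unsigned int cp;
        if (b < 0x80) {
            cp = b;                                    // 1-byte sequence: ASCII
            i += 1;
        } else if ((b & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            cp = ((b & 0x1Fu) << 6) | (utf8[i + 1] & 0x3Fu);  // 2-byte: U+0080..U+07FF
            i += 2;
        } else {
            // 3- and 4-byte sequences (or truncated input) cannot be Latin-1
            throw std::runtime_error("code point outside ISO-8859-1");
        }
        if (cp > 0xFF)
            throw std::runtime_error("code point outside ISO-8859-1");
        out.push_back(static_cast<char>(cp));
    }
    return out;
}
```

For example, "é" (U+00E9, UTF-8 bytes 0xC3 0xA9) comes out as the single Latin-1 byte 0xE9, which is safe to hand to a char-based DB library.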
