Re: string vs. bytes

2009-05-12 Thread dan . schmidt . valle
I am having a very similar problem. Just installed the 2.0.3 version and now all my serialisations complain. libprotobuf ERROR ./google/protobuf/wire_format_inl.h:138] Encountered string containing invalid UTF-8 data while parsing protocol buffer. Strings must contain only UTF-8; use the 'bytes'

Re: string vs. bytes

2009-05-12 Thread Henner Zeller
Hi, On Tue, May 12, 2009 at 6:47 AM, dan.schmidt.va...@gmail.com wrote: I am having a very similar problem. Just installed the 2.0.3 version and now all my serialisations complain. libprotobuf ERROR ./google/protobuf/wire_format_inl.h:138] Encountered string containing invalid UTF-8 data

Re: string vs. bytes

2009-05-12 Thread dan . schmidt . valle
Thanks very much for the answers, guys. Most illustrative. The error messages did in fact disappear with that simple change in all my proto files. Still, now that this error has shown up in the code I have, I keep wondering whether the fact that I'm serialising to string is inefficient. What would

Re: string vs. bytes

2009-05-12 Thread Kenton Varda
The serialized message is just an array of bytes. We use std::string as an efficient container for these bytes, but it is still just storing bytes. std::string, unlike Java's String, only contains bytes, not unicode characters. So, there is no performance penalty. In fact, serializing to a
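
To illustrate the point about std::string being a plain byte container, here is a minimal sketch that uses only the standard library and no protobuf calls. The function name MakeByteBuffer is made up for this example. It shows that a std::string keeps an embedded NUL and bytes that are not valid UTF-8 without interpreting them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not part of libprotobuf).
// Builds a std::string from raw bytes, including an embedded NUL and
// bytes that are not valid UTF-8. The (pointer, length) constructor
// copies exactly that many bytes and does not interpret them.
std::string MakeByteBuffer() {
    const char raw[] = {'\x08', '\x00', '\xff', '\xfe', 'a'};
    return std::string(raw, sizeof(raw));
}

// Counts bytes with the high bit set, i.e. anything outside 7-bit ASCII.
// The string stores them as-is; no character decoding happens.
std::size_t CountHighBitBytes(const std::string& data) {
    std::size_t count = 0;
    for (unsigned char byte : data) {
        if (byte >= 0x80) ++count;
    }
    return count;
}
```

Calling MakeByteBuffer().size() gives the full byte count, including the NUL in the middle, which is why the same container works for serialized messages.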

Re: string vs. bytes

2009-05-10 Thread Henner Zeller
On Sun, May 10, 2009 at 6:08 AM, edan edan...@gmail.com wrote: I have some fields that may contain non-UTF8 data. I understand that I just need to change their type from string to bytes and it should just work, transparently. Yes. They're the same on the wire. I have a few fields that
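
As a rough sketch of why string and bytes are the same on the wire: both are written as a length-delimited field (wire type 2), that is, a varint tag, a varint length, then the raw payload. The code below is a hand-written encoder for illustration, not libprotobuf's API; AppendVarint and EncodeLengthDelimited are hypothetical names. Note that the encoder takes no argument saying whether the field was declared string or bytes, so the output cannot differ between the two.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: appends an unsigned base-128 varint to *out,
// least significant 7-bit group first, high bit set on all but the last byte.
void AppendVarint(std::uint64_t value, std::string* out) {
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7f) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));
}

// Hypothetical helper: encodes one length-delimited field (wire type 2).
// The same layout is used for fields declared string and fields declared bytes.
std::string EncodeLengthDelimited(std::uint32_t field_number,
                                  const std::string& payload) {
    std::string out;
    // Tag = (field number << 3) | wire type.
    AppendVarint((static_cast<std::uint64_t>(field_number) << 3) | 2, &out);
    // Payload length in bytes, then the payload itself, uninterpreted.
    AppendVarint(payload.size(), &out);
    out.append(payload);
    return out;
}
```

For field number 1 and payload "abc", this yields the bytes 0x0a 0x03 followed by 'a', 'b', 'c'. Any UTF-8 check happens outside this layout, at serialize or parse time, which is consistent with the error going away once the field type is changed to bytes.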