On Sun, May 10, 2009 at 6:08 AM, edan <edan...@gmail.com> wrote:
> I have some fields that may contain non-UTF8 data.
> I understand that I just need to change their type from "string" to "bytes"
> and it should just work, transparently.

Yes. They're the same on the wire.

> I have a few fields that probably will only contain ASCII, i.e. legal UTF8,
> but I'm not 100% sure.
> I am tempted to just turn them all to "bytes".
> But this begs the question: what is the "string" type useful for, and why
> shouldn't I just always use "bytes" to be sure, all the time, and not bother
> with "string" at all?
> Does "string" add anything besides validation that only valid UTF8 is
> passing over the wire? Is there really a big benefit to this behavior? Or
> is there some other advantage that I'll miss out on by changing all my
> "string"s to "bytes"?

If you use the C++ API there is not much difference, since both types are
represented as std::string in the API.

It makes a big difference for the Java API (and Python?), which has a native
type for Unicode text. In Java, for a protocol buffer 'string' field the
generated API returns a java.lang.String, while for a 'bytes' field it returns
a ByteString. A ByteString can hold arbitrary bytes, whereas a java.lang.String
holds text, so the bytes of a 'string' field have to be valid UTF-8 to be
decoded into one. So while ByteString is more flexible, String is more
convenient to work with in Java code, because all the string manipulation
libraries can handle it.

So the benefit is a more convenient API in the generated Java code. There is
also a documentation benefit: if you use 'string' you emphasize that a field
only contains readable text, while 'bytes' might contain any binary blob.

-h

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
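To make the two points in the reply above concrete (a 'string' and a 'bytes' field look identical on the wire, and a Java String cannot faithfully carry arbitrary bytes), here is a minimal sketch in plain Java with no protobuf dependency. It hand-encodes a length-delimited field as key, varint length, payload, following the documented wire format. It is not the protobuf library's API, and the class and method names (StringVsBytesDemo, encodeStringField, encodeBytesField, isValidUtf8, survivesStringRoundTrip) are hypothetical. The strict decoder only approximates the UTF-8 check that a 'string' field implies; exactly where a given protobuf release performs that check is not modeled here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: a hand-rolled encoder for one length-delimited field
// (wire type 2). NOT the protobuf library API; it only illustrates the format.
final class StringVsBytesDemo {

    // Appends a non-negative value as a varint: 7 bits per byte, high bit = "more follow".
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes one field as: key (field number << 3 | wire type 2), length, payload.
    static byte[] encodeLengthDelimited(int fieldNumber, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // A 'string' field's payload is the UTF-8 encoding of the text.
    static byte[] encodeStringField(int fieldNumber, String text) {
        return encodeLengthDelimited(fieldNumber, text.getBytes(StandardCharsets.UTF_8));
    }

    // A 'bytes' field's payload is the raw bytes, unchanged.
    static byte[] encodeBytesField(int fieldNumber, byte[] data) {
        return encodeLengthDelimited(fieldNumber, data);
    }

    // Strict UTF-8 check: reports malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes bytes into a String and re-encodes them; invalid UTF-8 becomes U+FFFD,
    // so the original bytes are lost.
    static boolean survivesStringRoundTrip(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {(byte) 0xC3, (byte) 0x28}; // not valid UTF-8

        // Same bytes on the wire for a 'string' and a 'bytes' field with equal content.
        System.out.println(Arrays.equals(encodeStringField(1, "hello"), encodeBytesField(1, ascii)));
        System.out.println(isValidUtf8(ascii) + " " + isValidUtf8(binary));
        System.out.println(survivesStringRoundTrip(ascii) + " " + survivesStringRoundTrip(binary));
    }
}
```

Running main prints "true", then "true false", then "true false": the ASCII payload encodes identically either way and round-trips through a String, while the two-byte binary payload is rejected by the strict decoder and is altered by the String round trip. That second case is why data that might not be UTF-8 belongs in a 'bytes' field.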