Oh, sorry for the confusion. When I change it to bytes, it doesn't give
the error, but the value is gibberish: it contains some special
characters, and the values of 2-3 fields run together.

I don't know exactly what the problem is, but I found a solution. Here
is what I am doing:
I searched online, read some Python docs, and then wrote another
Python script that processes each protobuf record like this:

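t = Title()
t.ParseFromString(title_str_pb)  # load the stored record first

# Python 2: re-encode each text field as UTF-8 bytes
t.title = t.title.encode('utf-8')
t.description = t.description.encode('utf-8')
t.isbn = t.isbn.encode('utf-8')

and then I write it back to my database:
title_str_pb = t.SerializeToString()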

Now when I open it in C++, I don't get any error.

So I think that when I was adding the original data, I should have
called .encode('utf-8') on all the Python strings.
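A minimal sketch of that normalization step, written in Python 3 (the script above is Python 2). It assumes, hypothetically, that any raw value which is not already valid UTF-8 came out of MySQL as latin-1; `to_utf8` and `LEGACY_ENCODING` are illustrative names, not part of the protobuf API. The function returns the UTF-8 bytes that the C++ parser expects in a `string` field.

```python
# Hypothetical: the charset the MySQL columns were actually stored in.
LEGACY_ENCODING = "latin-1"


def to_utf8(value, legacy_encoding=LEGACY_ENCODING):
    """Return value as UTF-8 bytes, suitable for a proto2 `string` field."""
    if isinstance(value, str):
        # Already text: encode it exactly once as UTF-8.
        return value.encode("utf-8")
    try:
        # Raw bytes that are already valid UTF-8 are kept unchanged.
        value.decode("utf-8")
        return value
    except UnicodeDecodeError:
        # Otherwise assume the legacy encoding and transcode to UTF-8.
        return value.decode(legacy_encoding).encode("utf-8")
```

For example, `to_utf8(b"caf\xe9")` gives `b"caf\xc3\xa9"`. Because values that are already valid UTF-8 pass through untouched, running the conversion twice is harmless. The trade-off is that a legacy-encoded value which happens to also be valid UTF-8 would not be transcoded. Current versions of the Python protobuf library also accept a `str` for a `string` field and do the UTF-8 encoding themselves.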

Is there anything I am missing, or an easier way to do it?

On Mar 20, 11:38 pm, Kenton Varda <ken...@google.com> wrote:
> If you changed all the "string" types to "bytes" instead, then you should
> not see that error.  Are you sure you did that?  If so, can you write a
> small demo program which produces this error, even when the protobuf type
> contains no "string" fields, and send it to me?
> On Fri, Mar 20, 2009 at 11:16 AM, <saad.a...@gmail.com> wrote:
> > I am not a very experienced programmer, but I will try to explain
> > what's happening:
> > I have a database of book titles in protocol buffer format. The
> > message Title has fields like:
> > optional string title = 1;
> > optional string description = 2;
> > optional string isbn = 3;
> > ...
> > ...
> > When I convert my MySQL data to pb, I use Python and store it using
> > title_str_pb = title.SerializeToString()
> > When I read the titles back in Python, everything works fine, e.g.:
> > t = Title()
> > t.ParseFromString(title_str_pb)
> > title = t.title
> > description = t.description
> > But now I want to use this protocol buffer data in C++, like:
> > Title t;
> > t.ParseFromString(title_str_pb);
> > and I get this error:
> > Encountered string containing invalid UTF-8 data while parsing
> > protocol buffer. Strings must contain only UTF-8; use the 'bytes' type
> > for raw bytes.
> > I changed the string type to the bytes type, and I still get the
> > same error.
> > I have a million book records stored in pb format. I don't want to
> > lose my data. Can somebody please help? As an alternative I could
> > restore my data using Python, but I want to use it in C++.
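Before moving a million records over to C++, it may help to find which ones would trigger that error. Below is a rough sketch in Python 3. It assumes, hypothetically, that each record is available as a dict mapping field names to raw bytes (for example, straight from MySQL). `invalid_utf8_fields` is an illustrative name. Python's strict UTF-8 decoder stands in for the protobuf C++ check, and the two may differ in edge cases.

```python
def invalid_utf8_fields(record):
    """Return the names of fields whose raw bytes are not valid UTF-8.

    `record` is assumed to be a dict of field name -> value; only bytes
    values are checked, since str values are already text.
    """
    bad = []
    for name, value in record.items():
        if isinstance(value, bytes):
            try:
                value.decode("utf-8")  # strict decoding rejects invalid sequences
            except UnicodeDecodeError:
                bad.append(name)
    return bad
```

Running this over every record and logging the non-empty results shows how widespread the problem is, and which columns are affected, before any data is rewritten.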
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com