Re: [protobuf] Dealing with Corrupted Protocol Buffers

Jason Hsueh Fri, 21 Jan 2011 11:55:01 -0800

It will be rather difficult to correct for the error. The point at which the
parse fails may not be the point of corruption: e.g., the corruption may be
in a byte that is part of a varint, so that the continuation bit is set when
it shouldn't be. Similarly, you could have a corruption in the length
delimiter for a string or nested message field. Either could cause you to read more
bytes than you should have for that particular field. The encoding is dense
enough that the parser may merrily consume more bytes before encountering an
error to complain about.
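
To make the varint case concrete, here is a minimal sketch in Python. It is not
the library's parser: read_varint is a hand-written helper and the two-field
message is invented for the illustration. It flips one bit in a length prefix
and shows how the decoded length changes.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the data
        if not (byte & 0x80):             # high bit clear: last byte
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


# Hypothetical message: field 1 (length-delimited) = "abc", field 2 (varint) = 5.
good = bytes([0x0A, 0x03]) + b"abc" + bytes([0x10, 0x05])

# Corrupt a single bit: set the continuation bit of the length byte.
bad = bytearray(good)
bad[1] |= 0x80

good_len, good_pos = read_varint(good, 1)
bad_len, bad_pos = read_varint(bad, 1)

print(good_len, good_pos)  # 3 2
print(bad_len, bad_pos)    # 12419 3
```

With the bit flipped, the first payload byte is absorbed into the length, which
now claims 12419 bytes. In a large file the parser would keep consuming data
that belongs to later fields or messages, and only fail somewhere downstream.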


You can try to mess with the bytes; you might be able to deal with errors
using some assumptions about the serialized data based on your protocol. But
in general, and going forward, you should write small messages in a
container format that allows for error recovery. Various threads from this
search discuss this issue:
http://groups.google.com/group/protobuf/search?group=protobuf&q=container+format&qt_g=Search+this+group
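
As a rough illustration of what such a container could look like, here is a
sketch in Python. Everything in it is an assumption made for the example, not
a format defined by protobuf: the MAGIC sync marker, the header layout, and
the write_record / read_records helpers are all hypothetical. Each serialized
message is framed with a marker, a length, and a CRC32, so a reader can skip a
damaged record and resynchronize on the next marker.

```python
import struct
import zlib

MAGIC = b"\xde\xad\xbe\xef"      # hypothetical sync marker
HEADER = struct.Struct("<II")    # payload length, CRC32 of payload


def write_record(payload):
    """Frame one serialized message as marker + header + payload."""
    return MAGIC + HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(data):
    """Yield every payload whose frame checks out, resyncing after damage."""
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos < 0:
            return
        start = pos + len(MAGIC)
        body = start + HEADER.size
        if body <= len(data):
            length, crc = HEADER.unpack_from(data, start)
            payload = data[body:body + length]
            if len(payload) == length and zlib.crc32(payload) == crc:
                yield payload
                pos = body + length
                continue
        # Frame is damaged or the marker was a false match: move on by one
        # byte and search for the next marker.
        pos += 1


# Three stand-in payloads; in practice these would be serialized messages.
stream = b"".join(write_record(p) for p in [b"one", b"two", b"three"])

# Corrupt the first payload byte of the second record.
damaged = bytearray(stream)
damaged[len(write_record(b"one")) + len(MAGIC) + HEADER.size] ^= 0xFF

recovered = list(read_records(bytes(damaged)))
print(recovered)  # [b'one', b'three']
```

Only the damaged record is lost. Keeping each message small limits how much
data a single corrupted byte can take with it.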

On Thu, Jan 20, 2011 at 7:11 PM, Julius Schorzman <[email protected]> wrote:

> Thanks for the tip on CodedInputStream, Evan! I will explore it, and
> if I get anything out of it I will report back my findings for anyone
> else dealing with this issue.
>
> On Thu, Jan 20, 2011 at 6:27 PM, Evan Jones <[email protected]> wrote:
> > On Jan 20, 2011, at 2:48 , julius-schorzman wrote:
> >>
> >> My question is -- can anything be done to retrieve part of the file?
> >> It would be nice to know at which point in the file the problematic
> >> message occurred, and then I could crop to that point or do some
> >> manual exception -- but unfortunately this exception is very general.
> >> I find it hard to believe that a single mis-saved bit makes the whole
> >> file worthless.
> >
> > You are correct: your entire data is not worthless, but at the point of
> > the error, you will need some manual intervention to figure out what is
> > going on.
> >
> > It is probably possible to figure out the byte offset where this error
> > occurs. The CodedInputStream tracks some sort of bytesRead counter, I
> > seem to recall. However, this will require you to modify the source.
> >
> >
> >> I also find it curious that the source provides no way (that I can
> >> tell) to get at any lower level data in the p.b. since whenever I try
> >> to do anything with it it throws an exception.  Best I can tell I will
> >> have to write from scratch my own code to decode the p.b. file.
> >
> > The lowest-level tool provided is CodedInputStream. But yes, you will
> > effectively have to "parse" the message yourself. Look at the code that
> > is generated for the mergeFrom method of your message to get an idea of
> > how it works, and you can read the encoding documentation:
> >
> > http://code.google.com/apis/protocolbuffers/docs/encoding.html
> >
> > You can definitely figure out what is going on, but it will be a bit of a
> > pain. Good luck,
> >
> > Evan Jones
> >
> > --
> > http://evanjones.ca/
> >
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<protobuf%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
>
>

