On Wed, May 16, 2018 at 7:39 AM Dmitry Timofeev <[email protected]>
wrote:

> Hi,
>
> I am considering whether Protocol Buffers can be used in an application
> that requires a canonical representation of messages coming from an
> external source.
>
> The encoding and proto3 guides [1, 2] include several requirements for a
> parser that make it accept non-canonical data (this list is probably not
> exhaustive):
>   - Message fields may appear in any order
>   - There might be multiple instances of the same *non-repeated* field
>   - Message may contain unknown fields
>   - Default values of primitives may appear on the wire
>   - Map entries may appear in any order
>   - Repeated fields of primitives may be packed or unpacked.
>
> 1. Is there any natural way to extend the parser with checks of canonical
> form?
>
No.


> By "natural" I mean a compiler and/or runtime plugin, something that does
> not require a fork of the project.
>
> 2. If not, does such an optional feature make sense in Protocol Buffers?
> Would you accept an option that makes the generated reader code 'strict',
> rejecting non-canonical representations, and, consequently, not
> forward-compatible?
>
Also no here. Compatibility is one of the main reasons to use protobuf
because it allows you to evolve your protocol without breaking anyone in a
complex system. If you don't need compatibility at all (i.e., you will
never change your protocol), using a C++ struct will be much more
performant than protobuf because you can skip the whole
parsing/serialization cost.

There is a way to mimic the behavior you want though:
1. parse the input data to a proto message
2. check if the proto message has any unknown fields; if any, report error
3. serialize the proto message using deterministic serialization
(https://github.com/google/protobuf/blob/master/src/google/protobuf/io/coded_stream.h#L842)
4. compare the serialized data against the input data; if they match, the
input data is in the "canonical form"; if not, report error

It will incur an additional serialization cost, but can get you close
enough to the canonical form.
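
To make the four steps concrete, here is a minimal sketch in Python. It does not use the protobuf library at all: it hand-rolls a toy codec for a hypothetical proto3 message with two varint fields (`x = 1`, `y = 2`), and every name in it (`KNOWN_FIELDS`, `parse`, `serialize`, `is_canonical`) is an assumption made for illustration. It only handles non-negative varint values, so treat it as a picture of the round-trip idea rather than a real implementation.

```python
# Toy illustration of "parse, reject unknown fields, re-serialize, compare".
# Hypothetical schema: message Point { int32 x = 1; int32 y = 2; } (proto3).
# Only wire type 0 (varint) and non-negative values are supported here.

KNOWN_FIELDS = {1: "x", 2: "y"}  # field number -> field name (assumed schema)


def encode_varint(value):
    """Encode a non-negative integer as a minimal-length base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(data):
    """Step 1: lenient parse. Any field order is accepted, the last value
    wins for duplicates, and unknown fields are noted rather than rejected."""
    fields = {name: 0 for name in KNOWN_FIELDS.values()}
    has_unknown = False
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type != 0:
            raise ValueError("only varint fields are supported in this sketch")
        value, pos = decode_varint(data, pos)
        name = KNOWN_FIELDS.get(field_number)
        if name is None:
            has_unknown = True
        else:
            fields[name] = value
    return fields, has_unknown


def serialize(fields):
    """Step 3: deterministic output. Fields are written in field-number
    order, defaults are omitted, and varints use the minimal encoding."""
    out = bytearray()
    for number in sorted(KNOWN_FIELDS):
        value = fields[KNOWN_FIELDS[number]]
        if value != 0:
            out += encode_varint(number << 3) + encode_varint(value)
    return bytes(out)


def is_canonical(data):
    """Steps 1 to 4 combined: True only if the input round-trips exactly."""
    try:
        fields, has_unknown = parse(data)
    except ValueError:
        return False
    if has_unknown:  # step 2: unknown fields are an error
        return False
    return serialize(fields) == data  # step 4: byte-for-byte comparison
```

With this toy codec, `is_canonical(bytes([0x08, 0x01, 0x10, 0x02]))` accepts the input, while the same two fields in reverse order, a duplicated field, an explicit zero value, an unknown field number, or an over-long varint are all rejected, because none of them survive the round trip byte for byte.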

>
> Thanks,
> Dmitry
>
> [1] https://developers.google.com/protocol-buffers/docs/encoding
> [2] https://developers.google.com/protocol-buffers/docs/proto3
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.
>
