On Wed, May 16, 2018 at 7:39 AM Dmitry Timofeev <dmytro.tymof...@bitfury.com>
wrote:

> Hi,
>
> I am considering whether Protocol Buffers can be used in an application that
> requires a canonical representation of messages coming from an external source.
>
> The encoding and proto3 guides [1, 2] include several requirements for a
> parser that make it accept non-canonical data (this list is probably not
> exhaustive):
>   - Message fields may appear in any order
>   - There might be multiple instances of the same *non-repeated* field
>   - A message may contain unknown fields
>   - Default values of primitives may appear on the wire
>   - Map entries may appear in any order
>   - Repeated fields of primitives may be packed or unpacked
>
> 1. Is there any natural way to extend the parser with checks of canonical
> form?
>
No.


> By "natural" I mean a compiler and/or runtime plugin, something that does
> not require a fork of the project.
>
> 2. If not, does such an optional feature make sense in Protocol Buffers? Would
> you accept an option that makes the generated reader code 'strict',
> rejecting non-canonical representations, and, consequently, not
> forward-compatible?
>
Also no here. Compatibility is one of the main reasons to use protobuf
because it allows you to evolve your protocol without breaking anyone in a
complex system. If you don't need compatibility at all (i.e., you will
never change your protocol), using a C++ struct will be much more
performant than protobuf because you can skip the whole
parsing/serialization cost.

There is a way to mimic the behavior you want though:
1. parse the input data into a proto message
2. check if the proto message has any unknown fields; if any, report an error
3. serialize the proto message using deterministic serialization
(https://github.com/google/protobuf/blob/master/src/google/protobuf/io/coded_stream.h#L842)
4. compare the serialized data against the input data; if they match, the
input data is in the "canonical form"; if not, report an error

It will incur an additional serialization cost, but can get you close
enough to the canonical form.
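
To make the four steps concrete, here is a rough sketch of the round-trip
check in Python. Note that this is a hand-rolled toy codec, not the protobuf
library: it handles only varint fields, and the schema (KNOWN_FIELDS) and all
function names are made up for illustration. In a real application the
parse and serialize steps would be the library's own parser and its
deterministic serialization mode.

```python
# Toy illustration only: a hand-rolled codec for varint fields.
# This is NOT the protobuf library; all names here are hypothetical.
KNOWN_FIELDS = {1, 2}  # assumed schema: two unsigned integer fields


def encode_varint(value):
    """Encode a non-negative integer as a minimal base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(data):
    """Step 1: lenient parse. Returns (known fields, unknown field numbers)."""
    fields, unknown, pos = {}, [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type != 0:
            raise ValueError("this toy codec handles varint fields only")
        value, pos = decode_varint(data, pos)
        if number in KNOWN_FIELDS:
            fields[number] = value  # for duplicates, the last occurrence wins
        else:
            unknown.append(number)
    return fields, unknown


def serialize_canonical(fields):
    """Step 3: one fixed encoding: ascending field numbers, zeros omitted."""
    out = bytearray()
    for number in sorted(fields):
        if fields[number] != 0:
            out += encode_varint(number << 3) + encode_varint(fields[number])
    return bytes(out)


def is_canonical(data):
    fields, unknown = parse(data)
    if unknown:  # step 2: reject unknown fields
        return False
    return serialize_canonical(fields) == data  # step 4: byte comparison


good = bytes([0x08, 0x96, 0x01, 0x10, 0x01])       # field 1 = 150, field 2 = 1
reordered = bytes([0x10, 0x01, 0x08, 0x96, 0x01])  # same fields, swapped order
print(is_canonical(good), is_canonical(reordered))
```

The byte comparison in step 4 is what catches reordered fields, duplicate
fields, and explicitly encoded default values without a separate check for
each case: any input that differs from the single re-serialized form is
rejected.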

>
> Thanks,
> Dmitry
>
> [1] https://developers.google.com/protocol-buffers/docs/encoding
> [2] https://developers.google.com/protocol-buffers/docs/proto3
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to protobuf+unsubscr...@googlegroups.com.
> To post to this group, send email to protobuf@googlegroups.com.
> Visit this group at https://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/d/optout.
>
