I see, thank you very much for the explanation!

On 16.05.18 21:44, Feng Xiao wrote:


On Wed, May 16, 2018 at 7:39 AM Dmitry Timofeev <dmytro.tymof...@bitfury.com> wrote:

    Hi,

    I am considering whether Protocol Buffers can be used in an
    application that requires a canonical representation of messages
    coming from an external source.

    The encoding and proto3 guides [1, 2] include several requirements
    for a parser that make it accept non-canonical data (this list is
    probably not exhaustive):
      - Message fields may appear in any order
      - There might be multiple instances of the same
    /non-repeated/ field
      - A message may contain unknown fields
      - Default values of primitives may appear on the wire
      - Map entries may appear in any order
      - Repeated fields of primitives may be packed or unpacked.
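
A minimal sketch of the first two points, using a hand-written decoder rather than the real protobuf library. It assumes a hypothetical message with two varint fields (numbers 1 and 2); the function names are made up for illustration. Three different byte strings decode to the same logical message:

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_varint_fields(buf):
    """Decode a message whose fields all use wire type 0 (varint).

    Returns {field_number: value}. A later occurrence of a field overwrites
    an earlier one, which mirrors "last one wins" for scalar fields.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        fields[field_number] = value
    return fields


in_order = bytes([0x08, 0x01, 0x10, 0x02])                 # field 1 = 1, field 2 = 2
reordered = bytes([0x10, 0x02, 0x08, 0x01])                # same fields, swapped
duplicated = bytes([0x08, 0x07, 0x08, 0x01, 0x10, 0x02])   # field 1 appears twice

# All three encodings yield the same logical message.
assert decode_varint_fields(in_order) == {1: 1, 2: 2}
assert decode_varint_fields(reordered) == decode_varint_fields(in_order)
assert decode_varint_fields(duplicated) == decode_varint_fields(in_order)
```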

    1. Is there any natural way to extend the parser with checks for
    canonical form?

No.

    By "natural" I mean a compiler and/or runtime plugin, something
    that does not require a fork of the project.

    2. If not, would such an optional feature make sense in Protocol
    Buffers? Would you accept an option that makes the generated
    reader code 'strict', rejecting non-canonical representations,
    and, consequently, not forward-compatible?

Also no here. Compatibility is one of the main reasons to use protobuf because it allows you to evolve your protocol without breaking anyone in a complex system. If you don't need compatibility at all (i.e., you will never change your protocol), using a C++ struct will be much more performant than protobuf because you can skip the whole parsing/serialization cost.

There is a way to mimic the behavior you want though:
1. parse the input data to a proto message
2. check if the proto message has any unknown fields; if any, report error
3. serialize the proto message using deterministic serialization (https://github.com/google/protobuf/blob/master/src/google/protobuf/io/coded_stream.h#L842)
4. compare the serialized data against the input data; if they match, the input data is in the "canonical form"; if not, report error

It will incur an additional serialization cost, but can get you close enough to the canonical form.
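
The four steps can be sketched roughly as follows. This is a toy illustration, not the protobuf library API: it assumes a hypothetical message with two varint fields (numbers 1 and 2), and `parse`, `serialize` and `is_canonical` are made-up names. Writing fields in ascending field-number order and omitting zero values stands in for deterministic serialization here; with the real library you would call its parser and its deterministic serializer instead.

```python
KNOWN_FIELDS = {1, 2}  # assumption: a toy message with two varint fields


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a minimal-length varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def parse(buf):
    """Step 1: parse into {field_number: value}; also report unknown fields."""
    fields, saw_unknown, pos = {}, False, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        if field_number in KNOWN_FIELDS:
            fields[field_number] = value  # last occurrence wins
        else:
            saw_unknown = True
    return fields, saw_unknown


def serialize(fields):
    """Step 3: write fields in ascending order, omitting zero (default) values."""
    out = bytearray()
    for number in sorted(fields):
        if fields[number] != 0:
            out += write_varint(number << 3) + write_varint(fields[number])
    return bytes(out)


def is_canonical(data):
    """Steps 1-4: parse, reject unknown fields, re-serialize, compare bytes."""
    try:
        fields, saw_unknown = parse(data)
    except ValueError:
        return False
    if saw_unknown:                      # step 2
        return False
    return serialize(fields) == data     # steps 3 and 4
```

For example, `is_canonical(bytes([0x08, 0x01, 0x10, 0x02]))` accepts the in-order encoding, while the same fields in swapped order, a duplicated field, an explicitly encoded zero, or an unknown field 3 all fail the byte comparison or the unknown-field check.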


    Thanks,
    Dmitry

    [1] https://developers.google.com/protocol-buffers/docs/encoding
    [2] https://developers.google.com/protocol-buffers/docs/proto3

-- You received this message because you are subscribed to the Google
    Groups "Protocol Buffers" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to protobuf+unsubscr...@googlegroups.com
    <mailto:protobuf+unsubscr...@googlegroups.com>.
    To post to this group, send email to protobuf@googlegroups.com
    <mailto:protobuf@googlegroups.com>.
    Visit this group at https://groups.google.com/group/protobuf.
    For more options, visit https://groups.google.com/d/optout.






