Understanding the reasoning and theoretical limits of compact encoding field ids

Juan Cruz Viotti Wed, 28 Oct 2020 07:02:25 -0700

Hey there,

I'm studying Thrift's compact protocol spec's struct encoding section
[1] and I have some questions that I couldn't answer from just the spec.


The spec describes two types of field header encodings:

- A 4-bit unsigned integer field identifier delta followed by a 4-bit
  unsigned integer type id

- A 4-bit unsigned integer type id followed by a 16-bit signed
  Zigzag-encoded integer absolute field identifier (for when the field
  delta exceeds 15)
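
To check my reading of the two forms, here is a minimal Python sketch of how I assume a writer would pick between them. The name `field_header`, its return shape, and the assumption that the long form's first byte carries a zero delta nibble above the type id are mine and are not taken from any Thrift implementation.

```python
from typing import Optional, Tuple


def field_header(prev_id: int, field_id: int, type_id: int) -> Tuple[int, Optional[int]]:
    """Return (first_byte, absolute_id) for one field header.

    Hypothetical helper: absolute_id is None in the short form. In the
    long form it is the field id that would still need to be
    zigzag-encoded and written after the first byte.
    """
    if not 0 <= type_id <= 15:
        raise ValueError("type id must fit in 4 bits")

    delta = field_id - prev_id
    if 1 <= delta <= 15:
        # Short form: delta in the high nibble, type id in the low nibble.
        return (delta << 4) | type_id, None

    # Long form (assumed layout): high nibble is zero, the low nibble is
    # the type id, and the absolute field id follows separately.
    return type_id, field_id


# The type ids 5 and 8 below are arbitrary illustrative values.
print(field_header(0, 1, 5))    # short form, delta of 1
print(field_header(1, 20, 8))   # long form, delta of 19
```

If that reading is right, a delta of exactly 15 still fits the short form, and anything larger falls back to the absolute identifier.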

My first question is: Why is the longer form using a signed integer? It
doesn't seem like Apache Thrift supports negative field identifiers.

Then, assuming the longer form encodes absolute field identifiers and
not deltas as the shorter form does, the largest positive value that a
16-bit signed zigzag-encoded integer can represent is 32767. As a
consequence, the longer-form encoding seems to impose a theoretical
limit on the number of fields that can be included in a struct. On the
other hand, the delta-based shorter form would, in theory, let a struct
grow indefinitely.
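
The arithmetic behind that 32767 figure can be sketched as follows. This is a minimal illustration of 16-bit zigzag encoding in Python, with `zigzag16` and `unzigzag16` as hypothetical helper names of my own. It only covers the integer mapping, not how the result is laid out in bytes on the wire.

```python
def zigzag16(n: int) -> int:
    """Map a signed 16-bit integer onto an unsigned 16-bit integer."""
    if not -32768 <= n <= 32767:
        raise ValueError("value does not fit in a signed 16-bit integer")
    # Interleave the values: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    return ((n << 1) ^ (n >> 15)) & 0xFFFF


def unzigzag16(z: int) -> int:
    """Inverse of zigzag16."""
    if not 0 <= z <= 0xFFFF:
        raise ValueError("value does not fit in an unsigned 16-bit integer")
    return (z >> 1) ^ -(z & 1)


print(zigzag16(32767))     # largest non-negative identifier
print(zigzag16(-32768))    # most negative value takes the last slot
print(unzigzag16(65534))
```

Half of the 65536 encoded values are spent on negative identifiers, which is why the positive range stops at 32767 rather than 65535.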

Why does the longer-form encoding abandon the delta-based approach,
which seems to be superior in all respects?

Do implementations impose an upper limit on the number of struct fields
when using the delta-based approach?

Thanks in advance for the clarifications,

[1]: https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md#struct-encoding

-- 
Juan Cruz Viotti
https://www.jviotti.com
