Re: AVRO Best Practices for Sparse object storage

roger peppe Fri, 26 Jun 2020 07:10:40 -0700

Assuming each field is represented as a union {null, string}, 70 null
fields would take 70 bytes (one byte for the union branch index of each
field). One way to reduce that overhead might be to put a group of the
fields that are very commonly null into a nullable sub-record. That way
you'd need to store just one byte if all its fields are null (although it
would cost one extra byte, for the sub-record's own branch index, if any of
the fields inside it are present).
Another way to save some space would be to avoid using a {null, string}
union where an empty string is sufficient to tell that the field isn't
present. Because a string is prefixed by its length, an empty string takes
a single byte, the same as a null in a union, while every non-null field
saves the one-byte union index. With 20 or 30 populated fields, that could
save you 20 or 30 bytes.
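To make the byte counts above concrete, here is a minimal sketch that hand-encodes the three layouts using the Avro binary encoding rules (long = zig-zag varint, string = length then UTF-8 bytes, union = branch index then value, null = zero bytes). It does not use the Avro library; the helper names (`encode_long`, `flat_union`, `grouped`, `empty_string_sentinel`) are hypothetical, and the assumed record (100 string fields, the first 30 set to "abcde", the last 70 treated as the "rarely set" group) is made up for illustration.

```python
from typing import AbstractSet, Optional, Sequence


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then base-128 varint (low groups first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag so small magnitudes stay short
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: long byte length followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_optional_string(s: Optional[str]) -> bytes:
    """union {null, string}: branch 0 = null (no payload), branch 1 = string."""
    if s is None:
        return encode_long(0)
    return encode_long(1) + encode_string(s)


def flat_union(values: Sequence[Optional[str]]) -> bytes:
    """Every field is its own union {null, string}."""
    return b"".join(encode_optional_string(v) for v in values)


def grouped(values: Sequence[Optional[str]], rare: AbstractSet[int]) -> bytes:
    """Fields whose index is in `rare` live in one union {null, record}
    sub-record (assumed here to be the last field in the schema)."""
    common = [v for i, v in enumerate(values) if i not in rare]
    rare_vals = [v for i, v in enumerate(values) if i in rare]
    out = flat_union(common)
    if all(v is None for v in rare_vals):
        return out + encode_long(0)  # whole sub-record is null: one byte
    return out + encode_long(1) + flat_union(rare_vals)  # one extra index byte


def empty_string_sentinel(values: Sequence[Optional[str]]) -> bytes:
    """Plain string fields, with "" standing in for 'absent'.
    Only valid if an empty string is never a meaningful value."""
    return b"".join(encode_string(v or "") for v in values)


if __name__ == "__main__":
    values = ["abcde"] * 30 + [None] * 70  # 30 of 100 fields populated
    rare = set(range(30, 100))             # hypothetical "usually null" group
    print(len(flat_union(values)))             # 280
    print(len(grouped(values, rare)))          # 211
    print(len(empty_string_sentinel(values)))  # 250
```

With these assumptions the flat layout spends 70 bytes on nulls, the grouped layout collapses them to 1 byte, and the empty-string layout saves exactly one byte per populated field (30 here). If a single field in the rare group is set, the grouped layout costs one byte more than the flat one, which is the trade-off mentioned above.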


  cheers,
    rog.

On Fri, 26 Jun 2020 at 00:14, Sohail Khan <[email protected]> wrote:

> Hello Team,
>
> I am trying to serialize data in AVRO format and store it in a database,
> which would bring down the disk requirement of the table. Currently we
> are storing it in JSON format.
> I have a very large POJO with string type fields (approximately 100), but
> for a given POJO hardly 20 or 30 have values; the rest are null. I call it
> a sparse object. I am currently achieving approximately a 20 percent
> reduction in size compared to JSON. Any suggestions on how to take it
> further? What are the best practices with respect to handling null values?
>
> Thanks and Regards
> Sohail Khan
>
