I think the specification is clear about that.

Unions
> A union is encoded by first writing a long value indicating the
> zero-based position within the union of the schema of its value. The value
> is then encoded per the indicated schema within the union.
> For example, the union schema ["string","null"] would encode:
>
>    - null as the integer 1 (the index of "null" in the union, encoded as
>    hex 02):
>
> 02
>
>    - the string "a" as zero (the index of "string" in the union),
>    followed by the serialized string:
>
> 00 02 61


http://avro.apache.org/docs/1.7.6/spec.html
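
As a minimal sketch of the encoding quoted above, here is the same example in plain Python with no Avro library involved. The function names (encode_long, encode_string, encode_union) are my own for illustration and are not part of any Avro API; the union is assumed to be exactly ["string", "null"].

```python
def encode_long(n: int) -> bytes:
    """Avro long: zigzag-map the signed value, then write it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0 -> 0, -1 -> 1, 1 -> 2, ...
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_union(value, branches=("string", "null")) -> bytes:
    """Union: zero-based branch index as a long, then the value per that branch."""
    if value is None:
        return encode_long(branches.index("null"))  # null itself takes zero bytes
    return encode_long(branches.index("string")) + encode_string(value)


print(encode_union(None).hex())   # null branch
print(encode_union("a").hex())    # string branch
print(encode_string("a").hex())   # plain "string" field, no union
```

Comparing the last two lines shows the cost of the union for a non-null value: the branch index in front of the string.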

So there is an overhead (one extra byte per union value, for the branch
index) but that may not be the main issue.

The issue might be more about defining a correct schema. If a field can be
null then all clients should handle the case when the field is indeed null.
That's a 'hygiene issue' (or data quality issue if you prefer), like with
a database schema.
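
To illustrate what "handling the null case" means for a client, here is a small sketch. The record layout and the field names (user_id, nickname) are hypothetical; I am only assuming that nickname was declared as ["string", "null"] and that the decoded record arrives as a dict with None for null.

```python
from typing import Optional


def display_name(record: dict) -> str:
    """Return a printable name, falling back to user_id when nickname is null."""
    # nickname is a ["string", "null"] union in the assumed schema,
    # so None is a legal value and must be checked before any string method.
    nickname: Optional[str] = record["nickname"]
    if nickname is None:
        return record["user_id"]
    return nickname.strip() or record["user_id"]
```

If every string field is declared nullable, every reader needs a check like this for every field, whether or not the value can really be null.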

Regards

Bertrand

Bertrand Dechoux


On Fri, Mar 14, 2014 at 9:15 AM, Fengyun RAO <[email protected]> wrote:

> I have some string fields which may be null, while some definitely not
> null.
> The problem is that it takes time to distinguish them.
> There are about 100 fields, 50 of which are string,  10 of which I guess
> could be null.
>
> Could I just specify all string types ["string", "null"],
> how much is the efficiency difference?
>
>
>
