Re: How to specify datum for avro union type without ambiguity?

Marcelo Valle Wed, 25 Apr 2018 07:38:29 -0700

Hi Elliot,

I can give you a simple example:


   - Type1: attributes name, middle_name, last_name and age. middle_name is
   optional and age has default value of 35.
   - Type2: attributes name, last_name and age. age is optional and has no
   default value.

If my schema is a union of both types, what should happen with the datum
`{"name": "Hakuna", "last_name": "Matata"}`? Should age be filled with the
default value?
If middle_name had a default value instead of being optional, would it be
filled in, even if this was a datum of Type2?
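To make the two possible answers concrete, here is a minimal sketch in plain Python (no Avro library) that lists which branches of a Type1/Type2 union a datum could belong to. The schemas are hypothetical renderings of the example above. "Optional" is assumed to mean a `["null", T]` union, and only field presence is checked, not value types. Whether a missing field may fall back to its default is exactly the open question, so it is a parameter (`use_defaults`). The helper names are illustrative only.

```python
# Hypothetical schemas for Type1/Type2. "Optional" is modelled as a
# ["null", T] union, which is an assumption about what was meant.
TYPE1 = {
    "type": "record",
    "name": "Type1",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "middle_name", "type": ["null", "string"]},  # optional
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": "int", "default": 35},
    ],
}
TYPE2 = {
    "type": "record",
    "name": "Type2",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},  # optional, no default
    ],
}


def accepts_null(field_type):
    """True if a missing value (treated as null) is valid for this type."""
    return field_type == "null" or (isinstance(field_type, list) and "null" in field_type)


def could_match(record, datum, use_defaults):
    """Presence-only check: value types are deliberately not validated."""
    if not isinstance(datum, dict):
        return False
    field_names = {f["name"] for f in record["fields"]}
    if set(datum) - field_names:
        return False  # keys the record does not declare rule this branch out
    for field in record["fields"]:
        if field["name"] in datum or accepts_null(field["type"]):
            continue
        if use_defaults and "default" in field:
            continue  # a default is allowed to stand in for the missing field
        return False
    return True


def matching_branches(union, datum, use_defaults=False):
    """Names of every record branch the datum could belong to."""
    return [r["name"] for r in union if could_match(r, datum, use_defaults)]


datum = {"name": "Hakuna", "last_name": "Matata"}
print(matching_branches([TYPE1, TYPE2], datum))                     # ['Type2']
print(matching_branches([TYPE1, TYPE2], datum, use_defaults=True))  # ['Type1', 'Type2']
```

With defaults ignored, only Type2 fits, because Type1's age is a non-nullable int. If defaults may fill in missing fields, both branches fit and the union alone cannot decide which one was meant.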

> Is it that the equivalent schemas might evolve in a divergent manner over
> time or perhaps that by targeting a specific schema you are wanting to
> convey some out of band information that may have some meaning to a
> consumer, if not Avro?

I'm not sure what you meant here, but I don't need schema evolution in my
case. Every version is a completely independent schema.
Users use Avro schemas to validate their input data; that's what they're
using Avro for.

Thanks,
Marcelo.

On 25 April 2018 at 13:43, Elliot West <[email protected]> wrote:

> A quick question: If the datum is valid in more than one schema, what is
> the scenario where knowing the specific schema is necessary? Is it that the
> equivalent schemas might evolve in a divergent manner over time or perhaps
> that by targeting a specific schema you are wanting to convey some out of
> band information that may have some meaning to a consumer, if not Avro?
>
> Elliot.
>
> On 25 April 2018 at 12:27, Marcelo Valle <[email protected]> wrote:
>
>> I am writing a Python program using the official Avro library for Python,
>> version 1.8.2.
>>
>> This is a simple schema to show my problem:
>>
>>     {
>>       "type": "record",
>>       "namespace": "com.example",
>>       "name": "NameUnion",
>>       "fields": [
>>         {
>>           "name": "name",
>>           "type": [
>>             {
>>               "type": "record",
>>               "namespace": "com.example",
>>               "name": "FullName",
>>               "fields": [
>>                 {
>>                   "name": "first",
>>                   "type": "string"
>>                 },
>>                 {
>>                   "name": "last",
>>                   "type": "string"
>>                 }
>>               ]
>>             },
>>             {
>>               "type": "record",
>>               "namespace": "com.example",
>>               "name": "ConcatenatedFullName",
>>               "fields": [
>>                 {
>>                   "name": "entireName",
>>                   "type": "string"
>>                 }
>>               ]
>>             }
>>           ]
>>         }
>>       ]
>>     }
>>
>> Possible datums for this schema would be `{"name": {"first": "Hakuna",
>> "last": "Matata"}}` and `{"name": {"entireName": "Hakuna Matata"}}`.
>>
>> However, this leaves room for ambiguity, as Avro will not always be able
>> to detect the right schema in the union. In this case, each datum
>> matches exactly one schema in the union, but there could be cases where
>> more than one schema in the union would be valid.
>>
>> I wonder whether it would be possible to use a datum like `{"name":
>> {"FullName": {"first": "Hakuna", "last": "Matata"}}}`, where the name of
>> the union branch is given explicitly in the datum.
>>
>> Is it possible? If so, how?
>>
>> --
>> Marcelo Valle
>> http://mvalle.com - @mvallebr
>>
>
>
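The JSON encoding described in the Avro specification already uses the tagging idea proposed in this thread: null is written as null, and any other union value is written as a JSON object with a single key naming the branch. The sketch below, in plain Python with no Avro library, resolves such a tagged value against the NameUnion schema from this thread. The helper names (`fullname`, `resolve_tagged_union`) are hypothetical, and keying named types by their fully qualified name (`com.example.FullName` rather than `FullName`) is an assumption of this sketch, not a claim about what avro 1.8.2 accepts.

```python
import json

# The NameUnion schema from this thread, compacted.
SCHEMA = json.loads("""
{
  "type": "record", "namespace": "com.example", "name": "NameUnion",
  "fields": [{"name": "name", "type": [
    {"type": "record", "namespace": "com.example", "name": "FullName",
     "fields": [{"name": "first", "type": "string"},
                {"name": "last", "type": "string"}]},
    {"type": "record", "namespace": "com.example", "name": "ConcatenatedFullName",
     "fields": [{"name": "entireName", "type": "string"}]}
  ]}]
}
""")
UNION = SCHEMA["fields"][0]["type"]


def fullname(branch):
    """Tag for a union branch: namespace.name for named types (assumed)."""
    if isinstance(branch, str):
        return branch  # primitive such as "null" or "string", or a name reference
    if "name" not in branch:
        return branch["type"]  # unnamed complex type, e.g. "array" or "map"
    namespace = branch.get("namespace")
    return f"{namespace}.{branch['name']}" if namespace else branch["name"]


def resolve_tagged_union(union, value):
    """Return (branch tag, inner value) for a tagged union value."""
    if value is None:
        if "null" in union:
            return "null", None
        raise ValueError("null is not a branch of this union")
    if not (isinstance(value, dict) and len(value) == 1):
        raise ValueError("expected an object with exactly one key naming the branch")
    tag, inner = next(iter(value.items()))
    for branch in union:
        if fullname(branch) == tag:
            return tag, inner
    raise ValueError(f"unknown union branch: {tag!r}")


datum = {"name": {"com.example.FullName": {"first": "Hakuna", "last": "Matata"}}}
print(resolve_tagged_union(UNION, datum["name"]))
# ('com.example.FullName', {'first': 'Hakuna', 'last': 'Matata'})
```

Because the datum names its branch, there is no guessing between FullName and ConcatenatedFullName; the inner value still has to be validated against the chosen branch separately. Under this sketch's fully-qualified-name assumption, the short form `{"FullName": ...}` would be rejected as an unknown branch.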


-- 
Marcelo Valle
http://mvalle.com - @mvallebr
