No, I was thinking of something like this:

{
  "namespace": "com.foobar",
  "name": "UnionRecords",
  "type": "array",
  "items": {
    "type": "record",
    "name": "RecordWithCommonFields",
    "fields": [
      {"name": "commonField1", "type": "string"},
      {"name": "commonField2", "type": "string"},
      {"name": "subtype", "type": [
        {
          "type" : "record",
          "name": "RecordTypeA",
          "fields" : [
            {"name": "integerSpecificToA1", "type": ["null", "long"] },
            {"name": "stringSpecificToA1", "type": ["null", "string"]}
          ]
        },
        {
          "type" : "record",
          "name": "RecordTypeB",
          "fields" : [
            {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
            {"name": "stringSpecificToB1", "type": ["null", "string"]}
          ]
        }
      ]}
    ]
  }
}

This schema represents an array of records. Each record has two mandatory
fields (the common fields) and one field for the subtype. The latter
(named subtype) is a mandatory, non-nullable union of the RecordTypeA and
RecordTypeB records, each of which holds only its subtype-specific
fields. When reading, the branch of the union that is present tells you
which subtype the record is. Does this solve your use case?
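
To make the mapping concrete, here is a rough plain-Java sketch of how one
line of the legacy file could be turned into that shape and how the subtype
can be told apart afterwards. It deliberately does not use the Avro library:
the classes only mirror the schema above. LegacyLineMapper, parse, describe
and emptyToNull are made-up names, and I am assuming five values per line,
separated by ';' or ',', with the discriminator in the third position and an
empty value standing for null. Adjust it to the real file layout.

```java
// Plain-Java mirror of the schema above (no Avro dependency); illustration only.
final class LegacyLineMapper {

    // Stands in for the "subtype" union: a value is exactly one of the branches.
    interface Subtype {}

    static final class RecordTypeA implements Subtype {
        final Long integerSpecificToA1;   // nullable, like ["null", "long"]
        final String stringSpecificToA1;  // nullable, like ["null", "string"]

        RecordTypeA(Long integerSpecificToA1, String stringSpecificToA1) {
            this.integerSpecificToA1 = integerSpecificToA1;
            this.stringSpecificToA1 = stringSpecificToA1;
        }
    }

    static final class RecordTypeB implements Subtype {
        final Boolean booleanSpecificToB1; // nullable, like ["null", "boolean"]
        final String stringSpecificToB1;   // nullable, like ["null", "string"]

        RecordTypeB(Boolean booleanSpecificToB1, String stringSpecificToB1) {
            this.booleanSpecificToB1 = booleanSpecificToB1;
            this.stringSpecificToB1 = stringSpecificToB1;
        }
    }

    static final class RecordWithCommonFields {
        final String commonField1;
        final String commonField2;
        final Subtype subtype; // mandatory: always RecordTypeA or RecordTypeB

        RecordWithCommonFields(String commonField1, String commonField2, Subtype subtype) {
            this.commonField1 = commonField1;
            this.commonField2 = commonField2;
            this.subtype = subtype;
        }
    }

    // Hypothetical helper: an empty value in the file is treated as null.
    static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }

    // Assumption: five values per line, separated by ';' or ',', with the
    // discriminator (TYPE_IS_A or TYPE_IS_B) in the third position.
    static RecordWithCommonFields parse(String line) {
        String[] p = line.split("[;,]", -1);
        if (p.length != 5) {
            throw new IllegalArgumentException("Expected 5 values: " + line);
        }
        String first = emptyToNull(p[3]);
        String second = emptyToNull(p[4]);
        Subtype subtype;
        switch (p[2]) {
            case "TYPE_IS_A":
                subtype = new RecordTypeA(first == null ? null : Long.valueOf(first), second);
                break;
            case "TYPE_IS_B":
                subtype = new RecordTypeB(first == null ? null : Boolean.valueOf(first), second);
                break;
            default:
                throw new IllegalArgumentException("Unknown discriminator: " + p[2]);
        }
        return new RecordWithCommonFields(p[0], p[1], subtype);
    }

    // On the read side the distinction comes from which branch is present.
    static String describe(RecordWithCommonFields r) {
        if (r.subtype instanceof RecordTypeA) {
            RecordTypeA a = (RecordTypeA) r.subtype;
            return "A:" + a.integerSpecificToA1 + ":" + a.stringSpecificToA1;
        }
        RecordTypeB b = (RecordTypeB) r.subtype;
        return "B:" + b.booleanSpecificToB1 + ":" + b.stringSpecificToB1;
    }
}
```

The discriminator from the file is not stored as a separate field here,
because the union branch already carries that information. If you read the
data back with Avro's generic API, I would expect the equivalent check to be
on the name of the schema of the record stored in the subtype field.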

Regards,
Nandor

On Wed, Jun 20, 2018 at 4:34 PM, Horváth Péter Gergely <
[email protected]> wrote:

> Hi Nandor,
>
> Thank you for your suggestion. Do you mean something like this:
>
> [
>   {
>     "namespace": "com.foobar",
>     "name": "UnionRecords",
>     "type": "array",
>     "items": {
>       "type": "record",
>       "name": "UnionRecord",
>       "fields": [
>         {"name": "commonField1", "type": "string"},
>         {"name": "commonField2", "type": "string"},
>         {"name": "integerSpecificToA1", "type": ["null", "long"] },
>         {"name": "stringSpecificToA1", "type": ["null", "string"]},
>         {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
>         {"name": "stringSpecificToB1", "type": ["null", "string"]}
>       ]
>     }
>   }
> ]
>
> How would you make the distinction when the record is being read? That is
> not clear to me. Could you please clarify that?
>
> Thanks,
> Peter
>
>
>
> 2018-06-20 15:51 GMT+02:00 Nandor Kollar <[email protected]>:
>
>> Hi Peter,
>>
>> I think what you need is a union
>> <https://avro.apache.org/docs/1.8.1/spec.html#Unions> of records. What
>> comes to mind is a record type with these fields: all the common
>> fields (commonField1, commonField2) and an additional union field for the
>> derived types (a non-nullable union, since your base class is abstract).
>> The union consists of your concrete records, RecordTypeA and RecordTypeB,
>> each with the fields specific to that derived type.
>>
>> Regards,
>> Nandor
>>
>> On Wed, Jun 20, 2018 at 3:35 PM, Horváth Péter Gergely <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> We have a legacy file format that I need to migrate to Avro
>>> format. The tricky part is that the records basically have
>>>
>>>    - some common fields,
>>>    - a discriminator field and
>>>    - some unique fields, specific to the type selected by the
>>>    discriminator field
>>>
>>> All of them are stored in the same file, in no particular order, mixed
>>> with each other.
>>>
>>> In Java/object-oriented programming, one could represent our record
>>> concept as follows:
>>>
>>> abstract class RecordWithCommonFields {
>>>    private Long commonField1;
>>>    private String commonField2;
>>>    ...
>>> }
>>>
>>> class RecordTypeA extends RecordWithCommonFields {
>>>    private Integer integerSpecificToA1;
>>>    private String stringSpecificToA1;
>>>    ...
>>> }
>>>
>>> class RecordTypeB extends RecordWithCommonFields {
>>>    private Boolean booleanSpecificToB1;
>>>    private String stringSpecificToB1;
>>>    ...
>>> }
>>>
>>> Imagine the data being something like this:
>>>
>>> commonField1Value;commonField2Value,TYPE_IS_A,specificToA1Value,specificToA1Value
>>> commonField1Value;commonField2Value,TYPE_IS_B,specificToB1Value,specificToB1Value
>>>
>>> So I would like to process an incoming file and write its content to
>>> Avro format, somehow representing the different types of the records:
>>> technically this would be an array, which should hold different types of
>>> records.
>>>
>>> Can someone give me some ideas on how to achieve this?
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>>
>
