Hi Nandor,

Thank you for your suggestion. Do you mean something like this:

[
  {
    "namespace": "com.foobar",
    "name": "UnionRecords",
    "type": "array",
    "items": {
      "type": "record",
      "name": "UnionRecord",
      "fields": [
        {"name": "commonField1", "type": "string"},
        {"name": "commonField2", "type": "string"},
        {"name": "integerSpecificToA1", "type": ["null", "long"]},
        {"name": "stringSpecificToA1", "type": ["null", "string"]},
        {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
        {"name": "stringSpecificToB1", "type": ["null", "string"]}
      ]
    }
  }
]

How would you tell the record types apart when a record is being read? That
is not clear to me. Could you please clarify?

Thanks,
Peter



2018-06-20 15:51 GMT+02:00 Nandor Kollar <[email protected]>:

> Hi Peter,
>
> I think what you need is a union
> <https://avro.apache.org/docs/1.8.1/spec.html#Unions> of records. What
> comes to my mind is to create a record type with these fields: all the
> common fields (commonField1, commonField2) and an additional union field
> for the derived types (a non-nullable union, since your base class is
> abstract). The union is a union of your concrete records: RecordTypeB
> (with the fields specific only to this derived type) and RecordTypeA
> (with the fields specific only to this derived type).
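>
> A minimal sketch of such a schema, under some assumptions: the outer
> record name is borrowed from the Java classes quoted below, the union
> field name "specific" is only a hypothetical placeholder, the specific
> field names follow the ones used elsewhere in this thread, and the types
> simply mirror the Java field types.
>
> ```json
> {
>   "namespace": "com.foobar",
>   "type": "record",
>   "name": "RecordWithCommonFields",
>   "fields": [
>     {"name": "commonField1", "type": "long"},
>     {"name": "commonField2", "type": "string"},
>     {"name": "specific", "type": [
>       {
>         "type": "record",
>         "name": "RecordTypeA",
>         "fields": [
>           {"name": "integerSpecificToA1", "type": "int"},
>           {"name": "stringSpecificToA1", "type": "string"}
>         ]
>       },
>       {
>         "type": "record",
>         "name": "RecordTypeB",
>         "fields": [
>           {"name": "booleanSpecificToB1", "type": "boolean"},
>           {"name": "stringSpecificToB1", "type": "string"}
>         ]
>       }
>     ]}
>   ]
> }
> ```
>
> Each record then carries exactly one of the two nested records in the
> "specific" field, and "null" is left out of the union on purpose.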
>
> Regards,
> Nandor
>
> On Wed, Jun 20, 2018 at 3:35 PM, Horváth Péter Gergely <
> [email protected]> wrote:
>
>> Hi All,
>>
>> We have a legacy file format that I need to migrate to Avro. The tricky
>> part is that the records basically have
>>
>>    - some common fields,
>>    - a discriminator field and
>>    - some unique fields, specific to the type selected by the
>>    discriminator field
>>
>> All of them are stored in the same file, in no particular order, mixed
>> with each other.
>>
>> In Java/object-oriented programming, one could represent our record
>> concept as follows:
>>
>> abstract class RecordWithCommonFields {
>>    private Long commonField1;
>>    private String commonField2;
>>    ...
>> }
>>
>> class RecordTypeA extends RecordWithCommonFields {
>>    private Integer integerSpecificToA1;
>>    private String stringSpecificToA1;
>>    ...
>> }
>>
>> class RecordTypeB extends RecordWithCommonFields {
>>    private Boolean booleanSpecificToB1;
>>    private String stringSpecificToB1;
>>    ...
>> }
>>
>> Imagine the data being something like this:
>>
>> commonField1Value;commonField2Value,TYPE_IS_A,specificToA1Value,specificToA1Value
>> commonField1Value;commonField2Value,TYPE_IS_B,specificToB1Value,specificToB1Value
>>
>> So I would like to process an incoming file and write its content to Avro
>> format, somehow representing the different types of the records:
>> technically this would be an array, which should hold different types of
>> records.
>>
>> Can someone give me some ideas on how to achieve this?
>>
>> Thanks,
>> Peter
>>
>>
>
