Hi Nandor,
Thank you for your suggestion. Do you mean something like this:
[
  {
    "namespace": "com.foobar",
    "name": "UnionRecords",
    "type": "array",
    "items": {
      "type": "record",
      "name": "UnionRecord",
      "fields": [
        {"name": "commonField1", "type": "string"},
        {"name": "commonField2", "type": "string"},
        {"name": "integerSpecificToA1", "type": ["null", "long"]},
        {"name": "stringSpecificToA1", "type": ["null", "string"]},
        {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
        {"name": "stringSpecificToB1", "type": ["null", "string"]}
      ]
    }
  }
]
How would you make the distinction when the record is being read? That is
not clear to me. Could you please clarify that?
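Or did you mean an actual union of record types? To make sure I understand,
here is a rough sketch of how I would read that. The field name "specific"
and the reuse of the names RecordTypeA and RecordTypeB from my Java example
are only my assumptions:

```json
{
  "type": "record",
  "namespace": "com.foobar",
  "name": "UnionRecord",
  "fields": [
    {"name": "commonField1", "type": "string"},
    {"name": "commonField2", "type": "string"},
    {
      "name": "specific",
      "doc": "Assumed field name; holds exactly one of the concrete record types",
      "type": [
        {
          "type": "record",
          "name": "RecordTypeA",
          "fields": [
            {"name": "integerSpecificToA1", "type": "long"},
            {"name": "stringSpecificToA1", "type": "string"}
          ]
        },
        {
          "type": "record",
          "name": "RecordTypeB",
          "fields": [
            {"name": "booleanSpecificToB1", "type": "boolean"},
            {"name": "stringSpecificToB1", "type": "string"}
          ]
        }
      ]
    }
  ]
}
```

If that is the idea, I would guess the reader can tell the two apart by the
name of the record schema found in that field, but please correct me if I
have misunderstood.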
Thanks,
Peter
2018-06-20 15:51 GMT+02:00 Nandor Kollar <[email protected]>:
> Hi Peter,
>
> I think what you need is a union
> <https://avro.apache.org/docs/1.8.1/spec.html#Unions> of records. What
> comes to my mind is to create a record type with these fields: all common
> fields (commonField1, commonField2) and an additional union field for the
> derived types (a non-nullable union, since your base class is abstract).
> The union is a union of your concrete records: RecordTypeB (with the
> fields specific only to this derived type) and RecordTypeA (with the
> fields specific only to this derived type).
>
> Regards,
> Nandor
>
> On Wed, Jun 20, 2018 at 3:35 PM, Horváth Péter Gergely <
> [email protected]> wrote:
>
>> Hi All,
>>
>> We have a legacy file format that I need to migrate to Avro. The tricky
>> part is that the records basically have
>>
>> - some common fields,
>> - a discriminator field and
>> - some unique fields, specific to the type selected by the
>> discriminator field
>>
>> All of them are stored in the same file, in no particular order, mixed
>> with each other.
>>
>> In Java/object-oriented programming, one could represent our record
>> concept as follows:
>>
>> abstract class RecordWithCommonFields {
>>     private Long commonField1;
>>     private String commonField2;
>>     ...
>> }
>>
>> class RecordTypeA extends RecordWithCommonFields {
>>     private Integer integerSpecificToA1;
>>     private String stringSpecificToA1;
>>     ...
>> }
>>
>> class RecordTypeB extends RecordWithCommonFields {
>>     private Boolean booleanSpecificToB1;
>>     private String stringSpecificToB1;
>>     ...
>> }
>>
>> Imagine the data being something like this:
>>
>> commonField1Value;commonField2Value,TYPE_IS_A,specificToA1Value,specificToA1Value
>> commonField1Value;commonField2Value,TYPE_IS_B,specificToB1Value,specificToB1Value
>>
>> So I would like to process an incoming file and write its content to Avro
>> format, somehow representing the different types of the records:
>> technically this would be an array, which should hold different types of
>> records.
>>
>> Can someone give me some ideas on how to achieve this?
>>
>> Thanks,
>> Peter
>>
>>
>