Hi Nandor,

Yes, thank you, that works perfectly: even after deserializing, the class of the subtype is correctly restored. Instead of using subtypes, we now have a consistent wrapper for everything: that is SMART! :)

I think this may be an edge case, but it might be worth adding a short hint about it to the official documentation.

Once again, thank you for your help.

Regards,
Peter

2018-06-20 16:56 GMT+02:00 Nandor Kollar <[email protected]>:

> No, I was thinking of something like this:
>
> {
>   "namespace": "com.foobar",
>   "name": "UnionRecords",
>   "type": "array",
>   "items": {
>     "type": "record",
>     "name": "RecordWithCommonFields",
>     "fields": [
>       {"name": "commonField1", "type": "string"},
>       {"name": "commonField2", "type": "string"},
>       {"name": "subtype", "type": [
>         {
>           "type": "record",
>           "name": "RecordTypeA",
>           "fields": [
>             {"name": "integerSpecificToA1", "type": ["null", "long"]},
>             {"name": "stringSpecificToA1", "type": ["null", "string"]}
>           ]
>         },
>         {
>           "type": "record",
>           "name": "RecordTypeB",
>           "fields": [
>             {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
>             {"name": "stringSpecificToB1", "type": ["null", "string"]}
>           ]
>         }
>       ]}
>     ]
>   }
> }
>
> This schema represents an array of records. Each record has two mandatory
> fields (the common fields) and one field for the subtypes. The latter
> (named subtype) is a non-nullable, mandatory union of the RecordTypeA and
> RecordTypeB records, each of which carries its subtype-specific fields.
> Does this solve your use case?
>
> Regards,
> Nandor
>
> On Wed, Jun 20, 2018 at 4:34 PM, Horváth Péter Gergely <[email protected]> wrote:
>
>> Hi Nandor,
>>
>> Thank you for your suggestion.
>> Do you mean something like this:
>>
>> [
>>   {
>>     "namespace": "com.foobar",
>>     "name": "UnionRecords",
>>     "type": "array",
>>     "items": {
>>       "type": "record",
>>       "name": "UnionRecord",
>>       "fields": [
>>         {"name": "commonField1", "type": "string"},
>>         {"name": "commonField2", "type": "string"},
>>         {"name": "integerSpecificToA1", "type": ["null", "long"]},
>>         {"name": "stringSpecificToA1", "type": ["null", "string"]},
>>         {"name": "booleanSpecificToB1", "type": ["null", "boolean"]},
>>         {"name": "stringSpecificToB1", "type": ["null", "string"]}
>>       ]
>>     }
>>   }
>> ]
>>
>> How would you make the distinction when the record is being read? That is
>> not clear to me. Could you please clarify that?
>>
>> Thanks,
>> Peter
>>
>> 2018-06-20 15:51 GMT+02:00 Nandor Kollar <[email protected]>:
>>
>>> Hi Peter,
>>>
>>> I think what you need is a union
>>> <https://avro.apache.org/docs/1.8.1/spec.html#Unions> of records. What
>>> comes to my mind is to create a record type with these fields: all common
>>> fields (commonField1, commonField2) and an additional union field for
>>> the derived types (a non-nullable union, since your base class is abstract).
>>> The union is a union of your concrete records: RecordTypeB (with the
>>> fields specific only to this derived type) and RecordTypeA (with the
>>> fields specific only to this derived type).
>>>
>>> Regards,
>>> Nandor
>>>
>>> On Wed, Jun 20, 2018 at 3:35 PM, Horváth Péter Gergely <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have a legacy file format that I need to migrate to Avro.
>>>> The tricky part is that the records basically have
>>>>
>>>> - some common fields,
>>>> - a discriminator field and
>>>> - some unique fields, specific to the type selected by the
>>>>   discriminator field
>>>>
>>>> All of them are stored in the same file, in no particular order, mixed
>>>> with each other.
>>>>
>>>> In Java/object-oriented programming, one could represent our record
>>>> concept as follows:
>>>>
>>>> abstract class RecordWithCommonFields {
>>>>     private Long commonField1;
>>>>     private String commonField2;
>>>>     ...
>>>> }
>>>>
>>>> class RecordTypeA extends RecordWithCommonFields {
>>>>     private Integer integerSpecificToA1;
>>>>     private String stringSpecificToA1;
>>>>     ...
>>>> }
>>>>
>>>> class RecordTypeB extends RecordWithCommonFields {
>>>>     private Boolean booleanSpecificToB1;
>>>>     private String stringSpecificToB1;
>>>>     ...
>>>> }
>>>>
>>>> Imagine the data being something like this:
>>>>
>>>> commonField1Value;commonField2Value,TYPE_IS_A,integerSpecificToA1Value,stringSpecificToA1Value
>>>> commonField1Value;commonField2Value,TYPE_IS_B,booleanSpecificToB1Value,stringSpecificToB1Value
>>>>
>>>> So I would like to process an incoming file and write its content to
>>>> Avro format, somehow representing the different types of the records:
>>>> technically this would be an array, which should hold different types of
>>>> records.
>>>>
>>>> Can someone give me some ideas on how to achieve this?
>>>>
>>>> Thanks,
>>>> Peter
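
A minimal Java sketch of the wrapper layout discussed in this thread, in case it helps when turning the legacy lines into objects before writing them to Avro. It only models the shape of the union schema (common fields plus one non-null subtype) and does not use the Avro library. The names LegacyLineParser, LegacyLineDemo and emptyToNull are hypothetical, and the sketch assumes that a line has exactly five fields, that both ";" and "," act as separators, and that an empty field means null.

```java
import java.util.Objects;

/** Wrapper: the common fields plus exactly one subtype payload (like the "subtype" union field). */
final class RecordWithCommonFields {
    final String commonField1;
    final String commonField2;
    final Object subtype; // either RecordTypeA or RecordTypeB, never null

    RecordWithCommonFields(String commonField1, String commonField2, Object subtype) {
        this.commonField1 = commonField1;
        this.commonField2 = commonField2;
        this.subtype = Objects.requireNonNull(subtype, "subtype is a non-nullable union");
    }
}

/** Fields specific to type A; both are nullable, like ["null", "long"] and ["null", "string"]. */
final class RecordTypeA {
    final Long integerSpecificToA1;
    final String stringSpecificToA1;

    RecordTypeA(Long integerSpecificToA1, String stringSpecificToA1) {
        this.integerSpecificToA1 = integerSpecificToA1;
        this.stringSpecificToA1 = stringSpecificToA1;
    }
}

/** Fields specific to type B; both are nullable. */
final class RecordTypeB {
    final Boolean booleanSpecificToB1;
    final String stringSpecificToB1;

    RecordTypeB(Boolean booleanSpecificToB1, String stringSpecificToB1) {
        this.booleanSpecificToB1 = booleanSpecificToB1;
        this.stringSpecificToB1 = stringSpecificToB1;
    }
}

/** Hypothetical parser for one legacy line; the separator handling is an assumption. */
final class LegacyLineParser {
    static RecordWithCommonFields parse(String line) {
        // Assumption: ';' and ',' both separate fields; -1 keeps trailing empty fields.
        String[] p = line.split("[;,]", -1);
        if (p.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        Object subtype;
        switch (p[2]) { // the discriminator selects which branch of the union is filled
            case "TYPE_IS_A":
                subtype = new RecordTypeA(p[3].isEmpty() ? null : Long.valueOf(p[3]), emptyToNull(p[4]));
                break;
            case "TYPE_IS_B":
                subtype = new RecordTypeB(p[3].isEmpty() ? null : Boolean.valueOf(p[3]), emptyToNull(p[4]));
                break;
            default:
                throw new IllegalArgumentException("unknown discriminator: " + p[2]);
        }
        return new RecordWithCommonFields(p[0], p[1], subtype);
    }

    private static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }
}

final class LegacyLineDemo {
    public static void main(String[] args) {
        RecordWithCommonFields r = LegacyLineParser.parse("c1;c2,TYPE_IS_A,42,hello");
        // The reader distinguishes the record kinds by the runtime type of the subtype field.
        System.out.println(r.subtype.getClass().getSimpleName());
    }
}
```

The discriminator column disappears from the model here: which class sits in the subtype field carries the same information, which is roughly what the union branch does on the Avro side.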
