Hi All,
We have a legacy file format that I need to migrate to Avro. The tricky
part is that the records have
- some common fields,
- a discriminator field, and
- some unique fields, specific to the type selected by the discriminator
field.
Records of all types are stored in the same file, in no particular order,
mixed with each other.
In Java (or any object-oriented language), our record concept could be
represented as follows:
abstract class RecordWithCommonFields {
    private Long commonField1;
    private String commonField2;
    // ...
}
class RecordTypeA extends RecordWithCommonFields {
    private Integer specificToA1;
    private String specificToA2;
    // ...
}
class RecordTypeB extends RecordWithCommonFields {
    private Boolean specificToB1;
    private String specificToB2;
    // ...
}
Imagine the data looks something like this:
commonField1Value,commonField2Value,TYPE_IS_A,specificToA1Value,specificToA2Value
commonField1Value,commonField2Value,TYPE_IS_B,specificToB1Value,specificToB2Value
So I would like to process an incoming file and write its contents in Avro
format, somehow representing the different record types: technically this
would be an array holding records of different types.
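To make the reading side concrete, here is a minimal sketch of how a single
line could be split and dispatched on the discriminator. It uses only the
JDK and does not touch Avro. The class name LegacyLineParser, the comma
delimiter, the map-based result, and the way type-specific fields are
numbered by position are all assumptions for illustration; the real format
has more fields, and the values would still need converting to their proper
types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: turns one legacy line into
// a field-name -> value map, choosing the field names by the discriminator.
final class LegacyLineParser {

    // Assumption: fields are comma-separated and never contain a comma.
    private static final String DELIMITER = ",";

    static Map<String, String> parse(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        String[] parts = line.split(DELIMITER, -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }

        Map<String, String> record = new LinkedHashMap<>();
        record.put("commonField1", parts[0]);
        record.put("commonField2", parts[1]);
        record.put("type", parts[2]);

        // The discriminator decides how the remaining fields are named.
        String prefix;
        switch (parts[2]) {
            case "TYPE_IS_A": prefix = "specificToA"; break;
            case "TYPE_IS_B": prefix = "specificToB"; break;
            default:
                throw new IllegalArgumentException("Unknown type: " + parts[2]);
        }

        // Type-specific fields start at index 3 and are numbered from 1.
        for (int i = 3; i < parts.length; i++) {
            record.put(prefix + (i - 2), parts[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,foo,TYPE_IS_A,42,bar"));
        System.out.println(parse("2,baz,TYPE_IS_B,true,qux"));
    }
}
```

Each call to parse returns one differently shaped record, and an unknown
discriminator is rejected with an exception. The part I am missing is the
Avro side, that is, how to declare and write such records into one file.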
Can someone give me some ideas on how to achieve this?
Thanks,
Peter