Hi, we have a case where we have
data set 1 with schema and a field - { "name": "x", "type" : "string" }
We have app1, and it does a .get("x") generic retrieval.
This application becomes long-lived and we don't want to (maybe can't) change it.
We want to change the name of the field. Let's say our new field name is "y".
According to the docs/spec we are supposed to add the old name to aliases. A new
producer can create data referencing the improved name "y" and an old consumer
can go on thinking in terms of "x" without having to do any work.
The problem is that the world changes, and the name of that field really should
be "y" and not "x". We want to do this because the schema should make sense in
its current context. For example, we used to call it "horse_drawn_carriage" and
now we want to call it "automobile" (or pda -> mobile_device; lots of things
change over time). There are lots of real-world examples that I don't want to
(or can't) get into the weeds about; hopefully my two random ones are enough to
illustrate that the problem is real. We also have cases where the name will
likely change again over time. If we kept using the current approach and kept
adding to aliases, you would not know which of those aliases is really the
current one, which is why we favor the field name being the current context.
So we do:
data set 2 with schema and a field - { "name": "y", "type": "string", "aliases": ["x"] }
We have app2, and it does a .get("y") generic retrieval because that is how folks
now know to build their apps. The problem is that aliases are not bidirectional,
so we can't reference "x" to get at our data in the old app, which breaks :(
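So we came up with a patch that handles this, roughly:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;

    // Look up a field by its current name or by any of its aliases.
    public static Object resolveField(GenericRecord genericRecord, String fieldName) {
        for (Schema.Field field : genericRecord.getSchema().getFields()) {
            // Direct match on the current field name.
            if (field.name().equals(fieldName)) {
                return genericRecord.get(fieldName);
            }
            // Otherwise match against the field's historic aliases.
            for (String alias : field.aliases()) {
                if (fieldName.equals(alias)) {
                    return genericRecord.get(field.name());
                }
            }
        }
        return null; // no field with that name or alias
    }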
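
To make the behavior of the helper above concrete, here is a minimal
self-contained sketch. It deliberately does not use the Avro API: Field and Rec
are hypothetical stand-ins for Schema.Field and GenericRecord so that it runs
without any dependencies, and the value "hello" is made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins, NOT Avro classes: Field mimics Schema.Field (a name
// plus aliases) and Rec mimics a GenericRecord (values keyed by field name).
final class AliasDemo {
    static final class Field {
        final String name;
        final Set<String> aliases;
        Field(String name, String... aliases) {
            this.name = name;
            this.aliases = new HashSet<>(Arrays.asList(aliases));
        }
    }

    static final class Rec {
        final List<Field> fields;
        final Map<String, Object> values = new HashMap<>();
        Rec(List<Field> fields) { this.fields = fields; }
        // Name-only lookup: knows nothing about aliases.
        Object get(String fieldName) { return values.get(fieldName); }
    }

    // Same logic as the patch: match the current name first, then the aliases.
    static Object resolveField(Rec rec, String fieldName) {
        for (Field field : rec.fields) {
            if (field.name.equals(fieldName) || field.aliases.contains(fieldName)) {
                return rec.get(field.name);
            }
        }
        return null;
    }

    // Data set 2 from above: field "y" with alias "x".
    static Rec sample() {
        Rec rec = new Rec(Collections.singletonList(new Field("y", "x")));
        rec.values.put("y", "hello");
        return rec;
    }

    public static void main(String[] args) {
        Rec rec = sample();
        System.out.println(rec.get("x"));           // null: plain lookup misses the alias
        System.out.println(resolveField(rec, "x")); // hello: app1 keeps working
        System.out.println(resolveField(rec, "y")); // hello: app2 uses the current name
    }
}
```

With this, app1 can keep asking for "x" and app2 can ask for "y", and both get
the same value from data set 2.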
I wanted to check first whether we were missing something as we went through
this, or whether changing alias handling in this way is at odds with some
principle the community holds that we have not understood or properly grokked.
I am very open to the idea that we have gone down the wrong path here; however,
it does seem to solve the core problem we have with keeping the context of the
schema current. I can see how this problem is not unique to us or our use case
and is one that others have too.
If folks are in sync with this change, I would like to propose/create a patch
and see about making aliases work bidirectionally, allowing folks to use the
name field as "the current context of the name of the thing" while the list of
aliases holds the historic items.
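
One possible shape for such a bidirectional lookup, as a sketch only: build an
index once per schema that maps every current name and every historic alias to
the current name, instead of scanning all fields on each call. AliasIndex is a
hypothetical helper and not part of Avro, the field "z" with aliases "x" and "y"
is made up to show a field renamed twice, and letting the current name win over
a colliding alias is an assumed policy that would need discussion.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Avro: maps every current name and every
// historic alias to the current field name, built once per schema.
final class AliasIndex {
    private final Map<String, String> toCurrent = new HashMap<>();

    // fields: current field name -> its historic aliases.
    AliasIndex(Map<String, List<String>> fields) {
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            for (String alias : e.getValue()) {
                toCurrent.putIfAbsent(alias, e.getKey()); // first alias owner wins
            }
        }
        // Assumed policy: a current name always beats a colliding alias.
        for (String name : fields.keySet()) {
            toCurrent.put(name, name);
        }
    }

    // Current field name for a name or alias, or null if it is unknown.
    String currentName(String nameOrAlias) {
        return toCurrent.get(nameOrAlias);
    }

    // Illustrative schema: "z" was renamed twice ("x", then "y").
    static AliasIndex sample() {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("z", Arrays.asList("x", "y"));
        fields.put("automobile", Arrays.asList("horse_drawn_carriage"));
        return new AliasIndex(fields);
    }

    public static void main(String[] args) {
        AliasIndex index = sample();
        System.out.println(index.currentName("x"));                    // z
        System.out.println(index.currentName("y"));                    // z
        System.out.println(index.currentName("horse_drawn_carriage")); // automobile
        System.out.println(index.currentName("unknown"));              // null
    }
}
```

The point of the sketch is that the name field stays the single "current" name,
and any number of historic aliases resolve back to it.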
Thoughts?
Regards,
~ Joe Stein