On Jun 10, 2010, at 11:49 AM, Doug Cutting wrote:
> On 06/10/2010 09:27 AM, Scott Carey wrote:
>> I propose:
>> * We use getters/setters, and allow setters and constructors to be
>> package protected with an option so the objects can be safely passed
>> around outside the package without stateful wrapper objects. Users
>> can write factory methods or other static helper methods to abstract
>> away as much as they would like from there.
>> * Setters should not allow invalid data to be set.
>
> I'd be okay with these changes. +1
>
>> * Provide some built-in mechanism to resolve unions on a read and
>> verify them on a write. Perhaps an inner static enum for each union,
>> and a getter for each branch.
>
> So a schema like:
>
> {"type":"record", "name":"Foo", "fields":[
> {"name":"bar", "type":["int", "float"]}]}
>
> might generate something like:
>
> public class Foo {
>   public enum BarType { INT, FLOAT }
>   private BarType bar$Type;
>   private Object bar;
>   public void setBar(int value) { bar = value; bar$Type = INT; }
>   public void setBar(float value) { bar = value; bar$Type = FLOAT; }
>   public int getBar$Int() {
>     if (bar$Type != INT) throw new ...;
>     return (int)bar;
>   }
>   public float getBar$Float() {
>     if (bar$Type != FLOAT) throw new ...;
>     return (float)bar;
>   }
> }
>
I was trying to think of a way to avoid storing the type in there, to keep the
memory footprint minimized.
The getter name disambiguation needs a good format, and the $ delimiter above
would work well and be clear to users.
However, I was thinking something more along these lines:
{code}
public class Foo {
  // intrinsic simple types could be shared rather than in each class
  // named types must be per union.
  public enum BarType {
    INT(Integer.class),
    FLOAT(Float.class);
    private final Class<?> type; // internal use only
    BarType(Class<?> type) {
      this.type = type;
    }
  }
  private Object bar;
  // Integer vs int: avoid auto-unbox, auto-rebox if the user has an Integer
  // already
  public void setBar(Integer value) { bar = value; }
  // same, Float vs float
  public void setBar(Float value) { bar = value; }
  public Integer getBar$Int() {
    // we could return null instead of throwing. Return Integer, client can
    // un-box if needed
    if (bar.getClass() == BarType.INT.type) return (Integer) bar; else throw new ...;
  }
  public Float getBar$Float() {
    if (bar.getClass() == BarType.FLOAT.type) return (Float) bar; else throw new ...;
  }
  public BarType getBar$Type() {
    // we could use a static ConcurrentHashMap<Class<?>, BarType> instead of
    // cascaded if/else if/ ...
    Class<?> c = bar.getClass();
    if (BarType.INT.type == c) {
      return BarType.INT;
    } else if (BarType.FLOAT.type == c) {
      return BarType.FLOAT;
    } else {
      throw new NeverHappensException(); // :)
    }
  }
}
{code}
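I think the above is thread-safe with no race conditions, since the type is
derived from the single bar reference and so can never disagree with the value.
However, bar is not volatile, so if one thread is calling the setter and
toggling the type, another thread calling getBar$Type might see a 'delayed'
type.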
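To make the 'delayed' type point concrete, here is a minimal sketch, assuming we
were willing to declare the field volatile and read it exactly once per
accessor. The class name VolatileFoo, the loop over BarType.values(), and the
IllegalStateException are placeholders of mine, not something the generator
would necessarily emit.

```java
// Hypothetical sketch: names and exception types are illustrative only.
final class VolatileFoo {
    enum BarType {
        INT(Integer.class),
        FLOAT(Float.class);

        final Class<?> type; // internal use only

        BarType(Class<?> type) {
            this.type = type;
        }
    }

    // volatile is an assumption (not part of the proposal above): a write by
    // one thread becomes visible to later reads by other threads.
    private volatile Object bar;

    public void setBar(Integer value) { bar = value; }
    public void setBar(Float value) { bar = value; }

    public BarType getBar$Type() {
        // Read the field once so every check below sees the same value,
        // even if another thread calls a setter in between.
        Object current = bar;
        if (current == null) {
            throw new IllegalStateException("bar is not set");
        }
        Class<?> c = current.getClass();
        for (BarType t : BarType.values()) {
            if (t.type == c) {
                return t;
            }
        }
        throw new IllegalStateException("unexpected type: " + c);
    }
}
```

Whether the cost of a volatile read is acceptable for generated classes is a
separate question; without it the type still always matches whatever value the
reading thread sees.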
Since the enum is exposed to the client, it also needs a regular naming pattern
so that schema evolution doesn't change the method names.
A user accessing a union can then switch on the enum and call the appropriate
method.
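As a rough illustration of that client-side pattern, a sketch might look like
the following. SwitchFoo is a stand-in for a generated class and SwitchClient
for user code; both names, the instanceof checks, and the exception types are
assumptions for the example only.

```java
// Hypothetical sketch: SwitchFoo stands in for a generated class.
final class SwitchFoo {
    enum BarType { INT, FLOAT }

    private Object bar;

    public void setBar(Integer value) { bar = value; }
    public void setBar(Float value) { bar = value; }

    public BarType getBar$Type() {
        if (bar instanceof Integer) return BarType.INT;
        if (bar instanceof Float) return BarType.FLOAT;
        throw new IllegalStateException("bar is not set");
    }

    public Integer getBar$Int() {
        if (bar instanceof Integer) return (Integer) bar;
        throw new IllegalStateException("bar is not an int");
    }

    public Float getBar$Float() {
        if (bar instanceof Float) return (Float) bar;
        throw new IllegalStateException("bar is not a float");
    }
}

final class SwitchClient {
    // Client code: branch on the union's enum, then call the matching getter.
    static String describe(SwitchFoo foo) {
        switch (foo.getBar$Type()) {
            case INT:
                return "int " + foo.getBar$Int();
            case FLOAT:
                return "float " + foo.getBar$Float();
            default:
                throw new AssertionError("unknown branch");
        }
    }
}
```

If a branch is later added to the union, the default case is where existing
client code would notice it.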
> Is that the sort of thing you have in mind? I've added dollar signs to
> avoid potential conflicts, e.g., if the user had a field named 'barType'
> or 'barInt'. I think it's best to always add the dollars, since
> otherwise it could get awkward if a field named 'barInt' were added later.
>
>> * Unions with one type and NULL should not translate to Object, but
>> rather be that type and allow null.
>
> That's already the case, no?
>
> Also, you didn't provide a proposal for constructors, yet, right?
Yes, I'll need a more complicated example to discuss that one appropriately:
a pattern that aids business object <-> Avro translation, especially in the
presence of class hierarchies and a couple of different object composition
patterns. A union as a representation of a subclass is tricky, and so is
modeling composition.
> Perhaps all of the setters could be on a generated builder class, with
> an instance only returned when it's completely populated?
Providing builder classes rather than constructors could be extremely useful to
aid in encapsulation. It should address my constructor concern and remain
flexible.
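A minimal sketch of what such a builder might look like for the Foo example,
assuming a private constructor and a build() that refuses to return a
half-populated instance. BuiltFoo, newBuilder(), and the exception type are
hypothetical names of mine, not an agreed design.

```java
// Hypothetical sketch: BuiltFoo and its Builder are illustrative names.
final class BuiltFoo {
    private final Object bar; // Integer or Float, never null

    // Private: instances can only come from the builder.
    private BuiltFoo(Object bar) {
        this.bar = bar;
    }

    public Object getBar() { return bar; }

    public static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        private Object bar;

        public Builder setBar(Integer value) { bar = value; return this; }
        public Builder setBar(Float value) { bar = value; return this; }

        // Only hands out an instance once every field is populated.
        public BuiltFoo build() {
            if (bar == null) {
                throw new IllegalStateException("bar must be set before build()");
            }
            return new BuiltFoo(bar);
        }
    }
}
```

Usage would be along the lines of
BuiltFoo.newBuilder().setBar(Integer.valueOf(42)).build(), and the resulting
object has no setters, so it can be passed around outside the package safely.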
>
> Doug