On Jun 10, 2010, at 11:49 AM, Doug Cutting wrote:

> On 06/10/2010 09:27 AM, Scott Carey wrote:
>> I propose:
>> * We use getters/setters, and allow setters and constructors to be
>> package protected with an option so the objects can be safely passed
>> around outside the package without stateful wrapper objects.  Users
>> can write factory methods or other static helper methods to abstract
>> away as much as they would like from there.
>> * Setters should not allow invalid data to be set.
> 
> I'd be okay with these changes.  +1
> 
>> * Provide some built-in mechanism to resolve unions on a read and
>> verify them on a write.  Perhaps a inner static enum for each union,
>> and a getter for each branch.
> 
> So a schema like:
> 
> {"type":"record", "name":"Foo", "fields":[
>   {"name":"bar", "type":["int", "float"]}]}
> 
> might generate something like:
> 
> public class Foo {
>   public enum BarType { INT, FLOAT }
>   private BarType bar$Type;
>   private Object bar;
>   public void setBar(int value) { bar = value; bar$Type = INT; }
>   public void setBar(float value) { bar = value; bar$Type = FLOAT; }
>   public int getBar$Int() {
>     if (bar$Type != INT) throw new ...;
>     return (int)bar;
>   }
>   public float getBar$Float() {
>     if (bar$Type != FLOAT) throw new ...;
>     return (float)bar;
>   }
> }
> 

I was trying to think of a way to storing the type in there to keep the memory 
footprint minimized.  
The getter name disambiguation needs a good format, and the $ delimiter above 
would work well and be clear to users.
However, I was thinking something more along these lines:

{code}
public class Foo {
  // intrinsic simple types could be shared rather than in each class
  // named types must be per union.
  public enum BarType { 
    INT(Integer.class),
    FLOAT(Float.class);
    private Class<?> type; // internal use only
    BarType(Class<?> type) {
      this.type = type;
    }
  }
  private Object bar;
  // Integer vs int: avoid auto-unbox, auto-rebox if the user has an Integer 
already
  public void setBar(Integer value) { bar = value; } 
  // same, Float vs float
  public void setBar(Float value) { bar = value; } 
  public Integer getBar$Int() {
    // we could return null instead of throwing.  Return Integer, client can 
un-box if needed
    if (bar.getClass() == INT.type) return bar; else throw new ...;  
  }
  public Float getBar$Float() {
    if (bar.getClass() == FLOAT.type) return bar; else throw new ...;
  }
  public BarType getBar$Type() {
    // we could use a static ConcurrentHashMap<Class<?>, BarType>  instead of 
cascaded if/else if/ ...
    Class<?> c = bar.getClass();
    if (INT.type == c) {
        return INT;
    } else if (FLOAT.type == c) {
        return FLOAT;
    } else {
      throw new NeverHappensException();  // :)
    }
}
{code}

I think the above is thread-safe with no race conditions, though if one thread 
is calling the setter toggling the type another thread calling getBar$Type 
might see a 'delayed' type.

Since the ENUM is exposed to the client, it also needs a regular naming pattern 
so that schema evolution doesn't change the method names.

A user accessing an enum can then switch on the enum and call the appropriate 
method.


> Is that the sort of thing you have in mind?  I've added dollar signs to 
> avoid potential conflicts, e.g., if the user had a field named 'barType' 
> or 'barInt'.  I think its best to always add the dollars, since 
> otherwise it could get awkward if a field named 'barInt' were added later.
> 
>> * Unions with one type and NULL should not translate to Object, but
>> rather be that type and allow null.
> 
> That's already the case, no?
> 
> Also, you didn't provide a proposal for constructors, yet, right? 

Yes, I'll need a more complicated example to discuss that one appropriately.  A 
pattern that aids in business object <> Avro translation, especially in the 
presence of class hierarchies and a couple different object composition 
patterns. A Union as a representation of a subclass is tricky.  So is it when 
modeling composition.

> Perhaps all of the setters could be on a generated builder class, with 
> an instance only returned when it's completely populated?

Providing builder classes rather than constructors could be extremely useful to 
aid in encapsulation.  It should address my constructor concern and remain 
flexible.


> 
> Doug

Reply via email to