On Jun 7, 2010, at 3:51 PM, Doug Cutting wrote: > On 06/07/2010 03:40 PM, Scott Carey wrote: >> If the specific compiler generated a couple constructors -- >> * A default empty argument constructor -- fills fields with defaults. >> * A constructor with all fields passed in -- assigns fields from the >> constructor and does nothing with defaults. >> >> Then I can use the latter to avoid the performance issue. > > A problem with the latter is that such constructors can be fragile. If > you have a number of integer fields and reorder them in your schema then > the geneated constructor arguments will be re-ordered but the compiler > will not notice a problem.
But, if you have a wrapper class, it will have the same fragility -- users will make that same mistake. Why not solve it here? My requirement is that objects are fail-fast on construction (or near to it) if the resulting state would result in a failure to serialize. Its not acceptable that a client only finds out that the object can't be serialized when it is later written. That occurs asynchronously in my case, and its almost impossible to track down 'who forgot to set field X' otherwise. So, my wrappers only have public static factories that require all fields, though many of these are our business objects and not each individual field - this avoids most of the field re-order problem above. The factory throws an exception if the data provided does not create an object that is serializable. > > Another option might be to have a flag on the compiler to control > generation of default value settings. > >> FWIW, I have written wrappers around all of my datum classes to deal >> with this, > > FWIW, Philip claims that (from his Google experience with protobufs) > that the best practice is always to use a manually-written wrapper > around generated classes. I agree it is a best practice to have wrapper _methods_ (factories, helpers, etc). A wrapper API can translate business objects to Avro and otherwise abstract away the avro details from other users. But I'm not so sure about wrapper instances with state. Its a much larger maintenance burden -- suddenly you need wrappers for all the other Avro APIs too -- file read/write, etc to handle your wrapper type rather than the avro record type. For every Avro SpecificRecord, I have an object instance that wraps it and hides the construction details, union resolution details, and setter/constructor/factory validation. These hide the fields and expose them with getters. > >> and I have wanted to either modify the SpecificCompiler or >> have a new generated object 'flavor' that had the safe-construction, >> union resolution, and getter methods that my wrappers provide. I >> just haven't had time and its lower priority than some other >> enhancements I'll need soon. > > Perhaps you can just post some pseudo code that illustrates what would > be generated, for discussion? I'll post some examples later, Its been a hectic week fighting other fires and they're still not all extinguished. > > For generated code like this I see no point to getter/setter methods. > They seem equivalent to public fields. But perhaps I'm missing something. * Setters can check the data. For example they can ensure that the type on a Union is valid. Otherwise you only find errors when you attempt to serialize it. * If you have a public field, you can't force users through the setter, so if you have a setter, you need a getter. * If the implementation details change, you break your API. For example, if we go from an int to an Integer on the internal member, or a Utf8 to a String, a geter/setter can hide that. The API is more brittle if the fields are not encapsulated. * Reflection is actually faster on method access than field access. I propose: * We use getters/setters, and allow setters and constructors to be package protected with an option so the objects can be safely passed around outside the package without stateful wrapper objects. Users can write factory methods or other static helper methods to abstract away as much as they would like from there. * Setters should not allow invalid data to be set. * Provide some built-in mechanism to resolve unions on a read and verify them on a write. Perhaps a inner static enum for each union, and a getter for each branch. * Unions with one type and NULL should not translate to Object, but rather be that type and allow null. > > Doug >
