Mark, I'd welcome improvements to default value validation in Avro. For performance, I think this should be an explicit, separate operation from parsing schemas. But we might invoke it on schemas at various points, e.g., when creating a file. If you are able, please contribute your implementation by filing an issue in Avro's Jira.

Thanks,
Doug

On Sat, Nov 3, 2012 at 9:48 AM, Mark Hayes wrote:
> On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting wrote:
>
>> No, I don't know of a default value validator that's been implemented
>> yet. It would be great to have one.
>>
>> I think this would recursively walk a schema. Whenever a non-null
>> default value is found it could call ResolvingGrammarDecoder#encode().
>> That's what interprets Json default values. (Perhaps this logic
>> should be moved, though.)
>
>
> Thanks for the reply, Doug.
>
> I did find ResolvingGrammarDecoder.encode (I saw that it is called by the
> builders) and was using it as you described, but I ran into limitations:
>
> + When the field type is an array, map or record, values of the
> wrong JSON type (not array or object) are translated to an empty array,
> map or record. For example, specifying a default of 0, null or "" results
> in an empty array, map or record.
>
> + For all numeric Avro types (int, long, float and double) the default
> value may be of any JSON numeric type, and the JSON values will be coerced
> to the Avro type even though part of the value may be lost or
> truncated. For example, a long default value that exceeds 32 bits
> will be truncated if the field is type int.
>
> + The byte array length is not validated for a fixed type.
>
> + For nested fields and certain types (e.g., enums) a cryptic error
> is often output that does not contain the name of the offending field.
>
> These deficiencies can mask errors made by the user when defining
> a default value. This is important to our application.
>
> To compensate for these deficiencies we implemented our own checking that
> is stricter than Avro's. To do this, we serialize the default value
> using our own JSON serializer in a special mode where default values are
> applied. Any errors during serialization indicate that the default value
> is invalid.
>
> Something similar might be done in Avro itself, for example, if the JSON
> encoder were made to operate in a special mode where default values are
> applied.
>
> --mark
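For illustration only, here is a minimal sketch of a strict, explicit default-value check that addresses the four limitations listed above. It is not Mark's serializer-based approach and not Avro's API: it swaps in a direct recursive walk, in the spirit of walking the schema as Doug suggested. Everything in it is hypothetical: the class name StrictDefaultCheck, the tiny Kind/Type model standing in for a schema, and the validate method. It assumes the JSON default has already been parsed into Long (integral numbers), Double, String, List, Map or null, and it omits unions, enums, booleans and bytes. Missing record fields are treated as errors, which is a simplification. It needs Java 16+ (records, pattern matching for instanceof).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical strict default-value checker; the type model is NOT Avro's Schema API.
final class StrictDefaultCheck {

    /** Simplified schema kinds (assumption, not Avro's Schema.Type). */
    enum Kind { INT, LONG, FLOAT, DOUBLE, STRING, FIXED, ARRAY, MAP, RECORD }

    /** Schema node: size is used by FIXED, element by ARRAY/MAP, fields by RECORD. */
    record Type(Kind kind, int size, Type element, Map<String, Type> fields) {
        static Type ofKind(Kind k) { return new Type(k, 0, null, Map.of()); }
        static Type fixed(int size) { return new Type(Kind.FIXED, size, null, Map.of()); }
        static Type arrayOf(Type items) { return new Type(Kind.ARRAY, 0, items, Map.of()); }
        static Type mapOf(Type values) { return new Type(Kind.MAP, 0, values, Map.of()); }
        static Type recordOf(Map<String, Type> fields) { return new Type(Kind.RECORD, 0, null, fields); }
    }

    /** Returns one message per problem, each prefixed with the path of the offending value. */
    static List<String> validate(Type type, Object value, String path) {
        List<String> errors = new ArrayList<>();
        check(type, value, path, errors);
        return errors;
    }

    private static void check(Type t, Object v, String path, List<String> errors) {
        switch (t.kind()) {
            case INT -> {
                // Reject, rather than truncate, integers outside the 32-bit range.
                if (!(v instanceof Long n)) errors.add(path + ": expected a JSON integer");
                else if (n < Integer.MIN_VALUE || n > Integer.MAX_VALUE)
                    errors.add(path + ": " + n + " does not fit in a 32-bit int");
            }
            case LONG -> {
                if (!(v instanceof Long)) errors.add(path + ": expected a JSON integer");
            }
            case FLOAT, DOUBLE -> {
                if (!(v instanceof Number n)) errors.add(path + ": expected a JSON number");
                else if (t.kind() == Kind.FLOAT && Float.isInfinite((float) n.doubleValue()))
                    errors.add(path + ": " + n + " is out of float range");
            }
            case STRING -> {
                if (!(v instanceof String)) errors.add(path + ": expected a JSON string");
            }
            case FIXED -> {
                // Avro writes fixed/bytes defaults as JSON strings whose chars are byte values 0-255.
                if (!(v instanceof String s)) errors.add(path + ": expected a JSON string");
                else if (s.length() != t.size())
                    errors.add(path + ": expected " + t.size() + " bytes, got " + s.length());
                else if (s.chars().anyMatch(c -> c > 0xFF))
                    errors.add(path + ": character outside the 0-255 byte range");
            }
            case ARRAY -> {
                // No silent fallback to an empty array for 0, null or "".
                if (!(v instanceof List<?> items)) errors.add(path + ": expected a JSON array");
                else for (int i = 0; i < items.size(); i++)
                    check(t.element(), items.get(i), path + "[" + i + "]", errors);
            }
            case MAP -> {
                if (!(v instanceof Map<?, ?> m)) errors.add(path + ": expected a JSON object");
                else for (Map.Entry<?, ?> e : m.entrySet())
                    check(t.element(), e.getValue(), path + "[\"" + e.getKey() + "\"]", errors);
            }
            case RECORD -> {
                if (!(v instanceof Map<?, ?> m)) errors.add(path + ": expected a JSON object");
                else for (Map.Entry<String, Type> f : t.fields().entrySet()) {
                    // Simplification: every field must be present in the default.
                    if (!m.containsKey(f.getKey())) errors.add(path + "." + f.getKey() + ": missing");
                    else check(f.getValue(), m.get(f.getKey()), path + "." + f.getKey(), errors);
                }
            }
        }
    }
}
```

Because the path is threaded through the recursion, a bad value nested two levels deep is reported as, e.g., "outer.inner[1]: ... does not fit in a 32-bit int" rather than as an anonymous error. Keeping this as a separate call, rather than part of schema parsing, matches the suggestion that validation be an explicit operation invoked only where it is wanted.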
