I think AVRO-816 should help you. Neither S1 nor S2 subsumes the other, but S3 subsumes them both.
Doug

On Fri, Feb 1, 2013 at 1:42 PM, Aaron Kimball <[email protected]> wrote:
> Ok, I read the patch and JIRA issue a bit more thoroughly. Schema
> normalization just tells you if two schemas differ only in the unimportant
> bits.
>
> As I understand it, subsumes() will tell you if a schema is a strict
> superset of another.
> i.e.,
> if S1 is a record of { a:int, b:string }, and S2 is a record of { a:int,
> b:string, c:int }, then S2.subsumes(S1) would return true but not vice
> versa. Is that correct?
>
> The functionality I need, is to guarantee that two writers who write to a
> common data store with possibly different schemas can still read one
> another's data without a deserialization error. They need to agree ahead of
> time that they're going to write data with schemas "close enough" that the
> other one can always deserialize the data into their preferred format.
>
> S1 and S2 above do not meet this criterion, because S2 cannot read record
> written with S1. It doesn't know how to instantiate field 'c'.
>
> However, S1 and S3 = { a:int, b:string, c:int default 0 } would meet my
> criterion.
>
> Does AVRO-816 help me answer this question?
> Thanks,
> - Aaron
>
> On Thu, Jan 31, 2013 at 10:17 PM, Aaron Kimball <[email protected]>
> wrote:
>>
>> That sounds like what I'm looking for. I'll take a look!
>>
>> Thanks,
>> - Aaron
>>
>> On Jan 31, 2013 10:39 AM, "Doug Cutting" <[email protected]> wrote:
>>>
>>> Aaron,
>>>
>>> You can use the SchemaNormalization class to test if two schemas are
>>> effectively identical:
>>>
>>> http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
>>>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/SchemaNormalization.html
>>>
>>> AVRO-816 has code to tell whether one Schema subsumes another (i.e.,
>>> can, with resolution, read the other) and to combine multiple schemas
>>> into a single that subsumes them all.
>>>
>>> https://issues.apache.org/jira/browse/AVRO-816
>>>
>>> Bob Cotton recently suggested that we should commit some form of this.
>>> I'd be happy to do this if others agree.
>>>
>>> Doug
>>>
>>> On Wed, Jan 30, 2013 at 3:17 PM, Aaron Kimball <[email protected]>
>>> wrote:
>>> > Does Avro have an API to allow you to tell whether two schemas are a
>>> > match, statically?
>>> >
>>> > i.e., schema1.canRead(schema2) /** return true iff schema1 can be used
>>> > as a reader schema for schema2 */
>>> >
>>> > From my (admittedly cursory) scan of the docs + source, it seems like
>>> > there isn't something quite that concise, though maybe this can be
>>> > accomplished using ResolvingGrammarGenerator?
>>> >
>>> > I'm pessimistic because of the following quote from the spec [1]
>>> >
>>> > [matching] if both are unions:
>>> > The first schema in the reader's union that matches the selected
>>> > writer's union schema is recursively resolved against it. if none
>>> > match, an error is signalled.
>>> >
>>> > That sentence makes me think it's context dependent; I interpret "the
>>> > selected writer's union schema" as "the schema of the actual thing
>>> > written in a data buffer, which is one of the possible schemas the
>>> > writer declared in her union type". i.e., you can only tell if schema R
>>> > can be a reader for some other schema W in terms of a literal record
>>> > written by W, and cannot be deduced statically for all possible records
>>> > that can be encoded with schema W. Is this interpretation correct? If
>>> > so, does anyone have any ideas how to ensure the best bounds on
>>> > statically-guaranteed backward compatibility between a given reader and
>>> > writer?
>>> >
>>> > Thanks,
>>> > - Aaron
>>> >
>>> > [1] http://avro.apache.org/docs/current/spec.html#Schema+Resolution
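
For reference, the criterion Aaron describes in the thread (each writer can read the other's records) can be sketched for flat records without depending on Avro at all. This is only an illustration, not the AVRO-816 patch and not an Avro API: RecordSketch, canRead and mutuallyReadable are hypothetical names, types are compared as plain strings, and promotion, aliases, unions and nested records are ignored. The only rule modelled is the record rule from the spec's Schema Resolution section: a reader field that the writer lacks must declare a default, and fields the reader lacks are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a flat Avro record schema; not part of the Avro API. */
final class RecordSketch {
    /** One field: a type name plus a flag saying whether a default is declared. */
    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    private final Map<String, Field> fields = new LinkedHashMap<>();

    /** Adds a field with no default value. */
    RecordSketch field(String name, String type) {
        fields.put(name, new Field(type, false));
        return this;
    }

    /** Adds a field that declares a default value (the value itself is not modelled). */
    RecordSketch fieldWithDefault(String name, String type) {
        fields.put(name, new Field(type, true));
        return this;
    }

    /** True iff this record, used as the reader schema, can resolve data written with writer. */
    boolean canRead(RecordSketch writer) {
        for (Map.Entry<String, Field> entry : fields.entrySet()) {
            Field readerField = entry.getValue();
            Field writerField = writer.fields.get(entry.getKey());
            if (writerField == null) {
                // Field missing from the written data: the reader needs a default.
                if (!readerField.hasDefault) {
                    return false;
                }
            } else if (!writerField.type.equals(readerField.type)) {
                // Simplification: exact type match only, no promotion.
                return false;
            }
        }
        // Writer fields unknown to the reader are simply ignored.
        return true;
    }

    /** Aaron's criterion: each schema can act as the reader for the other. */
    static boolean mutuallyReadable(RecordSketch a, RecordSketch b) {
        return a.canRead(b) && b.canRead(a);
    }

    public static void main(String[] args) {
        RecordSketch s1 = new RecordSketch().field("a", "int").field("b", "string");
        RecordSketch s2 = new RecordSketch().field("a", "int").field("b", "string")
                .field("c", "int");
        RecordSketch s3 = new RecordSketch().field("a", "int").field("b", "string")
                .fieldWithDefault("c", "int");

        System.out.println("S2 reads S1: " + s2.canRead(s1));                  // false
        System.out.println("S3 reads S1: " + s3.canRead(s1));                  // true
        System.out.println("S3 reads S2: " + s3.canRead(s2));                  // true
        System.out.println("S1 <-> S2:   " + mutuallyReadable(s1, s2));        // false
        System.out.println("S1 <-> S3:   " + mutuallyReadable(s1, s3));        // true
    }
}
```

In this sketch S3 can read data written with either S1 or S2, and S1 and S3 are mutually readable, while S2 cannot read S1 because field 'c' has no default. Whether the subsumes() code in AVRO-816 applies exactly this rule is not something the sketch establishes.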
