I have filed a Jira to track this problem, 
https://issues.apache.org/jira/browse/AVRO-1898.

I've fixed it locally. The fix is really simple, but I don't know how to submit 
it.

________________________________
From: Xu Yang <[email protected]>
Sent: Wednesday, February 8, 2017 10:37 AM
To: [email protected]
Subject: Avro C++ library potential bug in union type

Hello Community,

We are currently working on integrating Avro into our products. We are excited 
about Avro's schema evolution feature; however, we found that older versions of 
the avro-cpp library (1.7.0, which is what we used before) did not support it 
very well.

I found some JIRAs (AVRO-1360<https://issues.apache.org/jira/browse/AVRO-1360> & 
AVRO-1474<https://issues.apache.org/jira/browse/AVRO-1474>) online which 
indicate that some schema-evolution bugs have been fixed since 1.7.7, so we 
upgraded our avro-cpp library from 1.7.0 to the latest release, 1.8.1. The 
upgrade did resolve the problems we saw in the old avro-cpp; however, it breaks 
some of our existing tests, which looks like a regression.

According to the Avro spec (http://avro.apache.org/docs/1.7.7/spec.html), a new 
note was added to the Unions section as of 1.7.7: "Note that when a 
default value<http://avro.apache.org/docs/1.7.7/spec.html#schema_record> is 
specified for a record field whose type is a union, the type of the default 
value must match the first element of the union. Thus, for unions containing 
"null", the "null" is usually listed first, since the default value of such 
unions is typically null."

Based on this description, a union such as ["null","string"] should only have 
the default value "default": null, and this works fine in 1.8.1. Other unions 
such as ["string","null"] should have a default value like "default": "test", 
but this fails in the latest version when it tries to construct an Avro schema 
object from a string. It also fails for similar cases such as ["int","null"] or 
["float","null"].

I have dived into the avro-cpp source code a little. In the failing case it 
seems to hit an assert that forces the default value type to always be 
json::etObject unless it is json::etNull. To me this does not seem always 
correct: according to the spec, the default value can be a string, an int, or 
anything else, as long as it matches the type of the first element of the 
union.

avro-cpp-1.8.1\impl\Compiler.cc (lines 282-300):
    case AVRO_UNION:
    {
        GenericUnion result(n);
        string name;
        Entity e2;
        if (e.type() == json::etNull) {
            name = "null";
            e2 = e;
        } else {
            assertType(e, json::etObject);
            const map<string, Entity>& v = e.objectValue();
            if (v.size() != 1) {
                throw Exception(boost::format("Default value for "
                    "union has more than one field: %1%") % e.toString());
            }
            map<string, Entity>::const_iterator it = v.begin();
            name = it->first;
            e2 = it->second;
        }
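
To make the mismatch concrete, here is a minimal self-contained sketch of the 
two checks. It is not avro-cpp code: JsonKind, defaultMatches, 
unionDefaultIsValid and passesExcerptTypeCheck are hypothetical names made up 
for this illustration, only primitive branch types are handled, and accepting 
an integer literal as a float/double default is my assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the JSON entity kinds used by avro-cpp's parser.
enum class JsonKind { Null, Bool, Long, Double, String, Array, Object };

// True when a JSON default of kind `k` fits the primitive Avro type named
// `avroType`. Complex types are deliberately left out of this sketch.
bool defaultMatches(const std::string& avroType, JsonKind k) {
    if (avroType == "null") return k == JsonKind::Null;
    if (avroType == "boolean") return k == JsonKind::Bool;
    if (avroType == "int" || avroType == "long") return k == JsonKind::Long;
    if (avroType == "float" || avroType == "double") {
        // Assumption: an integer literal is also accepted for float/double.
        return k == JsonKind::Double || k == JsonKind::Long;
    }
    if (avroType == "string" || avroType == "bytes") return k == JsonKind::String;
    return false;
}

// Rule from the spec note: the default of a union-typed field must match
// the FIRST branch of the union.
bool unionDefaultIsValid(const std::vector<std::string>& branches, JsonKind k) {
    assert(!branches.empty());  // a union needs at least one branch here
    return defaultMatches(branches.front(), k);
}

// Models only the type check visible in the excerpt above: anything that is
// neither JSON null nor a JSON object is rejected, whatever the first branch.
bool passesExcerptTypeCheck(JsonKind k) {
    return k == JsonKind::Null || k == JsonKind::Object;
}
```

With these helpers, unionDefaultIsValid({"string","null"}, JsonKind::String) 
is true, while passesExcerptTypeCheck(JsonKind::String) is false, which is 
the disagreement I am describing.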

It seems the Compiler.cc code quoted above was added in svn revision 
1606545<https://svn.apache.org/viewvc?view=revision&sortby=date&revision=1606545>
 by @thiru to fix AVRO-1474, the JIRA I mentioned above:
https://svn.apache.org/viewvc/avro/trunk/lang/c%2B%2B/impl/Compiler.cc?view=log&sortby=date&pathrev=1606545


I have already created a minimal repro that consistently reproduces this 
problem. Can I file a Jira to track it? I will attach my repro there.


Thank you very much!
Yang
