Sure, but that is a fairly open-ended question. Can you be more specific about 
what you’re looking for?

Here’s a shot at an answer:
Polymorphism is a weak spot for Avro; unions help get around that shortcoming. 
We have unions that contain multiple record schemas. A field whose type is such 
a union can hold an instance of any one of several classes, and which one it 
holds is known only at runtime.
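
To make the polymorphism point concrete, here is a minimal Python sketch. The schema, the `Assignment` record and the `branch_for` helper are hypothetical (the Manager/NonManager names are borrowed from Wai Yip's original message quoted below). It shows a field whose type is a union of two records, and how a writer that knows the runtime class of the value can pick the branch by matching the class name against the record name.

```python
from dataclasses import dataclass

# Hypothetical schema (Avro JSON form written as a Python dict): the
# "employee" field is a union of two record types.
SCHEMA = {
    "type": "record",
    "name": "Assignment",
    "fields": [
        {"name": "employee", "type": [
            {"type": "record", "name": "Manager",
             "fields": [{"name": "name", "type": "string"},
                        {"name": "reports", "type": "int"}]},
            {"type": "record", "name": "NonManager",
             "fields": [{"name": "name", "type": "string"}]},
        ]},
    ],
}

@dataclass
class Manager:
    name: str
    reports: int

@dataclass
class NonManager:
    name: str

def branch_for(obj, union):
    """Return the index of the union branch whose record name matches obj's class name."""
    for i, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("name") == type(obj).__name__:
            return i
    raise ValueError(f"no union branch for {type(obj).__name__}")

union = SCHEMA["fields"][0]["type"]
print(branch_for(Manager("Ann", 3), union))   # prints 0
print(branch_for(NonManager("Bob"), union))   # prints 1
```

This works only because the value carries its class; a plain dict carries no such tag, which is the problem discussed in the quoted messages.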


[http://www.cisco.com/web/europe/images/email/signature/est2014/logo_06.png?ct=1398192119726]

Grant Overby
Software Engineer
Cisco.com
[email protected]<mailto:[email protected]>
Mobile: 865 724 4910






[http://www.cisco.com/assets/swa/img/thinkbeforeyouprint.gif] Think before you 
print.

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorized to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.

Please click 
here<http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for 
Company Registration Information.




From: Wai Yip Tung <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Thursday, June 5, 2014 at 1:40 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: Union resolution in dynamic languages

That's good to know. Would you mind sharing your use case with us?

Wai Yip

Grant Overby (groverby)<mailto:[email protected]>
Thursday, June 05, 2014 6:46 AM
Disallowing multiple named types within a union would break our use cases.

We have a similar problem. With two record types in a union, the Python driver 
doesn’t reliably choose the right one.

We solved this problem by adding a pseudo-reserved key to the dict to indicate 
which named type to use. I started the process of open-sourcing that patch a 
few days ago. It’s definitely a hack, but I’m hoping the community will accept 
it.
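
A minimal sketch of the reserved-key idea, assuming the key is called `__avro_type__` (hypothetical; the thread does not say which key the patch uses) and that union branches are given as schema dicts. The hint names the intended record, and the key is stripped before encoding so it never reaches the output.

```python
# Hypothetical reserved key; the patch's real key name is not given in the thread.
TYPE_HINT_KEY = "__avro_type__"

def resolve_with_hint(datum, union):
    """Choose a union branch from an explicit type hint stored in a dict datum.

    Returns (branch_index, datum_without_hint), or (None, datum) when there is
    no hint, so the caller can fall back to content-based resolution.
    """
    if not (isinstance(datum, dict) and TYPE_HINT_KEY in datum):
        return None, datum
    wanted = datum[TYPE_HINT_KEY]
    for i, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("name") == wanted:
            # Drop the hint so it is not mistaken for a record field.
            clean = {k: v for k, v in datum.items() if k != TYPE_HINT_KEY}
            return i, clean
    raise ValueError(f"type hint {wanted!r} matches no branch of the union")

# Two records with identical fields: content alone cannot tell them apart.
UNION = [
    {"type": "record", "name": "Manager", "fields": [{"name": "name", "type": "string"}]},
    {"type": "record", "name": "NonManager", "fields": [{"name": "name", "type": "string"}]},
]

print(resolve_with_hint({"name": "Bob", TYPE_HINT_KEY: "NonManager"}, UNION))
# prints (1, {'name': 'Bob'})
```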

Our patch doesn’t change the time complexity. From a brief glance, choosing a 
branch within the union typically seems to be O(n), since the recursion 
short-circuits. For named types, the complexity could be O(1), and O(1) for 
unnamed types seems achievable too. How many projects are actually affected by 
this ‘wasted’ complexity? Simpler code might be better than faster code.
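
A sketch of the O(1) idea for named types, assuming the writer already knows the intended type's name (for example from a type hint or the value's class). `build_name_index` is a hypothetical helper: the index is built once per union schema, after which each lookup is a single dict access instead of a scan over the branches. It only handles inline named-type definitions, not references by name, and does not cover unnamed branches.

```python
def build_name_index(union):
    """Map each inline named branch (record, enum, fixed) to its position in the union.

    Built once per schema; later lookups are average O(1) dict accesses.
    """
    index = {}
    for i, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("type") in ("record", "enum", "fixed"):
            name = branch["name"]
            namespace = branch.get("namespace")
            # A dotted name is already a full name; otherwise prefix the namespace, if any.
            full_name = name if "." in name or not namespace else f"{namespace}.{name}"
            index[full_name] = i
    return index

UNION = [
    "null",
    {"type": "record", "name": "Manager", "namespace": "com.example", "fields": []},
    {"type": "record", "name": "NonManager", "namespace": "com.example", "fields": []},
]

INDEX = build_name_index(UNION)
print(INDEX["com.example.NonManager"])  # prints 2
```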





From: Wai Yip Tung <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, June 4, 2014 at 9:34 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: Union resolution in dynamic languages

Also, I’m asking about this in the context of building an optimized encoder. For 
this implementation, resolution would be much simpler if we restricted a union 
to at most one record, similar to how the spec does not allow two array or two 
map types in a union. I wonder whether this restriction would break any 
significant use case.
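
A minimal sketch of the proposed restriction as a schema check, in Python. `check_union` is a hypothetical helper. It enforces the spec's existing rules (no two branches of the same unnamed type, and no union directly inside a union) plus the proposed one (at most one record). For simplicity it assumes named types are defined inline rather than referenced by name.

```python
from collections import Counter

NAMED_TYPES = {"record", "enum", "fixed"}

def check_union(union, max_records=1):
    """Raise ValueError if `union` breaks the spec's union rules or the proposed record limit.

    Simplification: named types must be defined inline. A branch that only
    references a named type by its name string is counted as its own kind.
    """
    kinds = []
    for branch in union:
        if isinstance(branch, list):
            raise ValueError("a union may not immediately contain another union")
        kinds.append(branch if isinstance(branch, str) else branch["type"])
    counts = Counter(kinds)
    # Existing spec rule: at most one branch per unnamed type (two arrays, two maps, ...).
    for kind, n in counts.items():
        if kind not in NAMED_TYPES and n > 1:
            raise ValueError(f"union has {n} branches of type {kind!r}")
    # Proposed restriction from this thread: at most `max_records` record branches.
    if counts["record"] > max_records:
        raise ValueError(f"union has {counts['record']} records; limit is {max_records}")

check_union(["null", {"type": "record", "name": "Manager", "fields": []}])  # accepted
try:
    check_union([{"type": "record", "name": "Manager", "fields": []},
                 {"type": "record", "name": "NonManager", "fields": []}])
except ValueError as err:
    print(err)  # prints: union has 2 records; limit is 1
```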

Wai Yip
Wai Yip Tung<mailto:[email protected]>
Wednesday, June 04, 2014 4:40 PM
For encoding data of a union type, the Avro specification does not say much 
about which of the types in the union is used. So far I have mostly used unions 
so that I can write null or another simple type. In these cases, it is fairly 
obvious for the encoder to distinguish null from the other types.

However, a union can also contain several named types, so it can hold two 
records, say a Manager record and a NonManager record. I think that with 
strongly typed languages, the suitable type in the union can be selected by 
introspection. But in dynamic languages, these might just be represented as maps 
without any notion of type. In some cases, we may find that the object has all 
the attributes of a NonManager but not of a Manager, so we can conclude that 
NonManager is the proper schema to use. But this can get complicated with nested 
data structures, where the attribute that disambiguates them appears at a deeper 
level. One can also think of valid scenarios where inspecting the content of the 
object cannot unambiguously resolve the union branch.
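
A sketch of content-based resolution for plain dicts, in Python. `matches` and `candidates` are hypothetical, simplified helpers (they understand only null, int, string, records and unions of records), and extra dict keys are ignored. The example shows both points above: a NonManager-shaped dict resolves cleanly, a Manager-shaped dict also fits NonManager, and when the distinguishing field sits in a nested record, resolution has to recurse to find it.

```python
# Simplified, hypothetical type checks; only null, int and string are supported.
PRIMITIVE_CHECKS = {
    "null": lambda d: d is None,
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
}

def matches(schema, datum):
    """Recursively check whether a plain-Python datum fits a schema (extra dict keys ignored)."""
    if isinstance(schema, list):  # union: fits if any branch fits
        return any(matches(branch, datum) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "record":
        return isinstance(datum, dict) and all(
            matches(f["type"], datum.get(f["name"])) for f in schema["fields"])
    return PRIMITIVE_CHECKS[kind](datum)

def candidates(union, datum):
    """Names of every record branch the datum fits; more than one means the choice is ambiguous."""
    return [b["name"] for b in union if matches(b, datum)]

MANAGER = {"type": "record", "name": "Manager", "fields": [
    {"name": "name", "type": "string"}, {"name": "reports", "type": "int"}]}
NONMANAGER = {"type": "record", "name": "NonManager", "fields": [
    {"name": "name", "type": "string"}]}
EMPLOYEE = [MANAGER, NONMANAGER]

print(candidates(EMPLOYEE, {"name": "Bob"}))                # ['NonManager']
print(candidates(EMPLOYEE, {"name": "Ann", "reports": 3}))  # ['Manager', 'NonManager']

# Wrapper records that differ only one level down: resolution must recurse into "role".
SENIOR = {"type": "record", "name": "Senior", "fields": [{"name": "role", "type": MANAGER}]}
JUNIOR = {"type": "record", "name": "Junior", "fields": [{"name": "role", "type": NONMANAGER}]}
print(candidates([SENIOR, JUNIOR], {"role": {"name": "Bob"}}))  # ['Junior']
```

Because extra keys are ignored here, a NonManager whose fields are a subset of Manager's can never be ruled out by content alone. A stricter check that rejects unknown keys would remove that case, but not the case where two records have identical fields.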

I notice that the Python implementation uses a two-pass recursive validation, 
possibly for the purpose of resolving the union choice.
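
The validate-then-write pattern can be sketched as follows. This is a hypothetical, simplified Python encoder for a small subset of Avro (null, int, string, records and unions), not the avro package's code. Pass one walks the union branches in order and picks the first one the datum validates against; pass two writes the branch index as a zig-zag varint long, as the Avro binary encoding requires, followed by the value. Because validation recurses through the whole datum, nested unions cause the same subtree to be validated again at each level before it is written.

```python
import io

def valid(schema, datum):
    """Simplified recursive check covering null, int, string, records and unions."""
    if isinstance(schema, list):
        return any(valid(branch, datum) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return datum is None
    if kind == "int":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "record":
        return isinstance(datum, dict) and all(
            valid(f["type"], datum.get(f["name"])) for f in schema["fields"])
    raise NotImplementedError(kind)

def write_long(out, n):
    """Avro int/long binary encoding: zig-zag, then base-128 varint, low-order group first."""
    n = (n << 1) ^ (n >> 63)  # zig-zag for values in the 64-bit range
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def write(out, schema, datum):
    if isinstance(schema, list):
        # Pass 1: validate against each branch in order; the first match wins.
        for index, branch in enumerate(schema):
            if valid(branch, datum):
                break
        else:
            raise ValueError("datum matches no union branch")
        # Pass 2: write the branch index as a long, then the value with that branch.
        write_long(out, index)
        write(out, schema[index], datum)
        return
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        pass  # null is encoded as zero bytes
    elif kind == "int":
        write_long(out, datum)
    elif kind == "string":
        data = datum.encode("utf-8")
        write_long(out, len(data))
        out.write(data)
    elif kind == "record":
        for f in schema["fields"]:  # fields are written in schema order, no separators
            write(out, f["type"], datum.get(f["name"]))
    else:
        raise NotImplementedError(kind)

UNION = ["null", "string", {"type": "record", "name": "NonManager",
                            "fields": [{"name": "name", "type": "string"}]}]
buf = io.BytesIO()
write(buf, UNION, {"name": "Bob"})
print(buf.getvalue())  # b'\x04\x06Bob'
```

Branch order matters with first-match-wins: if a NonManager record whose fields are a subset of Manager's is listed before Manager, a Manager-shaped dict is written as a NonManager and its extra fields are silently dropped.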

I wonder whether much consideration has been given to potentially complex, 
indirectly nested union types that might be difficult to resolve and would thus 
add complexity to encoder implementations. Are there use cases in practice that 
involve complex union decisions?

Wai Yip
