Went through that thread. None of the arguments there are convincing from a design standpoint because:

1. Avro is used in non-Java environments. The Avro IDL is language-agnostic 
while the code-gen is language-specific, so the C# code-gen could spit out 
unsigned. Every language has limitations, but I'm not sure why Java's limitations 
should drive Avro's design, despite the heritage. (It's going to grow into 
other languages, right?)
2. Unsigned 32/64-bit values have been used extensively as primitive types for 
over 3 decades (i.e. they have held their ground). Heck, even core Java devs 
hate that unsigned doesn't exist, e.g. 
http://stackoverflow.com/questions/430346/why-doesnt-java-support-unsigned-ints
3. All other workarounds simply add more friction to development when in 
reality, working with a primitive data type that's been around "forever" should 
be very transparent and very fluid.

Stepping off the soapbox, I also have a workaround for future readers. We cast 
uint <-> int with arithmetic overflow checking disabled (C#'s unchecked), and 
then let Avro handle them as signed varints (aka zigzag varints). Example code: 

int avroInt32;     // this is code-gen'd off the IDL
uint csharpUint32; // this is an app domain var

// to Avro DTO
avroInt32 = unchecked((int)csharpUint32);

// from Avro DTO
csharpUint32 = unchecked((uint)avroInt32);
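
For anyone who wants to convince themselves the casts are lossless, here is a minimal round-trip sketch. The class and method names (UintRoundTrip, ToAvroInt, FromAvroInt) are made up for illustration and are not part of Avro or its C# code-gen; the sample values are arbitrary.

```csharp
using System;

public static class UintRoundTrip
{
    // Hypothetical helper: reinterpret the 32 bits of a uint as an int,
    // with overflow checking disabled.
    public static int ToAvroInt(uint value) => unchecked((int)value);

    // Hypothetical helper: the reverse reinterpretation.
    public static uint FromAvroInt(int value) => unchecked((uint)value);

    public static void Main()
    {
        uint[] samples =
        {
            0u, 1u, 134217727u, (uint)int.MaxValue, 2147483648u, uint.MaxValue
        };

        foreach (uint original in samples)
        {
            int wire = ToAvroInt(original);      // what goes into the Avro DTO
            uint restored = FromAvroInt(wire);   // what the app domain gets back
            Console.WriteLine($"{original} -> {wire} -> {restored}");
        }
    }
}
```

One thing to keep in mind: any uint above int.MaxValue shows up as a negative int on the Avro side, so readers in other languages see a negative number unless they apply the same reinterpretation.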

Pros:
a) Use the encoding compression inherent in varints (e.g. stay within 4 bytes 
up to 134,217,727)
b) Keep the application domain logic as unsigned (as it needs to be)
c) Minimize the glue logic / impedance when converting from app domain => DTO 
domain

Cons:
1) Specific glue code is needed because Avro inherits Java's limitations
2) We're still wasting half of the addressable range: every other varint 
encoding is reserved for -ve numbers, and we only see +ve numbers. So instead 
of hitting my 5th varint byte after 268,435,455, I now need that 5th byte at 
half that, after 134,217,727. It's not *too* bad but seems wasteful to always 
transport a bit that's never used (bit 0, a zigzag varint's 'sign bit', will 
always be 0, carrying no informational content). 
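
To show where those byte thresholds come from, here is a rough sketch that computes the zigzag mapping and the resulting varint length. It assumes the usual zigzag mapping (n << 1) ^ (n >> 31) and 7 payload bits per varint byte; ZigZagSize, ZigZag and VarintLength are names I made up for this sketch, not Avro library calls.

```csharp
using System;

public static class ZigZagSize
{
    // Zigzag-map a signed 32-bit value to unsigned:
    // 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...
    public static uint ZigZag(int n) => unchecked((uint)((n << 1) ^ (n >> 31)));

    // Bytes needed by a base-128 varint for an unsigned value
    // (7 payload bits per byte).
    public static int VarintLength(uint v)
    {
        int bytes = 1;
        while (v >= 0x80)
        {
            v >>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void Main()
    {
        int[] samples = { 134217727, 134217728 };
        foreach (int n in samples)
        {
            // Length when sent as a zigzag varint (what Avro does for int).
            int zigzagBytes = VarintLength(ZigZag(n));
            // Length a plain unsigned varint would need for the same value.
            int plainBytes = VarintLength((uint)n);
            Console.WriteLine($"{n}: zigzag = {zigzagBytes} bytes, plain unsigned = {plainBytes} bytes");
        }
    }
}
```

Since zigzag doubles every non-negative value, the 28 bits that fit in 4 varint bytes run out at 134,217,727 rather than 268,435,455.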

Cheers
Sid

> From: [email protected]
> Date: Wed, 12 Feb 2014 17:50:02 +0530
> Subject: Re: unsigned 32bit (uint) in Avro - C# ?
> To: [email protected]
> 
> See also this past thread on the topic perhaps:
> http://mail-archives.apache.org/mod_mbox/avro-user/201212.mbox/%[email protected]%3e
> 
> On Mon, Feb 10, 2014 at 3:46 PM, Mika Ristimaki
> <[email protected]> wrote:
> > Hi,
> >
> > Java doesn't have unsigned primitives, so most likely Avro doesn't support
> > them directly either.
> >
> > -Mika
> >
> > On Feb 10, 2014, at 3:34 AM, Sid Shetye <[email protected]> wrote:
> >
> > How do I serialize an unsigned integer (uint or UInt32 in C#) in Avro?
> >
> > It's very bizarre that unsigned aren't discussed at
> > http://avro.apache.org/docs/1.7.6/spec.html#schema_primitive
> >
> >
> >
> >
> 
> 
> 
> -- 
> Harsh J
                                          
