Hi,

That's not the best idea: encoding to a string wastes a lot of space (e.g. 1 byte per ASCII character, 2-3 bytes per character for other UTF-8 text). Especially as Avro uses the MSB of each byte to compress smaller ints, this does not seem well suited to mass data.
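To put rough numbers on the space argument, here is a minimal sketch that assumes Avro's binary encoding as described in the spec: a long is written as a zig-zag varint, and a string as a long length prefix followed by the UTF-8 bytes. The function names are made up for illustration and are not part of any Avro library.

```python
def zigzag64(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (zig-zag coding)."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def avro_long_size(n: int) -> int:
    """Bytes needed for the signed 64-bit value n as a zig-zag varint."""
    z = zigzag64(n)
    size = 1
    while z >= 0x80:  # 7 payload bits per byte, the MSB flags "more bytes follow"
        z >>= 7
        size += 1
    return size


def avro_string_size(n: int) -> int:
    """Bytes needed for n written as a decimal string (length prefix + digits)."""
    digits = str(n).encode("utf-8")
    return avro_long_size(len(digits)) + len(digits)


if __name__ == "__main__":
    for value in (42, 1_000_000, 2**62):
        print(value, avro_long_size(value), avro_string_size(value))
```

Under these assumptions the string form is never smaller than the long form, and the largest uint64 value costs 21 bytes as a string, while a long never needs more than 10.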
I'll see whether a 64-bit unsigned -> 64-bit signed conversion or using the mantissa of a double works better for us.

Thanks,
Dennis

On 29.03.12 01:38, "Miki Tebeka" wrote at <[email protected]>:

>I would encode to string. Should be simple enough, just means you need
>a pass on the data after reading it.
>
>On Wed, Mar 28, 2012 at 11:43 AM, Scott Carey <[email protected]>
>wrote:
>> On 3/28/12 11:01 AM, "Meyer, Dennis" <[email protected]> wrote:
>>
>> Hi,
>>
>> What type refers to a Java Bigint or C long long? Or is there any other
>> type in Avro that maps a 64 bit unsigned int?
>>
>> I unfortunately could only find smaller types in the docs:
>>
>> Primitive Types
>>
>> The set of primitive type names is:
>>
>> string: unicode character sequence
>> bytes: sequence of 8-bit bytes
>> int: 32-bit signed integer
>> long: 64-bit signed integer
>> float: single precision (32-bit) IEEE 754 floating-point number
>> double: double precision (64-bit) IEEE 754 floating-point number
>> boolean: a binary value
>> null: no value
>>
>> Anyway, in the encoding section there's some 64 bit unsigned. Can I use
>> them somehow by a type?
>>
>> An unsigned value fits in a signed one. They are both 64 bits. Each
>> language that supports a long unsigned type has its own way to convert
>> from one to the other without loss of data.
>>
>> A workaround might be to use the 52 significant bits of a double, but
>> that seems like a hack and of course loses some more number space
>> compared to uint64. I'd like to avoid any other self-encoding hacks, as
>> I'd also like to use Hadoop/PIG/HIVE on top of AVRO, so I would like to
>> keep functionality on numbers if possible.
>>
>> Java does not have an unsigned 64 bit type. Hadoop/Pig/Hive all only
>> have signed 64 bit integer quantities.
>>
>> Luckily, multiplication and addition on two's complement signed values
>> are identical to the operations on unsigned ints, so for many operations
>> there is no loss in fidelity as long as you pass the raw bits on to
>> something that interprets the number as an unsigned quantity.
>>
>> That is, if you take the raw bits of a set of unsigned 64 bit numbers
>> and treat those bits as if they are signed 64 bit quantities, then do
>> addition, subtraction, and multiplication on them, then treat the raw
>> bit result as an unsigned 64 bit value, it is as if you did the whole
>> thing unsigned.
>>
>> http://en.wikipedia.org/wiki/Two%27s_complement
>>
>> Avro only has signed 32 and 64 bit integer quantities because they can
>> be mapped to unsigned ones in most cases without a problem and many
>> (actually, most) languages do not support unsigned integers.
>>
>> If you want various precision quantities you can use an Avro Fixed type
>> to map to any type you choose. For example you can use a 16 byte fixed
>> to map to 128 bit unsigned ints.
>>
>> Thanks,
>> Dennis
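The two suggestions quoted above (carrying the raw bits of a uint64 in a signed long, and using a 16 byte fixed for 128 bit values) can be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, `wrap_i64` emulates the overflow behaviour of a 64-bit signed type, and the big-endian byte order for the fixed value is an assumption, since Avro does not prescribe one for Fixed.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def u64_to_i64(u: int) -> int:
    """Reinterpret the bits of an unsigned 64-bit value as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", u))[0]


def i64_to_u64(s: int) -> int:
    """Reinterpret the bits of a signed 64-bit value as an unsigned one."""
    return struct.unpack("<Q", struct.pack("<q", s))[0]


def wrap_i64(n: int) -> int:
    """Emulate 64-bit signed overflow (Python ints never overflow on their own)."""
    return u64_to_i64(n & MASK64)


def u128_to_fixed16(u: int) -> bytes:
    """Pack an unsigned 128-bit value into 16 bytes (big-endian by assumption)."""
    return u.to_bytes(16, "big")


def u128_from_fixed16(raw: bytes) -> int:
    """Inverse of u128_to_fixed16."""
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    a, b = 2**63 + 5, 7                      # a does not fit in a signed long
    signed_sum = wrap_i64(u64_to_i64(a) + u64_to_i64(b))
    print(i64_to_u64(signed_sum) == (a + b) & MASK64)
    print(u128_from_fixed16(u128_to_fixed16(2**100 + 1)) == 2**100 + 1)
```

Addition, subtraction and multiplication survive the round trip, but comparison and division do not: a value of 2^63 or more looks negative while it is stored as a signed long, so sorting or range filters on the stored value need the unsigned interpretation first.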
