does anyone have any suggestions for dealing with large lists/arrays of primitive values in avro?
in my case (numerical algorithms), my naive mapping of a vector type (mathematical vectors, not java Vectors) to an avro specific type generates a GenericArray<Double>. needless to say, i would prefer to avoid the cost of boxing up all the individual floating point numbers. is it possible to coerce avro into using raw java primitive arrays, e.g. "double[]"?

On Wed, Jul 28, 2010 at 9:10 AM, Doug Cutting <[email protected]> wrote:

> On 07/28/2010 02:07 AM, Nick Palmer wrote:
>
>> It would be very nice if GenericArray implemented List. I need get,
>> set, and remove in GenericData.Array for my application and have
>> already added these to my Avro code so I can continue developing. I
>> was planning to file a patch in JIRA for this change.
>>
>
> This would be a great patch to have!
>
>> The trouble with making GenericArray implement List is that
>> List.size() returns an int and GenericArray.size() returns a long. Is
>> there a reason for this?
>>
>
> Avro arrays can be arbitrarily long, written as blocks. The thinking was
> that the interface should expose the length as a long, permitting
> implementations that might page values from disk as you iterate. The
> collision with List#size() is unfortunate.
>
> We could either:
> a. unilaterally change GenericArray#size() to return int; or
> b. rename GenericArray#size() to be something else, like arraySize() or
> somesuch, so that someone could still implement a version that's paged.
>
> My instinct is towards (a). If/when someone ever implements a paged
> representation for GenericArray they can perhaps add a method with the full
> size then.
>
> Doug
>
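
to make the double[] question concrete, here is a rough sketch of what i have in mind, in plain java with no avro classes at all. it hand-writes a double[] as a single block followed by a zero count, which is how i read the array layout in the avro binary encoding spec (zigzag varint for the count, 8 little-endian bytes per double). the class name PrimitiveDoubleArrayCodec and its methods are made up for illustration and are not part of avro; whether avro's own specific/generic mapping can be made to do this is exactly what i am asking.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// hypothetical helper, NOT an avro class: encodes/decodes double[] without boxing
final class PrimitiveDoubleArrayCodec {

    // zigzag + variable-length encoding of a long (assumed to match the avro spec)
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // inverse of writeLong
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // writes one block holding every element, then the zero count that ends the array
    public static byte[] encode(double[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(values.length * 8 + 11);
        if (values.length > 0) {
            writeLong(out, values.length);
            ByteBuffer buf = ByteBuffer.allocate(values.length * 8)
                                       .order(ByteOrder.LITTLE_ENDIAN);
            for (double v : values) {
                buf.putDouble(v); // primitive write, no Double objects created
            }
            out.write(buf.array(), 0, buf.capacity());
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    // reads blocks until a zero count; a negative count is followed by a byte size
    public static double[] decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] result = new double[0];
        int size = 0;
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // block size in bytes, not needed here
            }
            result = Arrays.copyOf(result, size + (int) count);
            for (long i = 0; i < count; i++) {
                result[size++] = in.getDouble();
            }
        }
        return result;
    }
}
```

calling PrimitiveDoubleArrayCodec.encode(vec) and then decode(...) on the result round-trips a double[] without ever allocating a Double. note the sketch also runs into the int-vs-long issue discussed in the quoted thread: a java array length is an int, so it cannot represent an arbitrarily long avro array.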
