Avro does permit partial reading of arrays. Arrays are written as a series of length-prefixed blocks:
http://avro.apache.org/docs/current/spec.html#binary_encode_complex

The standard encoders do not write arrays as multiple blocks, but BlockingBinaryEncoder does. It can be used with any DatumWriter implementation. If, for example, you have an array whose implementation is backed by a database and contains billions of elements, it can be written as a single Avro value with a BlockingBinaryEncoder.

http://avro.apache.org/docs/current/api/java/org/apache/avro/io/BlockingBinaryEncoder.html

All Decoder implementations read array blocks correctly, but none of the standard DatumReader implementations support reading of partial arrays. So you could use the Decoder API directly to read your data, or you might extend an existing DatumReader to read partial arrays. For example, you might override GenericDatumReader#readArray() to read only the first N elements, then skip the rest. Or all of the array elements might be stored externally as they are read.

Doug

On Mon, Dec 15, 2014 at 2:54 PM, yael aharon <[email protected]> wrote:
> Hello,
> I need to read very large avro records, where each record is an array which
> can have hundreds of millions of members. I am concerned about reading a
> whole record into memory at once.
> Is there a way to read only a part of the record instead?
> thanks, Yael
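[Editor's sketch of the writing side described above: a minimal example, assuming Avro's Java API, that writes an array of longs through a BlockingBinaryEncoder obtained from EncoderFactory. The schema string and the small block size are illustrative choices, not anything from the thread; a small block size simply forces the array to span multiple blocks.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class BlockingWriteSketch {

    // Encode `count` longs as one Avro array value via BlockingBinaryEncoder.
    static byte[] encode(int count) throws IOException {
        // Hypothetical schema: a plain array of longs.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"array\",\"items\":\"long\"}");

        GenericData.Array<Long> array = new GenericData.Array<>(count, schema);
        for (long i = 0; i < count; i++) {
            array.add(i);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A small block size forces the array to be written as several
        // length-prefixed blocks; in real use the elements could be
        // streamed from a database rather than held in memory.
        BinaryEncoder enc = EncoderFactory.get()
            .configureBlockSize(64)
            .blockingBinaryEncoder(out, null);

        // Any DatumWriter works with the blocking encoder.
        new GenericDatumWriter<GenericData.Array<Long>>(schema).write(array, enc);
        enc.flush();
        return out.toByteArray();
    }
}
```

Requires the Avro Java library on the classpath. The same call shape applies to arrays nested inside records; only the schema changes.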
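[Editor's sketch of the reading side: using the Decoder API directly, as suggested above, to consume only the first N elements of an array. The helper names (`encode`, `readFirst`) and the limit parameter are illustrative; `readArrayStart()` and `arrayNext()` are the real Decoder methods for walking array blocks.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class PartialArrayRead {

    // Encode some longs as an Avro array using the raw Encoder API.
    static byte[] encode(long... items) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        enc.writeArrayStart();
        enc.setItemCount(items.length);
        for (long v : items) {
            enc.writeLong(v);
        }
        enc.writeArrayEnd();
        enc.flush();
        return out.toByteArray();
    }

    // Read at most `limit` elements, walking the array block by block.
    static List<Long> readFirst(byte[] data, int limit) throws IOException {
        BinaryDecoder in = DecoderFactory.get().binaryDecoder(data, null);
        List<Long> result = new ArrayList<>();
        long n = in.readArrayStart();          // item count of the first block
        while (n > 0 && result.size() < limit) {
            for (long i = 0; i < n && result.size() < limit; i++) {
                result.add(in.readLong());
            }
            if (result.size() < limit) {
                n = in.arrayNext();            // item count of the next block
            } else {
                break;                         // stop early; the rest is never decoded
            }
        }
        return result;
    }
}
```

A DatumReader-based variant would override GenericDatumReader#readArray() to apply the same early exit, so callers keep the familiar read() interface.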
