Thanks for the reply, Doug. Out of curiosity, is recording sync markers while writing the file, and then passing those markers to the reader while reading, not a good way to achieve random access in Avro? At least that was my understanding from reading the javadoc [1], which could be flawed.
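
To make the question concrete, here is a rough sketch of the pattern I have in mind. It uses plain java.io rather than Avro's classes, so the class name BlockIndexSketch, the (id, name) record layout and the block size are all made up for illustration. While writing, it remembers a byte offset at each block boundary, much as one might keep the values returned by DataFileWriter.sync() (as I read the javadoc). While reading, it seeks to the nearest remembered offset and scans only that block.

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain java.io, not Avro's container format or API.
class BlockIndexSketch {

    // Writes (id, name) records sorted by id and returns a sparse index:
    // first id of each block -> byte offset where that block starts.
    static TreeMap<Integer, Long> write(Path file, int records, int blockSize) throws IOException {
        TreeMap<Integer, Long> index = new TreeMap<>();
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file.toFile()))) {
            for (int id = 0; id < records; id++) {
                if (id % blockSize == 0) {
                    // Stand-in for remembering a sync position at a block boundary.
                    index.put(id, (long) out.size());
                }
                out.writeInt(id);
                out.writeUTF("name-" + id);
            }
        }
        return index;
    }

    // Seeks to the block that could contain wantedId and scans forward from there.
    // Returns the name, or null if the id is not in the file.
    static String lookup(Path file, TreeMap<Integer, Long> index, int wantedId) throws IOException {
        Map.Entry<Integer, Long> block = index.floorEntry(wantedId);
        if (block == null) {
            return null; // wantedId is smaller than every key in the file
        }
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(block.getValue());
            while (in.getFilePointer() < in.length()) {
                int id = in.readInt();
                String name = in.readUTF();
                if (id == wantedId) {
                    return name;
                }
                if (id > wantedId) {
                    return null; // records are sorted, so we have passed it
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("employees", ".dat");
        TreeMap<Integer, Long> index = write(file, 1000, 100);
        System.out.println(lookup(file, index, 100));
        Files.delete(file);
    }
}
```

The lookup only works here because the records are written in key order and the offsets are kept in a separate in-memory map; whether Avro's own sync positions can be used the same way is exactly what I am asking.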

[1] http://avro.apache.org/docs/1.3.3/api/java/org/apache/avro/file/DataFileWriter.html#sync()

On Mon, Jul 1, 2013 at 12:05 PM, Doug Cutting <[email protected]> wrote:
> Avro data files do not generally support random access.
>
> SortedKeyValueFile supports random access by key.
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/hadoop/file/SortedKeyValueFile.Reader.html
>
> From the documentation:
>
> "The SortedKeyValueFile is a directory with two files, named 'data'
> and 'index'. The 'data' file is an ordinary Avro container file with
> records. Each record has exactly two fields, 'key' and 'value'. The
> keys are sorted lexicographically. The 'index' file is a small Avro
> container file mapping keys in the 'data' file to their byte
> positions. The index file is intended to fit in memory, so it should
> remain small. There is one entry in the index file for each data block
> in the Avro container file."
>
> Doug
>
> On Mon, Jul 1, 2013 at 8:37 AM, [email protected]
> <[email protected]> wrote:
> > Hello,
> >
> > Is it possible to have random access to a record in an avro file? For
> > instance, if I have an avro file with a schema containing four records:
> > employee id, name, address and phone. While reading the file, is there any
> > way at all to directly jump to a record with employee id 100 instead of
> > having to scan the whole file every single time and filtering out records?
> >
> > Thanks for the help.
> >
> > --
> > Swarnim

--
Swarnim
