Thank you very much, Mike. I am looking at the Avro C API right now, and this is extremely helpful.

Lewis
On Sat, May 24, 2014 at 6:00 AM, Mike Stanley <[email protected]> wrote:

> While I haven't benchmarked Java performance, I have looked closely at Ruby
> vs C with regard to reading large Avro files. With C, I have processed
> ~900 MB files with 25+M rows in ~42s, and I routinely process 270 MB / 7.5M
> record files with C in 15s on average. These numbers were observed
> running on a 2012 MacBook Pro (exact specs elude me at the
> moment). Not scientific, but it may help give you a ballpark of what is
> possible.
>
> I am using Java. I did play with the size of the buffer reader, but I
> found that the default size of 8K gave me the best performance.
> thanks, Yael
>
>
> On Fri, May 23, 2014 at 4:14 AM, Martin Kleppmann <[email protected]> wrote:
>
>> Which language are you using? Afaik, most language implementations of
>> Avro only have an interface for reading one record at a time, but they do
>> buffer the input file internally, so there shouldn't be a performance
>> disadvantage to reading one record at a time.
>>
>> If you have an example that is particularly slow, you could be a great
>> help to the Avro community by getting out a profiler and finding the
>> bottleneck :)
>>
>> Thanks,
>> Martin
>>
>> On 14 May 2014, at 20:13, yael aharon <[email protected]> wrote:
>> > I am building a Java utility that reads large Avro files and does some
>> > processing. These files have millions of records in them and it can take
>> > minutes to read them using DataFileReader.next().
>> > Is there a way to read more than one record at a time?
>> > thanks, Yael
>>
>

--
*Lewis*
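
To illustrate the point quoted above (reading one record at a time need not be slow when the underlying stream is buffered), here is a minimal sketch using only the Java standard library. It does not use the Avro API: the class `BufferedRecordDemo`, its methods, and the length-prefixed record layout are all hypothetical stand-ins chosen for illustration. The 8192-byte buffer is the default for `java.io.BufferedInputStream`; whether that matches the buffer discussed in the thread is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; NOT part of Avro. It only shows per-record reads
// served from an in-memory buffer rather than from the underlying source.
final class BufferedRecordDemo {

    // Writes `count` length-prefixed records into a byte array
    // (an in-memory stand-in for a data file).
    static byte[] writeRecords(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                byte[] payload = ("record-" + i).getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }

    // Reads records one at a time. The BufferedInputStream refills its buffer
    // in chunks of up to `bufferSize` bytes, so most per-record reads never
    // touch the underlying stream.
    static int countRecords(InputStream raw, int bufferSize) throws IOException {
        int count = 0;
        try (DataInputStream in =
                 new DataInputStream(new BufferedInputStream(raw, bufferSize))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();   // 4-byte length prefix
                } catch (EOFException endOfStream) {
                    break;                   // clean end of input
                }
                byte[] payload = new byte[length];
                in.readFully(payload);       // record body
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeRecords(100_000);
        int records = countRecords(new ByteArrayInputStream(data), 8192);
        System.out.println("records read: " + records);
    }
}
```

With a real file, replacing the `ByteArrayInputStream` with a `FileInputStream` and timing `countRecords` at a few buffer sizes would be one way to check whether buffering or per-record decoding dominates, which is the kind of profiling suggested in the thread.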
