This is my approach: although you could use Avro data files directly, I rolled my own. I use SequenceFile, RCFile, or TFile as an "envelope", serialize the Avro record into a byte array, and write that into the envelope as the payload. In my tests, the TFile envelope was the fastest.
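The envelope idea above can be sketched in plain Python. This is only a conceptual stand-in: `json` bytes play the role of the Avro-serialized record, and a length-prefixed container file plays the role of the SequenceFile/TFile envelope (a real implementation would use Hadoop's `SequenceFile.Writer` with a `BytesWritable` value):

```python
import io
import json
import struct

def write_envelope(fp, records):
    """Write each record as a length-prefixed binary payload.
    json.dumps stands in here for Avro binary serialization."""
    for rec in records:
        payload = json.dumps(rec).encode("utf-8")  # pretend: Avro-serialized bytes
        fp.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length header
        fp.write(payload)

def read_envelope(fp):
    """Read the payloads back out of the container."""
    while True:
        header = fp.read(4)
        if len(header) < 4:
            break
        (size,) = struct.unpack(">I", header)
        yield json.loads(fp.read(size).decode("utf-8"))

buf = io.BytesIO()
write_envelope(buf, [{"id": 1}, {"id": 2}])
buf.seek(0)
print(list(read_envelope(buf)))  # → [{'id': 1}, {'id': 2}]
```

The point of the envelope is that the container handles framing, sync, and compression, while the payload bytes stay opaque to it.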
On Fri, Sep 30, 2011 at 6:42 PM, Eric Hauser <[email protected]> wrote:
> A coworker and I were having a conversation today about choosing a
> compression algorithm for some data we are storing in Hadoop. We have
> been using avro-utils (https://github.com/tomslabs/avro-utils) for our
> Map/Reduce jobs and Haivvreo for integration with Hive. By default, the
> avro-utils OutputFormat uses deflate compression. Even though
> deflate/zlib/gzip files are not splittable, we decided that Avro data
> files are always splittable because individual blocks within the file
> are compressed instead of the entire file.
>
> Is this accurate? Thanks.
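The reasoning in the quoted question is sound: an Avro data file applies the codec to each block independently, so a reader can seek to a block boundary and decompress from there. A minimal illustration of that property, using plain `zlib` on hand-made blocks rather than the actual Avro container format:

```python
import zlib

records = [f"record-{i}".encode() for i in range(100)]

# Compress in independent blocks of 25 records, the way an Avro data file
# compresses each block separately (codec applied per block, not per file).
blocks = [zlib.compress(b"\n".join(records[i:i + 25]))
          for i in range(0, 100, 25)]

# A split that starts at block 2 can decompress it with no earlier context --
# unlike a single gzip stream over the whole file, which must be read from byte 0.
chunk = zlib.decompress(blocks[2]).split(b"\n")
print(chunk[0])  # → b'record-50'
```

In the real format, sync markers between blocks are what let a mapper find the next block boundary after an arbitrary split offset.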
