I am not sure I understand the relation between your problem and the way temporary data is stored after the map phase.
However, I think you are looking for DataFileWriter and its setCodec method:
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29

Regards

Bertrand

PS: A Snappy-compressed Avro file is not a regular file that has been compressed afterwards; it is a container file whose data blocks are compressed individually. The principle is similar to that of SequenceFile. Maybe that is what you mean by a different Snappy codec?

On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[email protected]> wrote:

> Hi,
>
> I am writing an application that produces Avro record files, to be stored
> on AWS S3 as possible input to EMR.
> I would like to compress with snappy codec before storing them on S3.
> It is my understanding that hadoop currently uses a different snappy
> codec, mostly used as intermediate map output format.
> My question is how can I generate within my application logic (not MR)
> snappy compressed avro files?
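
To make the DataFileWriter suggestion concrete, here is a minimal sketch of writing a Snappy-compressed Avro file from plain application code, with no MapReduce involved. The Event schema, its field names, the class name, and the output file name are all made up for illustration. I am also assuming the Avro Java library and snappy-java are on the classpath. Treat it as a starting point rather than a reference implementation.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class SnappyAvroExample {

    // Hypothetical schema, used only for illustration.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"message\",\"type\":\"string\"}]}");

    // Writes `count` sample records to `out` as an Avro container file
    // whose data blocks are compressed with Snappy.
    static void writeEvents(File out, int count) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            // The codec has to be set before create() opens the file.
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(SCHEMA, out);
            for (int i = 0; i < count; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", (long) i);
                record.put("message", "event-" + i);
                writer.append(record);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption; pass a different one as the first argument.
        File out = new File(args.length > 0 ? args[0] : "events.avro");
        writeEvents(out, 1000);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```

The codec name is recorded in the file header, so a reader such as DataFileReader should pick the matching decompressor without extra configuration. The resulting file can then be uploaded to S3 like any other file; the upload itself is outside this sketch.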
