As mentioned in http://stackoverflow.com/a/15821136, Hadoop's snappy codec simply doesn't work with externally generated files.

Can files generated by DataFileWriter<http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29> serve as input files for a MapReduce job, especially EMR jobs?
________________________________
From: Bertrand Dechoux <[email protected]>
Sent: Sunday, October 13, 2013 6:36 PM
To: [email protected]
Subject: Re: Generating snappy compressed avro files as hadoop map reduce input 
files

I am not sure I understand the relation between your problem and the way temporary data is stored after the map phase.

However, I guess you are looking for DataFileWriter and its setCodec method.
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
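
For example, something along these lines should produce a snappy-compressed Avro container (a minimal, untested sketch; the one-field "Event" schema and the "events.avro" file name are placeholders, and snappy-java needs to be on the classpath):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class SnappyAvroWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder schema; substitute your own record schema.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
        writer.setCodec(CodecFactory.snappyCodec());  // must be set before create()
        writer.create(schema, new File("events.avro"));

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 1L);
        writer.append(record);
        writer.close();
    }
}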

Regards

Bertrand

PS: A snappy-compressed Avro file is not a regular file that has been compressed after the fact, but a container file whose data blocks are individually compressed. The principle is similar to SequenceFile's. Maybe that's what you mean by a different snappy codec?
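
For instance, reading such a file back shows the codec name recorded in the container header, while the data blocks are decompressed transparently. A rough sketch (untested; "events.avro" is the hypothetical file from the writer sketch above):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class InspectAvroCodec {
    public static void main(String[] args) throws Exception {
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                new File("events.avro"), new GenericDatumReader<GenericRecord>());
        // The codec is stored once in the file header under "avro.codec";
        // here it should print "snappy".
        System.out.println("codec: " + reader.getMetaString("avro.codec"));
        for (GenericRecord record : reader) {
            System.out.println(record);  // blocks are decompressed on the fly
        }
        reader.close();
    }
}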

On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[email protected]> wrote:
Hi,

I am writing an application that produces Avro record files, to be stored on AWS S3 as possible input to EMR.
I would like to compress them with the snappy codec before storing them on S3.
It is my understanding that Hadoop currently uses a different snappy codec, mostly as an intermediate map output format.
My question is: how can I generate snappy-compressed Avro files within my application logic (not MR)?




