If you're using Hadoop, why not use AvroSequenceFileOutputFormat? It works fine with Snappy (block-level compression may be best, depending on your data).
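Roughly, the job setup would look like the sketch below (this assumes the new mapreduce API and AvroKey output; the class name and output path are just placeholders, and the native Snappy libraries need to be available on the cluster):

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroSequenceFileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SnappyAvroSeqFileJob {
    public static Job configure(Configuration conf, Schema schema, Path out) throws Exception {
        Job job = Job.getInstance(conf, "snappy-avro-seqfile");

        // Write AvroKey records into a SequenceFile container.
        job.setOutputFormatClass(AvroSequenceFileOutputFormat.class);
        AvroJob.setOutputKeySchema(job, schema);
        job.setOutputValueClass(NullWritable.class);

        // Snappy, with block-level compression of the SequenceFile.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);

        FileOutputFormat.setOutputPath(job, out);
        return job;
    }
}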
On Oct 13, 2013, at 10:58 AM, David Ginzburg <[email protected]> wrote:

> As mentioned in http://stackoverflow.com/a/15821136 Hadoop's snappy codec
> just doesn't work with externally generated files.
>
> Can files generated by DataFileWriter serve as input files for a map reduce
> job, especially EMR jobs?
>
> From: Bertrand Dechoux <[email protected]>
> Sent: Sunday, October 13, 2013 6:36 PM
> To: [email protected]
> Subject: Re: Generating snappy compressed avro files as hadoop map reduce
> input files
>
> I am not sure I understand the relation between your problem and the way the
> temporary data are stored after the map phase.
>
> However, I guess you are looking for a DataFileWriter and its setCodec
> function.
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
>
> Regards
>
> Bertrand
>
> PS: A snappy-compressed avro file is not a standard file which has been
> compressed afterwards, but a specific file containing compressed blocks.
> This principle is similar to the SequenceFile's. Maybe that's what you mean
> by a different snappy codec?
>
> On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[email protected]> wrote:
> Hi,
>
> I am writing an application that produces avro record files, to be stored on
> AWS S3 as possible input to EMR.
> I would like to compress them with the snappy codec before storing them on S3.
> It is my understanding that hadoop currently uses a different snappy codec,
> mostly used as an intermediate map output format.
> My question is: how can I generate, within my application logic (not MR),
> snappy-compressed avro files?
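For the non-MR case quoted above, the DataFileWriter.setCodec route Bertrand points to is what I'd use. A rough sketch with generic records (the class name is just for illustration, and snappy-java still has to be on your application's classpath):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Writes a snappy-compressed Avro data file from plain application code (no MR).
public class SnappyAvroFileWriter {
    public static void write(Schema schema, Iterable<GenericRecord> records, File out) throws Exception {
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.setCodec(CodecFactory.snappyCodec()); // must be set before create()
        writer.create(schema, out);
        for (GenericRecord r : records) {
            writer.append(r);
        }
        writer.close();
    }
}

Files written this way are standard Avro data files with snappy-compressed blocks, so they can be read back by Avro's MapReduce input formats (and by EMR) without any extra codec configuration.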