If you're using Hadoop, why not use AvroSequenceFileOutputFormat? It works 
fine with snappy (block-level compression may be best, depending on your data).
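
For example, roughly (an untested sketch; the key/value schemas and output 
path are placeholders, and it assumes avro-mapred plus Hadoop's SnappyCodec 
are on the classpath):

    import org.apache.avro.Schema;
    import org.apache.avro.mapreduce.AvroJob;
    import org.apache.avro.mapreduce.AvroSequenceFileOutputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class SnappySeqFileDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "avro-seqfile-snappy");
            job.setJarByClass(SnappySeqFileDriver.class);
            job.setOutputFormatClass(AvroSequenceFileOutputFormat.class);
            // Block-level snappy: compresses batches of records together,
            // which usually compresses better than per-record compression.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.BLOCK);
            // Schemas for the AvroKey/AvroValue wrappers (placeholders).
            AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
            AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.LONG));
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            // Mapper/reducer wiring omitted; this only shows the compression setup.
        }
    }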

On Oct 13, 2013, at 10:58 AM, David Ginzburg <[email protected]> wrote:

> As mentioned in http://stackoverflow.com/a/15821136, Hadoop's snappy codec 
> just doesn't work with externally generated files.
> 
> Can files generated by DataFileWriter serve as input files for a MapReduce 
> job, especially EMR jobs?
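> 
> Roughly, this is the input-side wiring I have in mind (a sketch assuming 
> avro-mapred's AvroKeyInputFormat, which reads the container files that 
> DataFileWriter produces; the reader schema and S3 path are placeholders):
> 
>     import org.apache.avro.Schema;
>     import org.apache.avro.mapreduce.AvroJob;
>     import org.apache.avro.mapreduce.AvroKeyInputFormat;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> 
>     public class AvroInputSketch {
>         public static void main(String[] args) throws Exception {
>             Job job = Job.getInstance();
>             // The codec (snappy included) is read from each container
>             // file's header, so no input-side compression config is needed.
>             job.setInputFormatClass(AvroKeyInputFormat.class);
>             AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING)); // placeholder
>             FileInputFormat.addInputPath(job, new Path("s3://my-bucket/avro/")); // placeholder
>         }
>     }
> 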
> From: Bertrand Dechoux <[email protected]>
> Sent: Sunday, October 13, 2013 6:36 PM
> To: [email protected]
> Subject: Re: Generating snappy compressed avro files as hadoop map reduce 
> input files
>  
> I am not sure I understand the relation between your problem and the way the 
> temporary data is stored after the map phase.
> 
> However, I guess you are looking for a DataFileWriter and its setCodec 
> function.
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
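> 
> Something along these lines (an untested sketch; the inline schema and 
> output file name are placeholders, and snappy needs the snappy-java jar 
> on the classpath):
> 
>     import java.io.File;
>     import org.apache.avro.Schema;
>     import org.apache.avro.file.CodecFactory;
>     import org.apache.avro.file.DataFileWriter;
>     import org.apache.avro.generic.GenericData;
>     import org.apache.avro.generic.GenericDatumWriter;
>     import org.apache.avro.generic.GenericRecord;
> 
>     public class SnappyAvroWriter {
>         public static void main(String[] args) throws Exception {
>             Schema schema = new Schema.Parser().parse(
>                 "{\"type\":\"record\",\"name\":\"Example\",\"fields\":"
>                 + "[{\"name\":\"name\",\"type\":\"string\"}]}");
>             DataFileWriter<GenericRecord> writer =
>                 new DataFileWriter<GenericRecord>(
>                     new GenericDatumWriter<GenericRecord>(schema));
>             // setCodec must be called before create().
>             writer.setCodec(CodecFactory.snappyCodec());
>             writer.create(schema, new File("records.avro"));
>             GenericRecord record = new GenericData.Record(schema);
>             record.put("name", "hello");
>             writer.append(record);
>             writer.close();
>         }
>     }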
> 
> Regards
> 
> Bertrand
> 
> PS: A snappy-compressed Avro file is not an ordinary file that has been 
> compressed as a whole afterwards, but a container file whose individual 
> blocks are compressed. The principle is similar to the SequenceFile's. 
> Maybe that's what you mean by a different snappy codec?
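> 
> (If I remember correctly, you can check what codec a container file uses 
> with avro-tools:
> 
>     java -jar avro-tools.jar getmeta records.avro
> 
> which should print avro.codec snappy for a snappy-compressed file.)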
> 
> On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[email protected]> 
> wrote:
> Hi,
> 
> I am writing an application that produces Avro record files, to be stored 
> on AWS S3 as possible input to EMR.
> I would like to compress them with the snappy codec before storing them on S3.
> It is my understanding that Hadoop currently uses a different snappy codec, 
> mostly for compressing intermediate map output.
> My question is: how can I generate snappy-compressed Avro files within my 
> application logic (not MR)?
