I haven't actually tried writing one outside of M/R, but look at AvroSequenceFile (it is what AvroSequenceFileOutputFormat uses underneath), and obviously make sure the native snappy libraries are on your box.
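Untested sketch of what I have in mind (placeholder schema and output path; you need avro-mapred on the classpath, and I'm going from the AvroSequenceFile javadoc for the Options builder, so double-check the method names):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.hadoop.io.AvroSequenceFile;
import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.SnappyCodec;

public class WriteAvroSequenceFile {
  public static void main(String[] args) throws IOException {
    // Placeholder schema - substitute your real record schema.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\","
        + "\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

    Configuration conf = new Configuration();
    Path out = new Path("records.seq"); // placeholder output path

    // Block-compressed SequenceFile using hadoop's SnappyCodec;
    // this is the part that needs the native snappy libraries.
    SequenceFile.Writer writer = AvroSequenceFile.createWriter(
        new AvroSequenceFile.Writer.Options()
            .withFileSystem(FileSystem.get(conf))
            .withConfiguration(conf)
            .withOutputPath(out)
            .withKeyClass(NullWritable.class) // null keys, Avro values
            .withValueSchema(schema)
            .withCompressionType(SequenceFile.CompressionType.BLOCK)
            .withCompressionCodec(new SnappyCodec()));
    try {
      GenericRecord record = new GenericData.Record(schema);
      record.put("f", "hello");
      writer.append(NullWritable.get(), new AvroValue<GenericRecord>(record));
    } finally {
      writer.close();
    }
  }
}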
Also, the javadoc is IMHO a bit ambiguous on AvroJob setup - you can absolutely use NullWritable (or any other type Hadoop knows how to serialize) as the key.

On Oct 13, 2013, at 2:23 PM, David Ginzburg <[email protected]> wrote:

> Thanks,
> I am not generating the avro files with hadoop MR, but with a different process.
> I plan to just store the files on S3 for potential archive processing with EMR.
> Can I use AvroSequenceFile from a non-M/R process to generate the sequence files, with my avro records as the values and null keys?
>
> From: graham sanderson <[email protected]>
> Sent: Sunday, October 13, 2013 9:16 PM
> To: [email protected]
> Subject: Re: Generating snappy compressed avro files as hadoop map reduce input files
>
> If you're using hadoop, why not use AvroSequenceFileOutputFormat - this works fine with snappy (block-level compression may be best, depending on your data).
>
> On Oct 13, 2013, at 10:58 AM, David Ginzburg <[email protected]> wrote:
>
>> As mentioned in http://stackoverflow.com/a/15821136, Hadoop's snappy codec just doesn't work with externally generated files.
>>
>> Can files generated by DataFileWriter serve as input files for a map reduce job, especially EMR jobs?
>>
>> From: Bertrand Dechoux <[email protected]>
>> Sent: Sunday, October 13, 2013 6:36 PM
>> To: [email protected]
>> Subject: Re: Generating snappy compressed avro files as hadoop map reduce input files
>>
>> I am not sure I understand the relation between your problem and the way the temporary data are stored after the map phase.
>>
>> However, I guess you are looking for DataFileWriter and its setCodec method:
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29
>>
>> Regards
>>
>> Bertrand
>>
>> PS: A snappy-compressed avro file is not a standard file which has been compressed afterwards, but a specific container file holding compressed blocks. The principle is similar to the SequenceFile's. Maybe that's what you mean by a different snappy codec?
>>
>> On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <[email protected]> wrote:
>> Hi,
>>
>> I am writing an application that produces avro record files, to be stored on AWS S3 as possible input to EMR.
>> I would like to compress them with the snappy codec before storing them on S3.
>> It is my understanding that hadoop currently uses a different snappy codec, mostly as an intermediate map output format.
>> My question is: how can I generate snappy-compressed avro files from within my application logic (not MR)?
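P.S. For completeness, Bertrand's DataFileWriter/setCodec suggestion quoted above comes down to something like the following (again an untested sketch with a placeholder schema and path). As far as I know, avro's snappy codec is backed by the pure-Java snappy-java library, so this path doesn't depend on the hadoop native libraries at all:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteSnappyAvroFile {
  public static void main(String[] args) throws IOException {
    // Placeholder schema - substitute your real record schema.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Rec\","
        + "\"fields\":[{\"name\":\"f\",\"type\":\"string\"}]}");

    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(
            new GenericDatumWriter<GenericRecord>(schema));
    writer.setCodec(CodecFactory.snappyCodec()); // must precede create()
    writer.create(schema, new File("records.avro")); // placeholder path
    try {
      GenericRecord record = new GenericData.Record(schema);
      record.put("f", "hello");
      writer.append(record);
    } finally {
      writer.close();
    }
  }
}

Since the compression happens per-block inside the avro container (as Bertrand's PS explains), files written this way should be usable as splittable M/R or EMR input via the avro input formats.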
