Hi Vinod,

In Avro, compression is provided only at the file container level (i.e. block compression).
For compressing a simple byte array, you can rely on Hadoop's compression classes, such as GzipCodec [1], to compress the byte stream directly (wrapping it in a compressed output stream [2] obtained via the codec's helper method [3]). Something like this, for example (I've not tested it out):

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
OutputStream compressedOutputStream = codec.createOutputStream(outputStream);
[… Encode over compressedOutputStream, etc. …]

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html
[2] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressorStream.html
[3] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/GzipCodec.html#createOutputStream(java.io.OutputStream)

On Tue, Apr 9, 2013 at 11:17 AM, Vinod Jammula
<vinod.kumar.jamm...@ericsson.com> wrote:
> Hi,
>
> I have a csv string which I want to serialize, compress and write to a
> database.
>
> I have the following code to serialize the string:
>
> ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
> Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
> GenericDatumWriter w = new GenericDatumWriter(schema);
> w.write(record, e);
> byte[] avroBytes = outputStream.toByteArray();
>
> And the following code to de-serialize and process the record:
>
> DatumReader<GenericRecord> reader = new
> GenericDatumReader<GenericRecord>(schema);
> Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
> GenericRecord record = reader.read(decoder, null);
>
> I see that compression is available with DataFileWriter and
> DataFileReader, but how do I enable compression for an Avro-serialized
> buffer?
>
> Thanks and Regards,
> Vinod

--
Harsh J
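For completeness, the same compress-then-decompress round trip can also be done with only the JDK's java.util.zip streams, with no Hadoop dependency at all. Below is a minimal, self-contained sketch; the class name GzipRoundTrip and the sample CSV payload are my own placeholders, and in your case you would write the Avro-encoded bytes (your avroBytes array) through the compressing stream instead of the raw sample:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Gzip-compress a byte array (e.g. the Avro-encoded bytes).
    // Closing the GZIPOutputStream finishes the gzip trailer.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(raw);
        }
        return bos.toByteArray();
    }

    // Decompress back to the original bytes, which you would then
    // hand to Avro's binaryDecoder for deserialization.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Avro-encoded bytes; the CSV content is illustrative.
        byte[] original = "a,b,c,1,2,3".getBytes("UTF-8");
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println(new String(unpacked, "UTF-8")); // prints a,b,c,1,2,3
    }
}
```

The design is the same as with the Hadoop codec: wrap the destination stream in a compressing stream, write the serialized bytes through it, and make sure the compressing stream is closed (or finished) before reading the buffer, otherwise the gzip trailer is missing and decompression will fail.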