Re: map/reduce of compressed Avro

Martin Kleppmann Tue, 23 Apr 2013 03:39:05 -0700

To my knowledge, LZO is not a supported codec for Avro data files. It's
possible that you have an LZO-compressed Hadoop sequence file containing
Avro records, but that would be a format you defined yourself, and not the
same as an Avro data file.

Avro data files are designed to be splittable regardless of the codec they
use, so you can have multiple mappers that each consume a portion of the
input file. The format achieves that by breaking the data into blocks, and
compressing each block separately; hence it can be split at block
boundaries.
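
As a rough illustration of why per-block compression makes a file
splittable, here is a toy sketch in Python. It is not the real Avro
container layout or the Avro API: the 4-byte length prefix, the
newline-joined payload, zlib as the codec, and the function names
write_blocks and read_split are all assumptions made for the example.
The only detail borrowed from Avro is a 16-byte sync marker written
after every block.

```python
import os
import zlib

SYNC_SIZE = 16  # Avro's sync marker is 16 bytes; the rest of this layout is made up


def write_blocks(records, sync, records_per_block=2):
    """Pack records into independently compressed blocks, each followed by the sync marker."""
    out = bytearray()
    for i in range(0, len(records), records_per_block):
        payload = b"\n".join(records[i:i + records_per_block])
        compressed = zlib.compress(payload)  # each block is compressed on its own
        out += len(compressed).to_bytes(4, "big") + compressed + sync
    return bytes(out)


def read_split(data, sync, start, end):
    """Return the records of every block whose first byte lies in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # Scan forward for the first sync marker that ends at or after `start`;
        # the next block begins right after that marker.
        idx = data.find(sync, max(0, start - SYNC_SIZE))
        if idx < 0:
            return []
        pos = idx + SYNC_SIZE

    records = []
    while pos < end and pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        compressed = data[pos + 4:pos + 4 + size]
        # A block decompresses without any data from earlier blocks.
        records.extend(zlib.decompress(compressed).split(b"\n"))
        pos += 4 + size + SYNC_SIZE
    return records


if __name__ == "__main__":
    sync = os.urandom(SYNC_SIZE)
    records = [b"rec%d" % i for i in range(10)]
    data = write_blocks(records, sync)
    mid = len(data) // 2  # arbitrary byte offset, not aligned to a block
    print(read_split(data, sync, 0, mid))
    print(read_split(data, sync, mid, len(data)))
```

Two readers given adjacent byte ranges each pick up whole blocks, and
every block is read exactly once, even though the split offset falls
in the middle of a compressed block. This works with any codec because
no block depends on the compressed bytes of another.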

Best,
Martin

On 22 April 2013 23:47, nir_zamir <[email protected]> wrote:

> Thanks Martin.
>
> What will happen if I try to use an indexed LZO-compressed avro file? Will
> it work and utilize the index to allow multiple mappers?
>
> I think that for Snappy for example, the file is splittable and can use
> multiple mappers, but I haven't tested it yet - would be glad if anyone has
> any experience with that.
>
> Thanks!
> Nir.
>
>
>
> --
> View this message in context:
> http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
> Sent from the Avro - Users mailing list archive at Nabble.com.
>
