Out of curiosity, are there any other file formats that provide splittable gzip compression like Avro object container files? I can only think of SequenceFiles.
On 4/29/13 3:47 PM, "Scott Carey" <[email protected]> wrote:

>Martin said it already, but I will emphasize:
>
>Avro data files are splittable and can support multiple mappers no matter
>what codec is used for compression. This is because avro files are block
>based, and only use the compression within the block. I recommend
>starting with gzip compression, and moving to snappy only if deflate
>compression level '1' is not fast enough.
>
>For more information on avro data files, see:
>http://avro.apache.org/docs/current/spec.html#Object+Container+Files
>
>On 4/22/13 11:47 PM, "nir_zamir" <[email protected]> wrote:
>
>>Thanks Martin.
>>
>>What will happen if I try to use an indexed LZO-compressed avro file? Will
>>it work and utilize the index to allow multiple mappers?
>>
>>I think that for Snappy for example, the file is splittable and can use
>>multiple mappers, but I haven't tested it yet - would be glad if anyone has
>>any experience with that.
>>
>>Thanks!
>>Nir.
>>
>>--
>>View this message in context:
>>http://apache-avro.679487.n3.nabble.com/map-reduce-of-compressed-Avro-tp4026947p4027009.html
>>Sent from the Avro - Users mailing list archive at Nabble.com.
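To make Scott's point concrete, here is a toy Python sketch of why compressing per block keeps a file splittable. It is not the real Avro container format or API: the names `write_container` and `read_split` are hypothetical, the header omits Avro's metadata map, block counts and sizes are fixed 4-byte big-endian integers instead of Avro's zig-zag varint longs, and records are newline-joined strings instead of Avro-encoded data. It does borrow Avro's core ideas: a random 16-byte sync marker stored in the header and written after every block, and each block compressed independently with raw deflate (level 1 here). The ownership rule (a split reads each block whose preceding sync marker starts inside its byte range) is this sketch's own choice.

```python
import os
import zlib

MAGIC = b"Obj\x01"  # Avro's container-file magic bytes
SYNC_SIZE = 16      # Avro sync markers are 16 bytes


def deflate_raw(data: bytes, level: int) -> bytes:
    """Raw RFC 1951 deflate (no zlib header), as Avro's 'deflate' codec specifies."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


def write_container(records, block_size=100, level=1):
    """Hypothetical toy layout: header, then [count, size, payload, sync] per block.
    Records must not contain newlines (a simplification of this sketch)."""
    sync = os.urandom(SYNC_SIZE)   # random per-file marker
    out = bytearray(MAGIC + sync)  # header ends with the sync marker
    for i in range(0, len(records), block_size):
        chunk = records[i:i + block_size]
        # Each block is compressed on its own, so it can be decoded in isolation.
        payload = deflate_raw(b"\n".join(r.encode() for r in chunk), level)
        out += len(chunk).to_bytes(4, "big") + len(payload).to_bytes(4, "big")
        out += payload + sync
    return bytes(out)


def read_split(data: bytes, start: int, end: int):
    """Return records of every block whose preceding sync marker begins in [start, end)."""
    sync = data[len(MAGIC):len(MAGIC) + SYNC_SIZE]  # every reader first reads the header
    records = []
    pos = data.find(sync, start)  # skip ahead to the first sync point at or after start
    while pos != -1 and pos < end:
        block = pos + SYNC_SIZE
        if block >= len(data):    # this is the marker after the final block
            break
        count = int.from_bytes(data[block:block + 4], "big")
        size = int.from_bytes(data[block + 4:block + 8], "big")
        lines = zlib.decompress(data[block + 8:block + 8 + size], -15).split(b"\n")
        if len(lines) != count:
            raise ValueError("corrupt block")
        records.extend(line.decode() for line in lines)
        pos = block + 8 + size    # the next sync marker starts right after the payload
    return records


records = [f"record-{i}" for i in range(1000)]
data = write_container(records, block_size=100)
# Cut the file into byte ranges that ignore block boundaries, as a split planner would.
step = len(data) // 4 + 1
splits = [(s, min(s + step, len(data))) for s in range(0, len(data), step)]
merged = [r for s, e in splits for r in read_split(data, s, e)]
assert merged == records
```

If the whole file were instead compressed as a single gzip or deflate stream, a reader could only decode it from the beginning, so a mapper handed the middle of the file would have nowhere to start. Compressing each block separately and marking block boundaries with the sync marker is what lets `read_split` begin mid-file, and it is also why the choice of codec does not affect splittability.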
