Hi Niels,
Thanks for the reply.
Changing the Avro files is not really an option for me, as it would take a lot of time (I have a lot of them).
The Avro files themselves are already compressed a bit.
Still, bzip2 gives 50% compression on one Avro file.

So what I want is to use bzip2-compressed files as input to my MR jobs.
Bzip2 is splittable.
It should be possible somehow, but I can't seem to find out how at the moment.

On 22.09.2014 17:13, Niels Basjes wrote:
Hi,

You can use GZip inside the Avro files and still have splittable Avro files. This has to do with the fact that there is a block structure inside the Avro file, and these blocks are gzipped individually.
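
To illustrate why a block structure keeps a compressed file splittable, here is a minimal sketch in plain Java. It is not the Avro implementation: the class name BlockCompressedDemo, the fixed SYNC marker, and the block layout (marker, two length fields, deflated payload) are assumptions made up for this example. A real Avro container file has its own header and uses a random 16-byte sync marker per file. The idea is the same, though: a reader that starts at an arbitrary byte offset scans forward to the next marker and decompresses whole blocks from there.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockCompressedDemo {
    // Hypothetical fixed marker for this sketch only.
    static final byte[] SYNC = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);

    // Compress one block on its own, independent of all other blocks.
    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed, int rawLength) {
        Inflater inf = new Inflater();
        inf.setInput(packed);
        byte[] raw = new byte[rawLength];
        try {
            int n = 0;
            while (n < rawLength && !inf.finished()) {
                n += inf.inflate(raw, n, rawLength - n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inf.end();
        }
        return raw;
    }

    // Layout per block: SYNC | raw length (int) | packed length (int) | deflated records.
    static byte[] write(List<String> records, int perBlock) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (int i = 0; i < records.size(); i += perBlock) {
            List<String> chunk = records.subList(i, Math.min(i + perBlock, records.size()));
            byte[] raw = String.join("\n", chunk).getBytes(StandardCharsets.UTF_8);
            byte[] packed = deflate(raw);
            file.write(SYNC, 0, SYNC.length);
            file.write(ByteBuffer.allocate(8).putInt(raw.length).putInt(packed.length).array(), 0, 8);
            file.write(packed, 0, packed.length);
        }
        return file.toByteArray();
    }

    // Index of the first sync marker at or after 'from', or -1 if there is none.
    static int findSync(byte[] file, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= file.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(file, i, i + SYNC.length), SYNC)) {
                return i;
            }
        }
        return -1;
    }

    // Read every block whose sync marker starts inside [start, end).
    static List<String> readSplit(byte[] file, int start, int end) {
        List<String> out = new ArrayList<>();
        int pos = findSync(file, start);
        while (pos >= 0 && pos < end) {
            ByteBuffer header = ByteBuffer.wrap(file, pos + SYNC.length, 8);
            int rawLen = header.getInt();
            int packedLen = header.getInt();
            int dataStart = pos + SYNC.length + 8;
            byte[] raw = inflate(Arrays.copyOfRange(file, dataStart, dataStart + packedLen), rawLen);
            out.addAll(Arrays.asList(new String(raw, StandardCharsets.UTF_8).split("\n")));
            pos = findSync(file, dataStart + packedLen);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        byte[] file = write(records, 100);
        int mid = file.length / 2; // arbitrary split point, usually inside a block
        List<String> joined = new ArrayList<>(readSplit(file, 0, mid));
        joined.addAll(readSplit(file, mid, file.length));
        System.out.println("all records read exactly once: " + joined.equals(records));
    }
}
```

The two calls to readSplit stand in for two mappers that each receive one byte range of the same file. Neither needs the other's data, because each block was compressed on its own.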

I suggest you simply try it.

Niels


On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov <[email protected]> wrote:

    Hi guys,
    I would like to compress the files on HDFS to save some storage.

    As far as I can see, bzip2 is the only format which is splittable
    (and slow).

    The actual files are Avro.

    So in my driver class I have:

    job.setInputFormatClass(AvroKeyInputFormat.class);

    I have a number of jobs running that process Avro files, so I
    would like to keep the code change to a minimum.

    Is it possible to compress these Avro files with bzip2 and keep
    the code of the MR jobs the same (or with little change)?
    If it is, please give me some hints, as so far I can't seem to
    find any good resources on the Internet.


    Georgi




--
Best regards / Met vriendelijke groeten,

Niels Basjes
