Hi Jiayu,
For a SequenceFile input, the CompressionCodec class is serialized in the
file header, so the SequenceFile reader knows which compression algorithm
to use.
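
For example, a minimal sketch in Java (the path below is a placeholder for
illustration; adjust it to your own file):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;

public class ShowSeqFileCodec {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/data/part-00000");  // placeholder path
    SequenceFile.Reader reader =
        new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
    try {
      // the codec class is read from the file header, not the file name
      CompressionCodec codec = reader.getCompressionCodec();
      System.out.println(codec == null
          ? "not compressed" : codec.getClass().getName());
    } finally {
      reader.close();
    }
  }
}
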
Thanks.

On Mon, Dec 16, 2013 at 8:28 AM, Jiayu Ji <jiayu...@gmail.com> wrote:

> Thanks Tao. I know I can tell it is an LZO file based on the magic number.
> What I am curious about is which class in Hadoop the MapReduce job uses to
> determine the file's compression algorithm. At the end of the day, I am
> trying to figure out whether all the inputs of a MapReduce job have to be
> compressed with the same algorithm.
>
>
> On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:
>
>> I suggest you download the LZO-compressed file, whether or not it has an
>> .lzo extension in its file name, open it as hex bytes with a tool like
>> UltraEdit, and have a look at its header contents.
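>>
>> If you prefer a programmatic check, here is a minimal Java sketch (the
>> 9-byte magic below is the standard lzop file header as I recall it;
>> please verify against your own files):
>>
>> import java.io.FileInputStream;
>> import java.util.Arrays;
>>
>> public class CheckLzoMagic {
>>   // lzop magic bytes: 0x89 'L' 'Z' 'O' 0x00 0x0D 0x0A 0x1A 0x0A
>>   private static final byte[] LZOP_MAGIC = {
>>       (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0D, 0x0A, 0x1A, 0x0A};
>>
>>   public static void main(String[] args) throws Exception {
>>     byte[] head = new byte[LZOP_MAGIC.length];
>>     try (FileInputStream in = new FileInputStream(args[0])) {
>>       int n = in.read(head);
>>       System.out.println(n == head.length && Arrays.equals(head, LZOP_MAGIC)
>>           ? "looks like an lzop/LZO file" : "no lzop magic found");
>>     }
>>   }
>> }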
>>
>>
>> 2013/12/14 Jiayu Ji <jiayu...@gmail.com>
>>
>>> Hi,
>>>
>>> I have a question about how a MapReduce job determines the compression
>>> codec of a file on HDFS. From what I read in the definitive guide (page
>>> 86), "the CompressionCodecFactory provides a way of mapping a filename
>>> extension to a CompressionCodec using its getCodec() method". I did a
>>> test with an LZO-compressed file without an .lzo extension. However, the
>>> MapReduce job was still able to get the right codec. Does anyone know
>>> why? Thanks in advance.
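>>>
>>> To make that concrete, here is a minimal sketch of how I understand
>>> getCodec() behaves (the file names are made up; whether .lzo resolves
>>> depends on having hadoop-lzo's LzopCodec registered in
>>> io.compression.codecs):
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.compress.CompressionCodec;
>>> import org.apache.hadoop.io.compress.CompressionCodecFactory;
>>>
>>> public class CodecByExtension {
>>>   public static void main(String[] args) {
>>>     Configuration conf = new Configuration();
>>>     CompressionCodecFactory factory = new CompressionCodecFactory(conf);
>>>     // getCodec() looks only at the file name extension
>>>     for (String name : new String[] {"a.gz", "b.bz2", "c.lzo", "d.txt"}) {
>>>       CompressionCodec codec = factory.getCodec(new Path(name));
>>>       System.out.println(name + " -> "
>>>           + (codec == null ? "no codec" : codec.getClass().getName()));
>>>     }
>>>   }
>>> }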
>>>
>>> Jiayu
>>>
>>
>>
>
>
> --
> Jiayu (James) Ji,
>
> Cell: (312)823-7393
>
>
