LZOP and pig.tmpfilecompression.codec=lzo

Alex Nastetsky Thu, 04 Dec 2014 05:20:52 -0800

Why does the "lzo" value for "pig.tmpfilecompression.codec" convert to
"com.hadoop.compression.lzo.LzoCodec" instead of
"com.hadoop.compression.lzo.LzopCodec"?

(See org.apache.pig.impl.util.Utils.java in the Pig source.)

My understanding is that LzopCodec has headers and blocks and is therefore
splittable, while LzoCodec is stream-based without headers/blocks and is
not. Wouldn't we want a splittable compression codec for temp data?
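
To make the question concrete, here is a hypothetical Java sketch. It is not Pig's code, and the class, map, and method names are invented for illustration. It contrasts the mapping described above, where "lzo" resolves to LzoCodec, with the alternative being asked about, where it resolves to LzopCodec. It uses Map.of, so it assumes Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical illustration only: NOT the logic in
 * org.apache.pig.impl.util.Utils. Names here are invented.
 */
public final class TmpFileCodecChoice {
    private TmpFileCodecChoice() {}

    // Behavior reported in the question: "lzo" -> stream-based LzoCodec.
    static final Map<String, String> REPORTED = Map.of(
            "lzo", "com.hadoop.compression.lzo.LzoCodec");

    // Alternative the question asks about: "lzo" -> block-based LzopCodec.
    static final Map<String, String> PROPOSED = Map.of(
            "lzo", "com.hadoop.compression.lzo.LzopCodec");

    /** Looks up a property value (case-insensitive, trimmed) in the given table. */
    static String resolve(Map<String, String> table, String value) {
        String codec = table.get(value.trim().toLowerCase(Locale.ROOT));
        if (codec == null) {
            throw new IllegalArgumentException("unsupported codec value: " + value);
        }
        return codec;
    }

    public static void main(String[] args) {
        System.out.println("reported: " + resolve(REPORTED, "lzo"));
        System.out.println("proposed: " + resolve(PROPOSED, "lzo"));
    }
}
```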

From the "Hadoop in Practice" book:

> What’s the difference between LZO and LZOP? Both LZO and LZOP codecs
> are supplied for use with Hadoop. LZO is a stream-based compression store
> that doesn’t have the notion of blocks or headers. LZOP has the notion of
> blocks (that are checksummed), and therefore is the codec you want to use,
> especially if you want your compressed output to be splittable. Confusingly,
> the Hadoop codecs by default treat files ending with the .lzo extension to
> be LZOP-encoded, and files ending with the .lzo_deflate extension to be
> LZO-encoded. Also, much of the documentation seems to use LZO and
> LZOP interchangeably.
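
The extension convention in that passage can be sketched as a small Java helper. This is a minimal illustration assuming only the convention quoted above (".lzo" means LZOP, ".lzo_deflate" means raw LZO); the class and method names are hypothetical. In Hadoop itself, extension-based lookup is done by CompressionCodecFactory using each registered codec's default extension.

```java
import java.util.Optional;

/**
 * Hypothetical helper illustrating the quoted convention:
 * ".lzo" -> LZOP-encoded (LzopCodec), ".lzo_deflate" -> raw LZO (LzoCodec).
 */
public final class LzoExtensions {
    private LzoExtensions() {}

    /** Returns the codec class name implied by the file name's extension, if any. */
    public static Optional<String> codecFor(String fileName) {
        if (fileName.endsWith(".lzo_deflate")) {
            // Stream-based LZO, no headers or blocks.
            return Optional.of("com.hadoop.compression.lzo.LzoCodec");
        }
        if (fileName.endsWith(".lzo")) {
            // Block-based LZOP, the format the passage recommends for splittable output.
            return Optional.of("com.hadoop.compression.lzo.LzopCodec");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"part-00000.lzo", "part-00000.lzo_deflate", "part-00000.gz"}) {
            System.out.println(name + " -> " + codecFor(name).orElse("(not an LZO extension)"));
        }
    }
}
```

Running main prints:

```
part-00000.lzo -> com.hadoop.compression.lzo.LzopCodec
part-00000.lzo_deflate -> com.hadoop.compression.lzo.LzoCodec
part-00000.gz -> (not an LZO extension)
```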
