Why does the "lzo" value for "pig.tmpfilecompression.codec" convert to "com.hadoop.compression.lzo.LzoCodec" instead of "com.hadoop.compression.lzo.LzopCodec"?
(See org.apache.pig.impl.util.Utils.java in the Pig source.) My understanding is that LzopCodec has headers and blocks and is therefore splittable, while LzoCodec is stream-based, without headers or blocks, and is not. Wouldn't we want a splittable compression codec for temp data?

From the book "Hadoop in Practice":

> What’s the difference between LZO and LZOP? Both LZO and LZOP codecs are supplied for use with Hadoop. LZO is a stream-based compression store that doesn’t have the notion of blocks or headers. LZOP has the notion of blocks (that are checksummed), and therefore is the codec you want to use, especially if you want your compressed output to be splittable.
>
> Confusingly, the Hadoop codecs by default treat files ending with the .lzo extension to be LZOP-encoded, and files ending with the .lzo_deflate extension to be LZO-encoded. Also, much of the documentation seems to use LZO and LZOP interchangeably.