Re: Compressed data storage in HDFS - Error

Bejoy Ks Wed, 06 Jun 2012 02:46:32 -0700

Hi Sreenath

The LZO error occurs because the LZO libraries are not in the 
Hadoop_Home/lib/native folder. You need to build and package LZO for the OS you are 
using.


With compression, as you mentioned, there is an overhead in decompressing while 
processing the records. HDFS is used to store large amounts of data, so 
compression saves a lot of storage space (consider replication as well). However, 
it is not final output compression that speeds up map reduce jobs; it is the 
intermediate compression that has this advantage. Intermediate compression 
means compression of map output. In a map reduce job a lot of copying and 
shuffling happens between the map and reduce phases; when this intermediate 
data is compressed, this operation is faster because it consumes much less I/O. 


The following properties enable intermediate compression:
mapred.compress.map.output=true
mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec
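As a rough sketch, these could be set per session from the Hive CLI, the same way the output compression properties are set in the original question. The codec is written with the fully qualified class name used in the question, and it is assumed the LZO libraries are already installed on every node. hive.exec.compress.intermediate is not mentioned above; it is a separate Hive setting for data passed between the map-reduce jobs of a single query.

```sql
-- Compress map output (the data copied and shuffled to the reducers).
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Assumption: also compress data passed between the jobs of a multi-job query.
SET hive.exec.compress.intermediate=true;

-- Tables from the original question; the final output stays uncompressed
-- unless hive.exec.compress.output is also set.
INSERT OVERWRITE TABLE rc_lzo SELECT * FROM sample;
```

To make them the default for all jobs they can go into the site configuration instead of being set in each session.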


Regards
Bejoy KS



________________________________
 From: Siddharth Tiwari <siddharth.tiw...@live.com>
To: "user@hive.apache.org " <user@hive.apache.org> 
Sent: Wednesday, June 6, 2012 2:58 PM
Subject: RE: Compressed data storage in HDFS - Error
 

There is something you gain and something you lose.
Compression reduces I/O at the cost of increased CPU work. You will also see 
different results for different tasks, i.e. HDFS read, HDFS write, shuffle 
and sort. So whether to use compression depends on your usage.
Sent from my N8




-----Original Message----- 
From: Sreenath Menon 
Sent: 6/6/2012 8:50:23 AM 
To: user@hive.apache.org 
Subject: Compressed data storage in HDFS - Error 
I would like to compress my data in HDFS using some Hive commands.
Steps followed (data already residing in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Error:
Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO as well as other compression methods?

2) I heard somewhere that using compressed data will produce better results than 
uncompressed data in some cases. How can this be, as there is always 
compression and decompression time associated with compression methods? Is there 
any truth in this, and if so, how? I can understand how there are better results 
when using compression between mappers and reducers and in between map-reduce 
jobs.

Thanks and Regards
Sreenath Mullassery
