Hi all, how do I compress a folder in Hadoop?
I want to compress a folder that holds old, infrequently used data. How can I do that? Searching the web gave me some ideas for compressing the files, but can someone please help me understand why the output files are not in .lzo or .gz format?

I am test-running the commands below for two types of compression, LZO and gzip. When I check the output, the files are the same size in both cases, and when I cat them I can see the data in plain text. The MR jobs were successful and created these files. How do I check whether the compression was successful?

# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfslzo

# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfsgzip

The output part files are listed below.

15/08/05 18:36:07 INFO streaming.StreamJob: Output directory: /tmp/hdfs/hdfsgzip

# hadoop fs -ls /tmp/hdfs/hdfsgzip
Found 5 items
-rw-r--r-- 3 hdfs supergroup 0 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/_SUCCESS
-rw-r--r-- 3 hdfs supergroup 6061954911 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00000
-rw-r--r-- 3 hdfs supergroup 6062727606 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00001
-rw-r--r-- 3 hdfs supergroup 6064932250 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00002
-rw-r--r-- 3 hdfs supergroup 6062737354 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00003

# hadoop fs -ls /tmp/hdfs/hdfslzo
Found 5 items
-rw-r--r-- 3 hdfs supergroup 0 2015-08-05 18:28 /tmp/hdfs/hdfslzo/_SUCCESS
-rw-r--r-- 3 hdfs supergroup 6061954911 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00000
-rw-r--r-- 3 hdfs supergroup 6062727606 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00001
-rw-r--r-- 3 hdfs supergroup 6064932250 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00002
-rw-r--r-- 3 hdfs supergroup 6062737354 2015-08-05 18:28 /tmp/hdfs/hdfslzo/part-00003

It would be a great help if you could point me to any links regarding compression.

Thanks,
Jay
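
On the question of how to check whether a part file is really compressed: one possible approach, sketched here rather than taken from the Hadoop tooling, is to look at the first few bytes of the file. A gzip stream starts with the bytes 1f 8b, and an lzop file starts with the nine-byte signature 89 4c 5a 4f 00 0d 0a 1a 0a; plain CSV text starts with neither. The function name detect_codec and the script name below are hypothetical, chosen only for illustration.

```python
import sys

# Well-known file signatures ("magic bytes") of the two container formats.
GZIP_MAGIC = b"\x1f\x8b"
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def detect_codec(head: bytes) -> str:
    """Guess the compression format from the leading bytes of a file.

    `head` should hold at least the first 9 bytes of the file.
    This is a hypothetical helper, not part of Hadoop.
    """
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LZOP_MAGIC):
        return "lzop"
    return "unknown or uncompressed"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read only the first 16 bytes of a local file given on the command line.
    with open(sys.argv[1], "rb") as f:
        print(detect_codec(f.read(16)))
```

Assuming the script is saved as detect_codec.py, the head of a part file can be copied out of HDFS and checked like this:

# hadoop fs -cat /tmp/hdfs/hdfsgzip/part-00000 | head -c 16 > /tmp/part-head.bin
# python detect_codec.py /tmp/part-head.bin

If this reports "unknown or uncompressed", the result would be consistent with what the listing already suggests (identical sizes for both jobs and readable output from cat), namely that the job wrote its output uncompressed.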
