hadoopConf.set("mapreduce.job.inputformat.class",
"com.wizecommerce.utils.mapred.TextInputFormat");
hadoopConf.set("mapreduce.job.outputformat.class",
"com.wizecommerce.utils.mapred.TextOutputFormat");
No special settings are required for reading Gzip input beyond the two above.
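As a rough illustration of why nothing more is needed on the read side: Hadoop's stock TextInputFormat picks the codec from the file extension and hands the mapper already-decompressed lines. I am assuming the custom com.wizecommerce input format behaves the same way. The plain-Java sketch below mimics that step with java.util.zip only; GzipLineSketch and its methods are hypothetical names for this example, not Hadoop APIs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Hadoop or of the classes named above.
class GzipLineSketch {

    // Compress text into an in-memory gzip image (stands in for a .gz file on HDFS).
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decompress from the start of the stream and return one entry per line,
    // which is roughly what a line-oriented record reader gives the mapper.
    static List<String> readLines(byte[] gz) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] gz = gzip("first record\nsecond record\n");
        for (String line : readLines(gz)) {
            System.out.println(line);
        }
    }
}
```

The mapper code itself does not change; it sees ordinary text lines either way.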
If you want to write Gzip output:
hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");
hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
"org.apache.hadoop.io.compress.GzipCodec");
Make sure the Gzip codec is defined in core-site.xml:
<!-- core-site.xml -->
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
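I have a question:
Why are you using Gzip files as input to the map? They are not splittable, so
each .gz file is processed by a single mapper. That only makes sense if you have
to read multiple lines (such as the lines between a BEGIN and END block in a log
file) and send them to the mapper as one record.
Also, if the input is non-splittable anyway, the Snappy codec is usually a
better choice than Gzip, since it decompresses faster.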
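To make the "not splittable" point concrete, here is a minimal sketch in plain Java (java.util.zip only, no Hadoop). It tries to start decompressing a gzip stream from the middle, which is what a second mapper would have to do if the file were split. GzipSplitSketch, sampleGzip and canDecodeFrom are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class for illustration; not part of Hadoop.
class GzipSplitSketch {

    // Build an in-memory gzip image of `lineCount` short log lines.
    static byte[] sampleGzip(int lineCount) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            for (int i = 0; i < lineCount; i++) {
                out.write(("log line " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // True if a gzip reader started at `offset` can decode through to the end.
    static boolean canDecodeFrom(byte[] gz, int offset) {
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Discard the data; only success or failure matters here.
            }
            return true;
        } catch (IOException e) {
            // No gzip header at this offset, or the deflate data is undecodable.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gz = sampleGzip(10_000);
        System.out.println("from start:  " + canDecodeFrom(gz, 0));
        System.out.println("from middle: " + canDecodeFrom(gz, gz.length / 2));
    }
}
```

Reading from offset 0 should succeed, while reading from the midpoint should fail, because the gzip header and the compressor state exist only at the start of the stream. That is why the whole file has to go to one mapper.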
Good Luck
sanjay
From: samir das mohapatra <[email protected]>
Reply-To: "[email protected]"
Date: Tuesday, June 11, 2013 9:07 PM
To: "[email protected]", "[email protected]", "[email protected]"
Subject: Now give .gz file as input to the MAP
Hi All,
Has anyone worked on how to pass a .gz file as input to a
MapReduce job?
Regards,
samir.