Nothing special is required to process .gz files using MR. However, as Sanjay mentioned, verify that the codecs are configured in core-site.xml. Another thing to note is that these files are not splittable.
You might want to use bz2; these are splittable.

Thanks,
Rahul

On Wed, Jun 12, 2013 at 10:14 AM, Sanjay Subramanian <[email protected]> wrote:

> hadoopConf.set("mapreduce.job.inputformat.class",
>     "com.wizecommerce.utils.mapred.TextInputFormat");
> hadoopConf.set("mapreduce.job.outputformat.class",
>     "com.wizecommerce.utils.mapred.TextOutputFormat");
>
> No special settings are required for reading Gzip except those above.
>
> If you want to output Gzip:
>
> hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");
> hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
>     "org.apache.hadoop.io.compress.GzipCodec");
>
> Make sure the Gzip codec is defined in core-site.xml:
>
> <!-- core-site.xml -->
> <property>
>   <name>io.compression.codecs</name>
>   <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
> </property>
>
> I have a question:
>
> Why are you using GZIP as input to Map? These files are not splittable.
> Unless you have to read multiple lines (like the lines between a BEGIN and
> END block in a log file) and send them as one record to the mapper.
>
> Also, among non-splittable codecs, Snappy is better.
>
> Good Luck
>
> sanjay
>
> From: samir das mohapatra <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, June 11, 2013 9:07 PM
> To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> Subject: Now give .gz file as input to the MAP
>
> Hi All,
> Has anyone worked on how to pass a .gz file as the input file for a
> mapreduce job?
>
> Regards,
> samir.
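
For anyone wondering why .gz input is not splittable, here is a minimal sketch that uses only java.util.zip and no Hadoop classes. The names GzipSplitDemo, gzip and canDecompressFrom are made up for illustration and are not part of any library. The idea is that a gzip stream has its header at the start, so a reader that begins at an arbitrary byte offset (as a mapper handed a later split would have to) cannot start decompressing there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop or the JDK.
final class GzipSplitDemo {

    // Compress a byte array into a single gzip stream held in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Try to decompress starting at the given byte offset of the gzip data.
    // Returns true only if the whole remainder decodes without an error.
    static boolean canDecompressFrom(byte[] gz, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // Drain the stream; we only care whether decoding succeeds.
            }
            return true;
        } catch (IOException e) {
            // Covers ZipException ("Not in GZIP format") and truncated input.
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        byte[] gz = gzip(sb.toString().getBytes(StandardCharsets.UTF_8));

        System.out.println("from start:  " + canDecompressFrom(gz, 0));
        System.out.println("from middle: " + canDecompressFrom(gz, gz.length / 2));
    }
}
```

Reading from offset 0 works, while reading from the middle is expected to fail on the header check. That is why each .gz file ends up being handled by a single mapper, and why a splittable format such as bz2 parallelises better for large inputs.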
