It's creating temp files on HDFS. See the code below. Thanks for your response, though. I wrote my own record reader, which passes the file splits to LineRecordReader; that works for my problem.
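For readers wondering why delegating to LineRecordReader avoids the temp files: when it finds a codec for the file, LineRecordReader wraps the input stream in a decompressing stream and reads lines straight from it, so nothing uncompressed is written back to HDFS. Because gzip is not splittable, each small file is read whole from offset 0. The sketch below is not Hadoop code. It is a plain-JDK illustration of the same idea, using in-memory byte arrays as stand-ins for the small gzip files in one combined split; the class and method names (GzipCombineSketch, readAllLines, gzip) are made up for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCombineSketch {

    // Demo helper (hypothetical): gzip a string in memory to stand in for a small .gz file.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Reads every line of every gzip "file" in order, decompressing on the fly.
    // No uncompressed copy of any file is materialized first.
    static List<String> readAllLines(List<byte[]> gzipFiles) throws IOException {
        List<String> lines = new ArrayList<>();
        for (byte[] file : gzipFiles) {
            try (InputStream raw = new ByteArrayInputStream(file);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         new GZIPInputStream(raw), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two small "files" handled by one reader, as one combined split would be.
        List<byte[]> files = Arrays.asList(gzip("a\nb\n"), gzip("c\n"));
        System.out.println(readAllLines(files)); // prints [a, b, c]
    }
}
```

In the actual job the per-file work would be done by a LineRecordReader for each file in the CombineFileSplit; the loop above only mirrors that shape.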
    public CompressedCombineFileRecordReader(CombineFileSplit split,
            TaskAttemptContext context, Integer index) throws IOException {
        Configuration currentConf = context.getConfiguration();
        this.path = split.getPath(index);
        boolean isCompressed = findCodec(currentConf, path);
        // For compressed input, the file is decompressed up front.
        // (codecWiseDecompress is not shown here; it presumably writes the
        // uncompressed copy to dPath and sets rlength.)
        if (isCompressed)
            codecWiseDecompress(context.getConfiguration());
        fs = this.path.getFileSystem(currentConf);
        this.startOffset = split.getOffset(index);
        if (isCompressed) {
            this.end = startOffset + rlength;
        } else {
            this.end = startOffset + split.getLength(index);
            dPath = path;
        }
        boolean skipFirstLine = false;
        fileIn = fs.open(dPath);
        // The decompressed temp file is only removed when the FileSystem closes.
        if (isCompressed)
            fs.deleteOnExit(dPath);
        // If the split starts mid-file, back up one byte and skip the partial first line.
        if (startOffset != 0) {
            skipFirstLine = true;
            --startOffset;
            fileIn.seek(startOffset);
        }
        reader = new LineReader(fileIn);
        if (skipFirstLine) {
            startOffset += reader.readLine(new Text(), 0,
                    (int) Math.min((long) Integer.MAX_VALUE, end - startOffset));
        }
        this.pos = startOffset;
    }

Date: Thu, 24 Sep 2015 14:38:45 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: mathursh...@gmail.com
To: user@hadoop.apache.org

What sort of side effects?

On Thu, Sep 24, 2015 at 2:35 PM, R P <hadoo...@outlook.com> wrote:

Thanks Harshit.
That approach doesn't look good, as it will write uncompressed data to HDFS, resulting in job side effects.
-R P

Date: Thu, 24 Sep 2015 09:55:49 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: mathursh...@gmail.com
To: user@hadoop.apache.org
CC: mapreduce-u...@hadoop.apache.org

Hi R P,
Follow this link:
http://www.ibm.com/developerworks/library/bd-hadoopcombine/
Regards,
Harshit

On Thu, Sep 24, 2015 at 4:46 AM, R P <hadoo...@outlook.com> wrote:

Hello All,
What is the best way to process small Gzip files with CombineFileInputFormat? If possible, please provide a link to the documentation. Appreciate your help.
Thanks,
*Adding mapreduce-dev to the mailing list.
From: hadoo...@outlook.com
To: user@hadoop.apache.org
Subject: CombineFileInputFormat with Gzip files
Date: Tue, 22 Sep 2015 18:29:05 -0700

Hello All,
What is the best way to use CombineFileInputFormat with Gzip files as input?
Thanks,

--
Harshit Mathur