It's creating temp files on HDFS. See the code below. Thanks for your response,
though. I wrote my own record reader, which passes file splits to
LineRecordReader, and that works for my problem.
public CompressedCombineFileRecordReader(CombineFileSplit split,
        TaskAttemptContext context, Integer index) throws IOException {
    Configuration currentConf = context.getConfiguration();
    this.path = split.getPath(index);
    boolean isCompressed = findCodec(currentConf, path);
    // This is the step that writes the uncompressed copy (the temp file) to HDFS.
    // rlength and dPath are presumably set here for the compressed case.
    if (isCompressed) {
        codecWiseDecompress(currentConf);
    }
    fs = this.path.getFileSystem(currentConf);
    this.startOffset = split.getOffset(index);
    if (isCompressed) {
        this.end = startOffset + rlength;
    } else {
        this.end = startOffset + split.getLength(index);
        dPath = path;
    }
    boolean skipFirstLine = false;
    fileIn = fs.open(dPath);
    // Remove the temporary uncompressed copy when the file system is closed.
    if (isCompressed) {
        fs.deleteOnExit(dPath);
    }
    // A split that starts mid-file backs up one byte and drops the partial first line.
    if (startOffset != 0) {
        skipFirstLine = true;
        --startOffset;
        fileIn.seek(startOffset);
    }
    reader = new LineReader(fileIn);
    if (skipFirstLine) {
        startOffset += reader.readLine(new Text(), 0,
                (int) Math.min((long) Integer.MAX_VALUE, end - startOffset));
    }
    this.pos = startOffset;
}
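For comparison, here is a minimal sketch of reading gzip data line by line while decompressing in-stream, so no uncompressed copy is written anywhere. It uses only java.util.zip, not Hadoop, and is not the reader I ended up with. The class name GzipLineSketch and the helpers gzip and readLines are made up for illustration, and the in-memory byte arrays stand in for small .gz files. In a Hadoop record reader the same idea would presumably mean wrapping fs.open(path) with the codec's createInputStream() obtained from CompressionCodecFactory, but I have not tested that variant here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Hadoop or of the code above.
public class GzipLineSketch {

    // Compress text into an in-memory gzip blob (stands in for one small .gz file).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decompress while reading; nothing uncompressed is written to disk.
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two small "files" read one after another, as a combined split would do.
        byte[][] files = { gzip("a1\na2\n"), gzip("b1\n") };
        List<String> all = new ArrayList<>();
        for (byte[] file : files) {
            all.addAll(readLines(new ByteArrayInputStream(file)));
        }
        System.out.println(all); // prints [a1, a2, b1]
    }
}
```

Note that a gzip file cannot be split, so each file has to be read whole from offset 0; the skip-first-line handling in the constructor above only matters for uncompressed input.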
Date: Thu, 24 Sep 2015 14:38:45 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: [email protected]
To: [email protected]
What sort of side effects?
On Thu, Sep 24, 2015 at 2:35 PM, R P <[email protected]> wrote:
Thanks Harshit. That approach doesn't look good, as it will write uncompressed
data to HDFS, resulting in job side effects. -R P
Date: Thu, 24 Sep 2015 09:55:49 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: [email protected]
To: [email protected]
CC: [email protected]
Hi R P,
Follow this link,
http://www.ibm.com/developerworks/library/bd-hadoopcombine/
Regards,
Harshit
On Thu, Sep 24, 2015 at 4:46 AM, R P <[email protected]> wrote:
Hello All,
What is the best way to process small Gzip files with CombineFileInputFormat?
If possible, please provide a link to the documentation. Appreciate your help.
Thanks,
*Adding mapreduce-dev to the mailing list.
From: [email protected]
To: [email protected]
Subject: CombineFileInputFormat with Gzip files
Date: Tue, 22 Sep 2015 18:29:05 -0700
Hello All, What is the best way to use CombineFileInputFormat with Gzip files
as input?
Thanks,
--
Harshit Mathur