It's creating temp files on HDFS. See the code below. Thanks for your response 
though; I wrote my own record reader which passes file splits to 
LineRecordReader, and that works for my problem. 


public CompressedCombineFileRecordReader(CombineFileSplit split,
        TaskAttemptContext context, Integer index) throws IOException {

    Configuration currentConf = context.getConfiguration();
    this.path = split.getPath(index);
    boolean isCompressed = findCodec(currentConf, path);

    // Compressed input: decompress to a temporary file on HDFS first
    // (the helper, not shown here, sets dPath to the temp file and
    // rlength to the uncompressed length).
    if (isCompressed) {
        codecWiseDecompress(currentConf);
    }

    fs = this.path.getFileSystem(currentConf);
    this.startOffset = split.getOffset(index);

    if (isCompressed) {
        this.end = startOffset + rlength;
    } else {
        this.end = startOffset + split.getLength(index);
        dPath = path;
    }

    boolean skipFirstLine = false;
    fileIn = fs.open(dPath);

    // Delete the decompressed temp file when the task JVM exits.
    if (isCompressed) {
        fs.deleteOnExit(dPath);
    }

    // If the split does not start at the beginning of the file, back up one
    // byte and discard the partial first line; the previous split reads it.
    if (startOffset != 0) {
        skipFirstLine = true;
        --startOffset;
        fileIn.seek(startOffset);
    }

    reader = new LineReader(fileIn);
    if (skipFirstLine) {
        startOffset += reader.readLine(new Text(), 0,
                (int) Math.min((long) Integer.MAX_VALUE, end - startOffset));
    }
    this.pos = startOffset;
}
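
For reference, a reader like this is normally plugged into a CombineFileInputFormat subclass through CombineFileRecordReader, which instantiates one per-file reader (via the (CombineFileSplit, TaskAttemptContext, Integer) constructor above) for each file in the combined split. A minimal sketch, assuming the reader extends RecordReader<LongWritable, Text>; the wrapper class name here is my own choice, not something from this thread:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Hypothetical wrapper class; key/value types are assumed, not taken from the thread.
public class CompressedCombineFileInputFormat
        extends CombineFileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // CombineFileRecordReader creates one CompressedCombineFileRecordReader
        // per file in the combined split, using the (split, context, index) constructor.
        return new CombineFileRecordReader<LongWritable, Text>(
                (CombineFileSplit) split, context,
                CompressedCombineFileRecordReader.class);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Gzip files cannot be split, so read each file as a whole.
        return false;
    }
}

The temp files mentioned above come from the findCodec/codecWiseDecompress helpers (not shown), which apparently decompress each gzip file to a temporary HDFS path (dPath) and record its uncompressed length (rlength); fs.deleteOnExit(dPath) then removes the temp file when the task finishes. In the driver, the job would pick this up via job.setInputFormatClass(CompressedCombineFileInputFormat.class), with the maximum combined split size controlled by setMaxSplitSize() or mapreduce.input.fileinputformat.split.maxsize.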

Date: Thu, 24 Sep 2015 14:38:45 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: mathursh...@gmail.com
To: user@hadoop.apache.org

What sort of side effects?

On Thu, Sep 24, 2015 at 2:35 PM, R P <hadoo...@outlook.com> wrote:



Thanks Harshit. That approach doesn't look good, as it will write uncompressed 
data to HDFS, resulting in job side effects. -R P

Date: Thu, 24 Sep 2015 09:55:49 +0530
Subject: Re: CombineFileInputFormat with Gzip files
From: mathursh...@gmail.com
To: user@hadoop.apache.org
CC: mapreduce-u...@hadoop.apache.org

Hi R P,

Follow this link,

http://www.ibm.com/developerworks/library/bd-hadoopcombine/


Regards,
Harshit

On Thu, Sep 24, 2015 at 4:46 AM, R P <hadoo...@outlook.com> wrote:



Hello All,
What is the best way to process small Gzip files with CombineFileInputFormat? 
If possible, please provide a link to the documentation. I appreciate your help. 
Thanks,
*Adding mapreduce-dev to the mailing list.

From: hadoo...@outlook.com
To: user@hadoop.apache.org
Subject: CombineFileInputFormat with Gzip files
Date: Tue, 22 Sep 2015 18:29:05 -0700




Hello All,
What is the best way to use CombineFileInputFormat with Gzip files as input?
Thanks,


-- 
Harshit Mathur


-- 
Harshit Mathur
