Re: Use of CombineFileInputFormat

2012-09-28 Thread Harsh J
Combines multiple InputSplits per Mapper (CombineFileSplit), read in serial. Reduces the number of mappers for inputs that carry several (usually small) files/blocks. On Fri, Sep 28, 2012 at 6:54 AM, Jay Vyas wrote: > It's not clear to me what the CombineInputFormat really is? Can somebody > elaborate?
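As a rough illustration of "read in serial", here is a minimal plain-Java sketch of one reader that walks several small inputs one after another, the way a single mapper consumes a combined split. It does not use the Hadoop API: the class SerialCombinedReader and the method readAll are hypothetical names, and in-memory lists of strings stand in for the small files and their records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in (not a Hadoop class) for a reader over a combined
// split: it reads the records of each small file in turn, never in parallel.
class SerialCombinedReader implements Iterator<String> {
    private final List<List<String>> files; // records of each small file
    private int fileIndex = 0;              // index of the file being read
    private Iterator<String> current;       // reader for the current file

    SerialCombinedReader(List<List<String>> files) {
        this.files = files;
        this.current = files.isEmpty() ? null : files.get(0).iterator();
    }

    @Override
    public boolean hasNext() {
        // Move on to the next file whenever the current one is exhausted;
        // empty files are skipped.
        while (current != null && !current.hasNext()) {
            fileIndex++;
            current = fileIndex < files.size()
                    ? files.get(fileIndex).iterator()
                    : null;
        }
        return current != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Convenience helper: drain every record of every file, in order.
    static List<String> readAll(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        SerialCombinedReader reader = new SerialCombinedReader(files);
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        return out;
    }
}
```

One reader (and so one map task) sees the records of all the files it was handed, which is why combining reduces the mapper count for many small inputs.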

Re: CombineFileInputFormat

2012-04-09 Thread Stan Rosenberg
or hive, both of which come with their own implementation of CombineFileInputFormat.

Re: CombineFileInputFormat

2012-04-09 Thread Deepak Nettem
Hi Stan, Just out of curiosity, care to explain the use case a bit? On Mon, Apr 9, 2012 at 5:25 PM, Stan Rosenberg wrote: > Hi, > > I just came across a use case requiring CombineFileInputFormat under > hadoop 0.20.2. I was surprised that the API does not provide a > default

CombineFileInputFormat

2012-04-09 Thread Stan Rosenberg
Hi, I just came across a use case requiring CombineFileInputFormat under Hadoop 0.20.2. I was surprised that the API does not provide a default implementation. A cursory check against newer APIs also returned the same result. What's the rationale? I ended up writing my own implement

Re: help on CombineFileInputFormat

2010-05-10 Thread Aaron Kimball
SequenceFileRecordReader both require the InputSplit to be a FileSplit. So you can't use them directly. (CombineFileInputFormat will pass a CombineFileSplit to the CombineFileRecordReader which is then passed along to the child RR that you specify.) In Sqoop I got around this by creating (anot

help on CombineFileInputFormat

2010-05-06 Thread Zhenyu Zhong
Hi, I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend it because it is an abstract class. However, I need to implement the getRecordReader method in the extended class. May I ask how to implement this getRecordReader method? I tried to do something like this: public

Re: about CombineFileInputFormat

2010-05-04 Thread Amareshwari Sri Ramadasu
See patch on https://issues.apache.org/jira/browse/MAPREDUCE-364 as an example. -Amareshwari On 5/5/10 1:52 AM, "Zhenyu Zhong" wrote: Hi, I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend it because it is an abstract class. However, I need to

about CombineFileInputFormat

2010-05-04 Thread Zhenyu Zhong
Hi, I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend it because it is an abstract class. However, I need to implement the getRecordReader method in the extended class. May I ask how to implement this getRecordReader method? I tried to do something like this: public

Re: CombineFileInputFormat not producing multiple mappers

2010-04-29 Thread Keith Wiley
9, at 11:53 PM, Aleksandar Stupar wrote: Hi, if the mapred.max.split.size is not set (and it's not by default) then CombineFileInputFormat only takes racks into account when grouping blocks. So if you set this property it will also take block placement on machines into account and you

Re: CombineFileInputFormat not producing multiple mappers

2010-04-29 Thread Aleksandar Stupar
Hi, if the mapred.max.split.size is not set (and it's not by default) then CombineFileInputFormat only takes racks into account when grouping blocks. So if you set this property it will also take block placement on machines into account and you should get multiple mappers. Hope this
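To make the effect of a maximum split size concrete, here is a deliberately simplified plain-Java sketch. It is not Hadoop's grouping algorithm (it ignores racks and node placement entirely): it only packs block sizes greedily under an optional cap, so that no cap yields a single combined split (one mapper) and a cap yields several. The class SplitPackingSketch, the method pack, and the byte sizes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: greedy, order-preserving packing of block sizes.
// Real CombineFileInputFormat also weighs rack and node locality.
class SplitPackingSketch {
    // maxSplitSize <= 0 models "property not set": no cap, one big split.
    static List<List<Long>> pack(List<Long> blockSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        boolean capped = maxSplitSize > 0;
        for (long size : blockSizes) {
            // Close the current split if adding this block would exceed the cap.
            if (capped && !current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> blocks = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        System.out.println(pack(blocks, 0).size());  // prints 1 (no cap)
        System.out.println(pack(blocks, 60).size()); // prints 3 (cap of 60)
    }
}
```

Each returned inner list corresponds to one combined split, and so to one map task, which matches the single-mapper symptom reported when no maximum is configured.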

CombineFileInputFormat not producing multiple mappers

2010-04-29 Thread Keith Wiley
I am using CombineFileInputFormat and CombineFileSplit to group small input files as fed to the mappers. The job runs properly and the output is correct, but I get only one mapper task, so I lose all my parallelization in the map stage. I realize I'm not providing much detail yet becaus

Re: CombineFileInputFormat in 0.20.2 version

2010-03-16 Thread Aaron Kimball
rsion of Hadoop and recompile, but that might be tricky since the filenames will most likely not line up (due to the project split). - Aaron On Tue, Mar 16, 2010 at 8:11 AM, Aleksandar Stupar < stupar.aleksan...@yahoo.com> wrote: > Hi all, > > I want to use CombineFileInputFormat i

CombineFileInputFormat in 0.20.2 version

2010-03-16 Thread Aleksandar Stupar
Hi all, I want to use CombineFileInputFormat in version 0.20.2 but it can't be used with the Job class. Description: org.apache.hadoop.mapred.lib.CombineFileInputFormat cannot be used with org.apache.hadoop.mapreduce.Job because Job.setInputFormat requires subcla