Combines multiple InputSplits per mapper (into a CombineFileSplit), which are
then read serially. This reduces the number of mappers for inputs made up of
many (usually small) files/blocks.
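As a rough illustration (a sketch only, not production code; the class name is made up), a CombineFileSplit is essentially a bundle of (path, offset, length) entries that a single record reader walks one after another:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

public class SplitDump {
  // Print every (path, offset, length) chunk packed into one combined split.
  public static void dump(CombineFileSplit split) {
    for (int i = 0; i < split.getNumPaths(); i++) {
      Path file  = split.getPath(i);    // i-th small file (or block) in this split
      long start = split.getOffset(i);  // starting offset within that file
      long len   = split.getLength(i);  // number of bytes to read from it
      System.out.println(file + " [" + start + ", " + (start + len) + ")");
    }
  }
}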
On Fri, Sep 28, 2012 at 6:54 AM, Jay Vyas wrote:
> It's not clear to me what CombineFileInputFormat really is? Can somebody
> elaborate?
…or Hive, both of which
come with their own implementation of CombineFileInputFormat.
Hi Stan,
Just out of curiosity, care to explain the use case a bit?
On Mon, Apr 9, 2012 at 5:25 PM, Stan Rosenberg wrote:
> Hi,
>
> I just came across a use case requiring CombineFileInputFormat under
> hadoop 0.20.2. I was surprised that the API does not provide a
> default
Hi,
I just came across a use case requiring CombineFileInputFormat under
Hadoop 0.20.2. I was surprised that the API does not provide a default
implementation. A cursory check against the newer APIs also returned the
same result. What's the rationale? I ended up writing my own implement…
…SequenceFileRecordReader both require the InputSplit to be a FileSplit,
so you can't use them directly. (CombineFileInputFormat will pass a
CombineFileSplit to the CombineFileRecordReader, which is then passed along
to the child RR that you specify.)
In Sqoop I got around this by creating (anot…
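For context, here is a rough, untested sketch of that child-reader pattern under the old "mapred" API in 0.20.2 (the class name CombineLineRecordReader is made up): it rebuilds a plain FileSplit for one entry of the CombineFileSplit and delegates to LineRecordReader, which only knows how to read FileSplits. If memory serves, CombineFileRecordReader creates the child reader reflectively and expects exactly this constructor signature.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

public class CombineLineRecordReader implements RecordReader<LongWritable, Text> {
  private final LineRecordReader delegate;

  // Constructor signature (CombineFileSplit, Configuration, Reporter, Integer)
  // is what CombineFileRecordReader looks for when it instantiates one reader
  // per file in the combined split.
  public CombineLineRecordReader(CombineFileSplit split, Configuration conf,
                                 Reporter reporter, Integer index) throws IOException {
    // Rebuild a plain FileSplit for the index-th file inside the combined split.
    FileSplit fileSplit = new FileSplit(split.getPath(index), split.getOffset(index),
                                        split.getLength(index), split.getLocations());
    delegate = new LineRecordReader(conf, fileSplit);
  }

  public boolean next(LongWritable key, Text value) throws IOException { return delegate.next(key, value); }
  public LongWritable createKey() { return delegate.createKey(); }
  public Text createValue() { return delegate.createValue(); }
  public long getPos() throws IOException { return delegate.getPos(); }
  public float getProgress() throws IOException { return delegate.getProgress(); }
  public void close() throws IOException { delegate.close(); }
}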
Hi,
I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend
it, because it is an abstract class.
However, I need to implement the getRecordReader method in the extended class.
May I ask how to implement this getRecordReader method?
I tried to do something like this:
public …
See patch on https://issues.apache.org/jira/browse/MAPREDUCE-364 as an example.
-Amareshwari
On 5/5/10 1:52 AM, "Zhenyu Zhong" wrote:
> Hi,
> I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend
> it because it is an abstract class.
> However, I need to …
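For what it's worth, a minimal, untested sketch of such a subclass under the old "mapred" API (the class name is made up, and it plugs in the CombineLineRecordReader sketched earlier in this thread):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

public class MyCombineFileInputFormat extends CombineFileInputFormat<LongWritable, Text> {
  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf,
                                                          Reporter reporter) throws IOException {
    // CombineFileRecordReader walks the files inside the CombineFileSplit and
    // instantiates one CombineLineRecordReader (see earlier sketch) per file.
    return new CombineFileRecordReader(conf, (CombineFileSplit) split, reporter,
                                       (Class) CombineLineRecordReader.class);
  }
}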
…9, at 11:53 PM, Aleksandar Stupar wrote:
> Hi,
> If mapred.max.split.size is not set (and it is not set by default), then
> CombineFileInputFormat only takes racks into account when grouping blocks.
> If you set this property, it will also take block placement on machines
> into account, and you should get multiple mappers.
> Hope this helps.
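To make that concrete, a small, untested sketch of setting the property in the driver (the 64 MB value is just an example, not a recommendation):

import org.apache.hadoop.mapred.JobConf;

public class SplitSizeConfig {
  // Cap each combined split so the input is spread across multiple mappers
  // instead of being packed into one rack-wide split.
  public static void setMaxSplitSize(JobConf conf) {
    conf.setLong("mapred.max.split.size", 64L * 1024 * 1024); // 64 MB per split (example value)
  }
}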
I am using CombineFileInputFormat and CombineFileSplit to group small input
files before they are fed to the mappers. The job runs properly and the output
is correct, but I get only one mapper task, so I lose all my parallelization
in the map stage.
I realize I'm not providing much detail yet becaus…
…version of Hadoop and recompile, but that might be tricky since the
filenames will most likely not line up (due to the project split).
- Aaron
On Tue, Mar 16, 2010 at 8:11 AM, Aleksandar Stupar <
stupar.aleksan...@yahoo.com> wrote:
> Hi all,
>
> I want to use CombineFileInputFormat i
Hi all,
I want to use CombineFileInputFormat in version 0.20.2, but it can't be used
with the Job class.
Description:
org.apache.hadoop.mapred.lib.CombineFileInputFormat cannot be used with
org.apache.hadoop.mapreduce.Job, because Job.setInputFormatClass requires a
subclass of org.apache.hadoop.mapreduce.InputFormat.
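One way around this in 0.20.2 (rough, untested sketch; MyCombineFileInputFormat is the hypothetical concrete subclass sketched earlier) is to stay entirely on the old "mapred" API, whose JobConf.setInputFormat accepts mapred.lib classes directly:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class CombineJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CombineJobDriver.class);
    conf.setJobName("combine-small-files");
    // Old-API setter: accepts any subclass of org.apache.hadoop.mapred.InputFormat,
    // which mapred.lib.CombineFileInputFormat (and our subclass) implements.
    conf.setInputFormat(MyCombineFileInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Mapper/reducer setup omitted; the defaults are the identity classes.
    JobClient.runJob(conf);
  }
}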