I found that the following works for me:
FileInputFormat.setMaxInputSplitSize(job, 10L * 1024L);
Kim
On 11/09/2011 04:11 AM, Radim Kolar wrote:
I have 2 input sequence files, 32 MB each. I want to run them on as many
mappers as possible.
I appended -D mapred.max.split.size=100 as a command-line argument to the
job, but there is no difference. The job still runs on 2 mappers.
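For readers of the archive, here is a minimal driver sketch of where Kim's setMaxInputSplitSize call fits, assuming the new org.apache.hadoop.mapreduce API of the 0.20/1.x era; the class name and paths are placeholders, not anything from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ManyMappersDriver {                     // placeholder name
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "many-mappers");
            job.setJarByClass(ManyMappersDriver.class);
            job.setInputFormatClass(SequenceFileInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Cap each split at 10 KB: a 32 MB sequence file then breaks
            // into ~3,300 splits, and every split gets its own map task.
            FileInputFormat.setMaxInputSplitSize(job, 10L * 1024L);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }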
Radim,
Alternatively you could write your own input format that will split the data
differently, but that is going to take some real digging into the sequence file
format and is going to be error-prone. I would suggest that you create several
smaller input files, as Justin said.
--Bobby Evans
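For what it's worth, the lightest version of the custom-format idea Bobby describes need not be a full rewrite. The sketch below is not from this thread: it assumes the new mapreduce API, and the class name and the 1 MB cap are arbitrary. It only shrinks the split size that the stock SequenceFileInputFormat computes; records stay intact because each reader seeks forward to the next sync marker inside its split.

    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    public class SmallSplitSequenceFileInputFormat<K, V>  // arbitrary name
            extends SequenceFileInputFormat<K, V> {

        private static final long MAX_SPLIT_BYTES = 1024L * 1024L; // 1 MB cap

        @Override
        protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
            // Cap whatever split size the stock format would pick; the
            // framework then creates one map task per (smaller) split.
            return Math.min(MAX_SPLIT_BYTES,
                            super.computeSplitSize(blockSize, minSize, maxSize));
        }
    }

You would plug it in with job.setInputFormatClass(SmallSplitSequenceFileInputFormat.class).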
I found the following useful post on this topic:
http://yaseminavcular.blogspot.com/2011/06/how-to-set-number-of-maps-with-hadoop.html
It's clearer than the wiki page.
Radim,
In this case, it doesn't matter how many mappers you ask for in your
job configuration. Hadoop will only give 1 mapper per split. Since
your files are smaller than 64 MB (assuming you're using the default
HDFS block size), you only have 2 splits. If you really need more
mappers, you need to create more splits, for example by breaking your
input into several smaller files.
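To put numbers on that: the new-API FileInputFormat picks splitSize = max(minSize, min(maxSize, blockSize)), so you can estimate the split count per file directly (plain Java, values from this thread):

    long fileSize  = 32L * 1024 * 1024;  // one 32 MB sequence file
    long blockSize = 64L * 1024 * 1024;  // default HDFS block size
    long minSize   = 1L;                 // default minimum split size
    long maxSize   = Long.MAX_VALUE;     // default max: effectively unlimited

    long splitSize = Math.max(minSize, Math.min(maxSize, blockSize)); // 64 MB
    long numSplits = (fileSize + splitSize - 1) / splitSize;          // 1 per file
    // Capping maxSize at 10 * 1024 (Kim's 10 KB) drops splitSize to 10 KB,
    // and numSplits jumps to 3277 for the same 32 MB file.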
I have 2 input sequence files, 32 MB each. I want to run them on as many
mappers as possible.
I appended -D mapred.max.split.size=100 as a command-line argument to the
job, but there is no difference. The job still runs on 2 mappers.
How does split size work? Is the max split size used for reading or writing
files?
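One thing worth checking when a -D option seems to have no effect: generic options like -D are only parsed when the job is launched through ToolRunner/GenericOptionsParser, and mapred.max.split.size is read by the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat, not by the old org.apache.hadoop.mapred one. A launcher sketch, where MyDriver is a hypothetical class implementing Tool:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class Launcher {
        public static void main(String[] args) throws Exception {
            // ToolRunner copies -D key=value pairs into the Configuration
            // before run() is called; without it they sit unparsed in args[]
            // and never reach the input format. MyDriver is hypothetical.
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }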