Re: increasing number of mappers.

2011-11-09 Thread Kim Ebert
I found the following works for me:

    FileInputFormat.setMaxInputSplitSize(job, 10L * 1024L);

Kim

On 11/09/2011 04:11 AM, Radim Kolar wrote:
> I have 2 input seq files, 32MB each. I want to run them on as many mappers
> as possible. I appended -D mapred.max.split.size=100 as a command line
> argument to the job, but there is no difference.
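For context, here is a minimal driver sketch showing where Kim's call fits. The class and job names are illustrative, not from the thread; it assumes the new org.apache.hadoop.mapreduce API of that era and the default (identity) mapper and reducer:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ManyMappersDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "many-mappers");
            job.setJarByClass(ManyMappersDriver.class);
            job.setInputFormatClass(SequenceFileInputFormat.class);

            // Cap each split at 10 KB so a 32MB file yields many splits
            // (and therefore many mappers) instead of one per HDFS block.
            FileInputFormat.setMaxInputSplitSize(job, 10L * 1024L);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }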

Re: increasing number of mappers.

2011-11-09 Thread Robert Evans
Radim, Alternatively you could write your own input format that splits the data differently, but that is going to take some real digging into the sequence file format and is going to be error-prone. I would suggest that you create several smaller input files, as Justin said. --Bobby Evans
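Along the lines of Robert's simpler suggestion, here is a rough sketch that rewrites one large sequence file into several smaller ones, so the stock input format naturally produces more splits. The path handling and round-robin partitioning are made up for illustration; it assumes the classic (pre-2.x) SequenceFile API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class ResplitSeqFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path in = new Path(args[0]);
            int numParts = Integer.parseInt(args[1]);

            SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            // One writer per output part, e.g. input.seq.part0, input.seq.part1, ...
            SequenceFile.Writer[] writers = new SequenceFile.Writer[numParts];
            for (int i = 0; i < numParts; i++) {
                writers[i] = SequenceFile.createWriter(fs, conf,
                        new Path(in.getParent(), in.getName() + ".part" + i),
                        reader.getKeyClass(), reader.getValueClass());
            }

            // Deal records out round-robin across the smaller files.
            long n = 0;
            while (reader.next(key, val)) {
                writers[(int) (n++ % numParts)].append(key, val);
            }
            reader.close();
            for (SequenceFile.Writer w : writers) {
                w.close();
            }
        }
    }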

Re: increasing number of mappers.

2011-11-09 Thread Radim Kolar
I found the following useful post on this topic: http://yaseminavcular.blogspot.com/2011/06/how-to-set-number-of-maps-with-hadoop.html It's clearer than the wiki page.

Re: increasing number of mappers.

2011-11-09 Thread Justin Woody
Radim, In this case, it doesn't matter how many mappers you request in your job configuration. Hadoop will only assign one mapper per split. Since your files are smaller than 64MB (assuming you're using the default HDFS block size), you only have 2 splits. If you really need more mappers, you need to break your data into several smaller input files.
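For reference, the split size FileInputFormat actually uses is the block size clamped between the configured minimum and maximum; the helper below paraphrases FileInputFormat.computeSplitSize from the Hadoop source:

    // splitSize = max(minSize, min(maxSize, blockSize))
    // With the defaults (minSize = 1, maxSize = Long.MAX_VALUE,
    // blockSize = 64MB), a 32MB file is a single split, so only lowering
    // the max split size below the file size produces extra mappers.
    long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }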

increasing number of mappers.

2011-11-09 Thread Radim Kolar
I have 2 input seq files, 32MB each. I want to run them on as many mappers as possible. I appended -D mapred.max.split.size=100 as a command line argument to the job, but there is no difference. The job still runs on 2 mappers. How does split size work? Is max split size used for reading or writing files?
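One common reason a -D option like this is silently ignored (separate from the answers above): generic options are only parsed when the driver runs through ToolRunner, which invokes GenericOptionsParser. A minimal sketch, with a made-up class name:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class SplitSizeJob extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains any -D mapred.max.split.size=...
            // from the command line, because ToolRunner parsed the
            // generic options before calling run().
            Job job = new Job(getConf(), "split-size-test");
            job.setJarByClass(SplitSizeJob.class);
            // ... set input/output paths, mapper, reducer ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new SplitSizeJob(), args));
        }
    }

With that in place, an invocation like "hadoop jar myjob.jar SplitSizeJob -D mapred.max.split.size=100 in out" should actually reach the job's Configuration; a driver whose main() builds its own Configuration and ignores the parsed arguments will drop the -D entirely.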