You cannot have more mappers than you have regions, but you can have fewer. Try going that way.
Also, 149,624 regions is insane; is that really the case? I don't think I've ever seen such a large deploy, and it's probably bound to hit some issues...

J-D

On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching <[email protected]> wrote:
> Hi,
>
> First off, I'd like to say thanks to the developers for HBase, it's been fun
> to work with.
>
> I've been using TableInputFormat to run a Map-Reduce job and ran into an
> issue:
>
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: java.io.IOException: The number of tasks for this job
> 149624 exceeds the configured limit 100000
>
> The table I'm accessing has 149,624 regions, but my Hadoop instance won't
> allow me to start a job with that many map tasks. After briefly looking at
> the TableInputFormatBase code, it appears that since a TableSplit only knows
> about a single region, my job is forced into having mappers == # of
> regions. Since the Hadoop instance I'm using is shared, I'm concerned that
> even if the configured limit were raised, jobs with so many mappers would
> eventually wreak havoc on the job tracker.
>
> Given that I have no control over the number of regions in the table
> (it's maintained by someone else), is the only solution to implement another
> input format (e.g. MultiRegionTableFormat) that allows InputSplits to cover
> more than one region? I don't mind doing it, but I didn't want to write it if
> another solution already exists.
>
> Apologies if this issue has been raised before, but a quick search didn't
> turn anything up for me.
>
> Thanks,
>
> Avery
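For what it's worth, the core of the "MultiRegionTableFormat" idea above is just grouping runs of adjacent regions into a bounded number of splits. Here is a minimal, hedged sketch of that grouping logic in plain Java (no HBase dependencies; the class name `RegionGrouper` and the `String[]{startKey, endKey}` representation of a region are my own assumptions, not HBase API). A real input format would do this inside `getSplits()` with `byte[]` keys and region locations:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collapse N contiguous regions into at most maxSplits splits,
// so each mapper scans a run of adjacent regions instead of exactly one.
// A region is modeled as a String[]{startKey, endKey} pair; the list must
// be sorted and contiguous, as region boundaries are in an HBase table.
public class RegionGrouper {
    public static List<String[]> group(List<String[]> regions, int maxSplits) {
        List<String[]> splits = new ArrayList<>();
        int n = regions.size();
        // Ceiling division: number of regions packed into each split.
        int perSplit = (n + maxSplits - 1) / maxSplits;
        for (int i = 0; i < n; i += perSplit) {
            int last = Math.min(i + perSplit, n) - 1;
            // The merged split covers the first region's start key
            // through the last region's end key.
            splits.add(new String[] { regions.get(i)[0], regions.get(last)[1] });
        }
        return splits;
    }
}
```

With 149,624 regions and a 100,000-task limit, this would pack two regions per split and yield 74,812 mappers; raising `maxSplits`' divisor further shrinks the job at the cost of longer-running map tasks.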
