You cannot have more mappers than you have regions, but you can have fewer. Try going that way.
Also, 149,624 regions is insane; is that really the case? I don't think I've ever seen such a large deploy, and it's probably bound to hit some issues...

J-D

On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching <[email protected]> wrote:
> Hi,
>
> First off, I'd like to say thanks to the developers for HBase, it's been fun
> to work with.
>
> I've been using TableInputFormat to run a Map-Reduce job and ran into an
> issue:
>
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: java.io.IOException: The number of tasks for this job
> 149624 exceeds the configured limit 100000
>
> The table I'm accessing has 149,624 regions, but my Hadoop instance won't
> allow me to start a job with that many map tasks. After briefly looking at
> the TableInputFormatBase code, it appears that since a TableSplit only knows
> about a single region, my job is forced into having mappers == # of
> regions. Since the Hadoop instance I'm using is shared, I'm concerned that
> even if the configured limit were raised, jobs with so many mappers would
> eventually wreak havoc on the job tracker.
>
> Given that I have no control over the number of regions in the table
> (it's maintained by someone else), is the only solution to implement another
> input format (e.g. MultiRegionTableFormat) that allows InputSplits to cover
> more than one region? I don't mind doing it, but I didn't want to write it if
> another solution already exists.
>
> Apologies if this issue has been raised before, but a quick search didn't
> turn anything up for me.
>
> Thanks,
>
> Avery
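For what it's worth, the core of the "MultiRegionTableFormat" idea above is just grouping runs of adjacent regions into a bounded number of splits. Here is a minimal, hedged sketch of that grouping logic in plain Java (no HBase dependencies; the class name `RegionGrouper` and the `String[]{startKey, endKey}` representation of a region are my own assumptions, not HBase API). A real input format would do this inside `getSplits()` with `byte[]` keys and region locations:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collapse N contiguous regions into at most maxSplits splits,
// so each mapper scans a run of adjacent regions instead of exactly one.
// A region is modeled as a String[]{startKey, endKey} pair; the list must
// be sorted and contiguous, as region boundaries are in an HBase table.
public class RegionGrouper {
    public static List<String[]> group(List<String[]> regions, int maxSplits) {
        List<String[]> splits = new ArrayList<>();
        int n = regions.size();
        // Ceiling division: number of regions packed into each split.
        int perSplit = (n + maxSplits - 1) / maxSplits;
        for (int i = 0; i < n; i += perSplit) {
            int last = Math.min(i + perSplit, n) - 1;
            // The merged split covers the first region's start key
            // through the last region's end key.
            splits.add(new String[] { regions.get(i)[0], regions.get(last)[1] });
        }
        return splits;
    }
}
```

With 149,624 regions and a 100,000-task limit, this would pack two regions per split and yield 74,812 mappers; raising `maxSplits`' divisor further shrinks the job at the cost of longer-running map tasks.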
