The number of regions is pretty insane, but unfortunately it's not under my 
control.  The workaround I suggested is to write another InputFormat and 
InputSplit such that each InputSplit is responsible for a configurable number 
of regions.  For example, if I have 100k regions and configure each InputSplit 
to handle 1k regions, I'll only have 100 map tasks.  I was just wondering 
whether anyone else had faced these issues.
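A minimal sketch of just the grouping step of that workaround, assuming the 
input format already has the table's regions as an ordered list.  The names 
`RegionBatcher`, `groupRegions`, and `splitCount` and the string region names 
are hypothetical, not part of HBase's API; a real multi-region input format 
would still have to turn each group into its own InputSplit, which is not 
shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: packs an ordered list of regions into groups of at most
 * regionsPerSplit, so that one input split (and therefore one map task) covers
 * several regions instead of exactly one.
 */
public class RegionBatcher {

    static <T> List<List<T>> groupRegions(List<T> regions, int regionsPerSplit) {
        if (regionsPerSplit <= 0) {
            throw new IllegalArgumentException("regionsPerSplit must be positive");
        }
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < regions.size(); start += regionsPerSplit) {
            int end = Math.min(start + regionsPerSplit, regions.size());
            // Copy the sublist so each group does not depend on the source list.
            groups.add(new ArrayList<>(regions.subList(start, end)));
        }
        return groups;
    }

    /** Number of map tasks that result: ceil(regionCount / regionsPerSplit). */
    static int splitCount(int regionCount, int regionsPerSplit) {
        return (regionCount + regionsPerSplit - 1) / regionsPerSplit;
    }

    public static void main(String[] args) {
        // Stand-in region names; a real input format would use the table's regions.
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 149_624; i++) {
            regions.add("region-" + i);
        }
        List<List<String>> groups = groupRegions(regions, 1_000);
        System.out.println(groups.size());                        // 150
        System.out.println(groups.get(groups.size() - 1).size()); // 624
        System.out.println(splitCount(100_000, 1_000));           // 100
    }
}
```

With 149,624 regions and 1,000 regions per split, this gives 150 map tasks, 
far below the configured limit of 100000 reported in the exception.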

Thanks for your quick response on a Saturday morning =),

Avery

On Apr 9, 2011, at 9:26 AM, Jean-Daniel Cryans wrote:

> You cannot have more mappers than you have regions, but you can have
> fewer. Try going that way.
> 
> Also, 149,624 regions is insane; is that really the case? I don't think
> I've ever seen such a large deploy, and it's probably bound to hit some
> issues...
> 
> J-D
> 
> On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching wrote:
>> Hi,
>> 
>> First off, I'd like to say thanks to the developers for HBase, it's been fun 
>> to work with.
>> 
>> I've been using TableInputFormat to run a MapReduce job and ran into an 
>> issue.
>> 
>> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: 
>> java.io.IOException: java.io.IOException: The number of tasks for this job 
>> 149624 exceeds the configured limit 100000
>> 
>> The table I'm accessing has 149624 regions, but my Hadoop instance won't 
>> allow me to start a job with that many map tasks.  After briefly looking at 
>> the TableInputFormatBase code, it appears that since each TableSplit only 
>> covers a single region, my job is forced to have mappers == # of 
>> regions.  Since the Hadoop instance I'm using is shared, I'm concerned that 
>> even if the configured limit were raised, jobs with so many mappers would 
>> eventually wreak havoc on the JobTracker.
>> 
>> Given that I have no control over the number of regions in the table 
>> (it's maintained by someone else), is the only solution to implement another 
>> input format (e.g. a MultiRegionTableFormat) that allows an InputSplit to 
>> cover more than one region?  I don't mind doing it, but I didn't want to 
>> write it if another solution already exists.
>> 
>> Apologies if this issue has been raised before, but a quick search didn't 
>> turn anything up for me.
>> 
>> Thanks,
>> 
>> Avery
>> 
