The number of regions is pretty insane, but unfortunately not under my control. The workaround I suggested is to write another InputFormat and InputSplit such that each InputSplit is responsible for a configurable number of regions. For example, if I have 100k regions and configure each InputSplit to handle 1k regions, I'd end up with only 100 map tasks. I was just wondering if anyone else has faced this issue.
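Roughly what I had in mind (just an untested sketch; the REGIONS_PER_SPLIT property name and the MultiRegionTableInputFormat/MultiRegionSplit class names are placeholders I made up, and createRecordReader() would still need to be overridden to iterate over the wrapped region splits):

    // Sketch only: bin the one-split-per-region list from TableInputFormat
    // into groups of N, so the job gets ceil(#regions / N) map tasks.
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableSplit;

    public class MultiRegionTableInputFormat extends TableInputFormat {

      /** Placeholder property for how many regions each split should cover. */
      public static final String REGIONS_PER_SPLIT =
          "hbase.mapreduce.regionspersplit";

      @Override
      public List<InputSplit> getSplits(JobContext context) throws IOException {
        // Parent returns one TableSplit per region.
        List<InputSplit> perRegion = super.getSplits(context);
        int regionsPerSplit =
            context.getConfiguration().getInt(REGIONS_PER_SPLIT, 1000);

        List<InputSplit> grouped = new ArrayList<InputSplit>();
        for (int i = 0; i < perRegion.size(); i += regionsPerSplit) {
          List<TableSplit> bucket = new ArrayList<TableSplit>();
          int end = Math.min(i + regionsPerSplit, perRegion.size());
          for (int j = i; j < end; ++j) {
            bucket.add((TableSplit) perRegion.get(j));
          }
          grouped.add(new MultiRegionSplit(bucket));
        }
        return grouped;
      }

      /** One split wrapping several region splits; the record reader would
       *  scan each wrapped region's [startRow, endRow) range in turn. */
      public static class MultiRegionSplit extends InputSplit implements Writable {
        private List<TableSplit> regionSplits = new ArrayList<TableSplit>();

        public MultiRegionSplit() {}  // needed for deserialization

        public MultiRegionSplit(List<TableSplit> splits) {
          this.regionSplits = splits;
        }

        @Override
        public long getLength() throws IOException, InterruptedException {
          long total = 0;
          for (TableSplit s : regionSplits) {
            total += s.getLength();
          }
          return total;
        }

        @Override
        public String[] getLocations() throws IOException, InterruptedException {
          // Locality is mostly lost once a split spans many region servers;
          // report the first region's location as a best effort.
          return regionSplits.isEmpty()
              ? new String[0] : regionSplits.get(0).getLocations();
        }

        @Override
        public void write(DataOutput out) throws IOException {
          out.writeInt(regionSplits.size());
          for (TableSplit s : regionSplits) {
            s.write(out);
          }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
          int n = in.readInt();
          regionSplits = new ArrayList<TableSplit>(n);
          for (int i = 0; i < n; ++i) {
            TableSplit s = new TableSplit();
            s.readFields(in);
            regionSplits.add(s);
          }
        }
      }
    }

The mapper count would then be controlled by setting the regions-per-split property on the job configuration instead of by the region count.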
Thanks for your quick response on a Saturday morning =),

Avery

On Apr 9, 2011, at 9:26 AM, Jean-Daniel Cryans wrote:

> You cannot have more mappers than you have regions, but you can have
> less. Try going that way.
>
> Also 149,624 regions is insane, is that really the case? I don't think
> I've ever seen such a large deploy and it's probably bound to hit some
> issues...
>
> J-D
>
> On Sat, Apr 9, 2011 at 9:15 AM, Avery Ching <[email protected]> wrote:
>> Hi,
>>
>> First off, I'd like to say thanks to the developers for HBase, it's been fun
>> to work with.
>>
>> I've been using TableInputFormat to run a Map-Reduce job and ran into an
>> issue:
>>
>> Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException: java.io.IOException: The number of tasks for this job
>> 149624 exceeds the configured limit 100000
>>
>> The table I'm accessing has 149,624 regions, but my Hadoop instance won't
>> allow me to start a job with that many map tasks. After briefly looking at
>> the TableInputFormatBase code, it appears that since a TableSplit only knows
>> about a single region, my job will be forced into having mappers == # of
>> regions. Since the Hadoop instance I'm using is shared, I'm concerned that
>> even if the configured limit were raised, jobs with that many mappers would
>> eventually cause havoc for the job tracker.
>>
>> Given that I have no control over the number of regions in the table
>> (it's maintained by someone else), is the only solution to implement another
>> input format (e.g. MultiRegionTableFormat) that allows InputSplits to cover
>> more than one region? I don't mind doing it, but didn't want to write it if
>> another solution already exists.
>>
>> Apologies if this issue has been raised before, but a quick search didn't
>> turn anything up for me.
>>
>> Thanks,
>>
>> Avery
