I just looked at the source code for HBaseStorage. Under the hood it uses a modified version of TableInputFormat, which creates one split (and therefore one map task) per region. AFAIK, TableInputFormat does not support controlling the number of launched map tasks. It might be a worthwhile contribution to HBase to write an analogue of Hadoop's CombineFileInputFormat, so that a single map task can read multiple regions.
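To make the idea concrete, here is a minimal sketch of only the grouping step such a combining input format would need. It assumes the regions are already sorted by start key and packs them into at most `maxSplits` contiguous groups, one group per map task. `RegionSplit`, `groupRegions`, and `maxSplits` are hypothetical names, not HBase or Pig APIs. A real implementation would presumably build on TableInputFormat's split computation and would likely also group by region server for data locality, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionGrouping {

    // Hypothetical stand-in for one HBase region split: the row-key range
    // [startKey, endKey) and the region server hosting it.
    static final class RegionSplit {
        final String startKey;
        final String endKey;
        final String host;

        RegionSplit(String startKey, String endKey, String host) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.host = host;
        }
    }

    // Packs regions (assumed sorted by start key) into at most maxSplits
    // contiguous groups whose sizes differ by at most one. Each group would
    // become the input of a single map task.
    static List<List<RegionSplit>> groupRegions(List<RegionSplit> regions, int maxSplits) {
        if (maxSplits <= 0) {
            throw new IllegalArgumentException("maxSplits must be positive");
        }
        int n = regions.size();
        int groups = Math.min(n, maxSplits);
        List<List<RegionSplit>> result = new ArrayList<>();
        int start = 0;
        for (int g = 0; g < groups; g++) {
            // The first (n % groups) groups each receive one extra region.
            int size = n / groups + (g < n % groups ? 1 : 0);
            result.add(new ArrayList<>(regions.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // 1000 regions squeezed into 80 map tasks, as in the original question.
        List<RegionSplit> regions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            regions.add(new RegionSplit(String.format("row%04d", i),
                    String.format("row%04d", i + 1), "rs" + (i % 10)));
        }
        List<List<RegionSplit>> groups = groupRegions(regions, 80);
        // prints "80 groups, first has 13 regions, last has 12"
        System.out.println(groups.size() + " groups, first has "
                + groups.get(0).size() + " regions, last has "
                + groups.get(groups.size() - 1).size());
    }
}
```

With 1000 regions and a cap of 80, the first 40 groups get 13 regions and the remaining 40 get 12, so every region is read exactly once while only 80 map tasks are launched. Keeping groups contiguous in key order is just one simple choice; grouping by `host` instead would trade balance for locality.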
On Wed, May 21, 2014 at 10:21 AM, Hansi Klose <[email protected]> wrote:
> Hi Lei,
>
> I don't know if this helps you, but I had the same problem with the
> replication verify jobs I run in our environment.
>
> I created a fair scheduler pool called "admin" on the JobTracker and
> configured this pool with the maximum number of mappers the job should take.
>
> I inserted this section into my hbase-site.xml:
>
> <property>
>   <name>mapred.queue.name</name>
>   <value>admin</value>
> </property>
>
> You only need to insert this on the node you start the job from.
>
> Then I log in as user "hbase" on that machine with that configuration.
>
> When I run my verify jobs as user "hbase", the job goes to the
> fair scheduler pool "admin" and takes only the allowed number of mappers.
>
> Before that, it took all the mappers it could get.
>
> Regards, Hansi
>
> > Sent: Wednesday, 21 May 2014 at 04:16
> > From: "[email protected]" <[email protected]>
> > To: user <[email protected]>, user <[email protected]>
> > Subject: How to set number of mappers when using HBaseStorage
> >
> > When using HBaseStorage to read data from an HBase table, there is one
> > mapper per region.
> > However, my HBase table has more than 1000 regions and only 80 mappers
> > of capacity.
> > Is there a way to set the number of mappers when using HBaseStorage?
> >
> > Thanks,
> > Lei
> >
> > [email protected]
