Let me give the fair scheduler a try.

Thanks,
Lei
[email protected]

From: Bryan Beaudreault
Date: 2014-05-21 23:20
To: user
Subject: Re: How to set number of mappers when using HBaseStorage

Hansi's scheduler configuration is the real solution here, but combining multiple regions into a single split is useful for other reasons. Specifically, it helps control the load a job puts on an HBase cluster; you don't always want 50 mappers running against a single regionserver.

We run into this a lot at HubSpot, so I've created my own extension of TableInputFormat and a corresponding RecordReader so that you can partition the mappers by regionserver. This allows you to split all the regions for a regionserver into a configurable number of mappers (1-N).

I haven't contributed this yet, but you can get the code at https://gist.github.com/bbeaudreault/9788499

On Wed, May 21, 2014 at 11:12 AM, Pradeep Gollakota <[email protected]> wrote:

> I just looked at the source code for HBaseStorage. It uses a modified
> version of TableInputFormat under the hood. TableInputFormat, AFAIK, does
> not support controlling the number of launched Map tasks. It might be a
> worthwhile contribution to HBase to write an analogous version of a
> CombineInputFormat, so a single Map task can read multiple regions.
>
>
> On Wed, May 21, 2014 at 10:21 AM, Hansi Klose <[email protected]> wrote:
>
> > Hi Lei,
> >
> > I don't know if this helps you, but I had the same problem with the
> > replication verify jobs I run in our environment.
> >
> > I created a fairscheduler pool on the jobtracker called "admin" and
> > configured this pool with the maximum number of mappers the job
> > should take.
> >
> > I inserted this section into my hbase-site.xml:
> >
> > <property>
> >   <name>mapred.queue.name</name>
> >   <value>admin</value>
> > </property>
> > <property>
> >
> > You only need to insert this on the node where you start the job.
> >
> > Then I log in as user "hbase" on that machine with the configuration.
> >
> > When I run my verify jobs as user "hbase", the job goes to the
> > fairscheduler pool "admin" and takes only the allowed number of mappers.
> >
> > Before that, it took all the mappers it could get.
> >
> > Regards Hansi
> >
> > > Sent: Wednesday, 21 May 2014 at 04:16
> > > From: "[email protected]" <[email protected]>
> > > To: user <[email protected]>, user <[email protected]>
> > > Subject: How to set number of mappers when using HBaseStorage
> > >
> > >
> > > When using HBaseStorage to read data from an HBase table, there is
> > > one mapper per region.
> > > However, my HBase table has more than 1000 regions and the cluster
> > > only has capacity for 80 mappers.
> > > Is there a way to set the number of mappers when using HBaseStorage?
> > >
> > > Thanks,
> > > Lei
> > >
> > >
> > >
> > > [email protected]
> > >
> >
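For readers following the thread, the per-regionserver partitioning idea Bryan describes can be sketched in plain Java. This is only an illustration of the grouping step, not the code from his gist and not an HBase API: the class name RegionSplitGrouper, the method groupByServer, and the use of plain strings for region and regionserver names are all assumptions made for the example. A real implementation would presumably do this inside a TableInputFormat subclass and emit one input split per group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of HBase, Pig, or the gist linked above.
class RegionSplitGrouper {

    /**
     * Groups regions by the regionserver hosting them, then deals each
     * server's regions round-robin into at most maxSplitsPerServer groups.
     * Each inner list stands for the regions one mapper would read.
     *
     * regionToServer maps a region name to its regionserver name
     * (plain strings here, purely as an assumption for the sketch).
     */
    static Map<String, List<List<String>>> groupByServer(
            Map<String, String> regionToServer, int maxSplitsPerServer) {
        if (maxSplitsPerServer < 1) {
            throw new IllegalArgumentException("maxSplitsPerServer must be >= 1");
        }

        // Step 1: collect the regions hosted by each regionserver.
        Map<String, List<String>> byServer = new TreeMap<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }

        // Step 2: split each server's regions into a bounded number of groups.
        Map<String, List<List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byServer.entrySet()) {
            List<String> regions = e.getValue();
            // Never create more groups than there are regions on the server.
            int groups = Math.min(maxSplitsPerServer, regions.size());
            List<List<String>> splits = new ArrayList<>();
            for (int i = 0; i < groups; i++) {
                splits.add(new ArrayList<>());
            }
            for (int i = 0; i < regions.size(); i++) {
                splits.get(i % groups).add(regions.get(i));
            }
            result.put(e.getKey(), splits);
        }
        return result;
    }
}
```

With this grouping, the total number of mappers is bounded by the number of regionservers times maxSplitsPerServer, instead of by the number of regions. That matches the 1-N mappers per regionserver described above, and it limits how many mappers hit any single regionserver at once.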
