You can leave your config value there. Remember to record this change somewhere for future reference - you may want to tune other cost parameters later.
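For the record, a persistent override in hbase-site.xml would look like the following (using the 100 you tested; the master needs a restart to pick it up):

```xml
<property>
  <name>hbase.master.balancer.stochastic.tableSkewCost</name>
  <value>100</value>
</property>
```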
The side-effects of this change partly depend on how you want your cluster balanced. I suggest you go over the CostFunctions in StochasticLoadBalancer so that you know which factors (and their weights) the load balancer considers.

Cheers

On Tue, Jul 22, 2014 at 8:43 AM, Brian Jeltema <[email protected]> wrote:

> That did the trick. I set it to 100 and regions are uniform now. Should I
> leave it there? What are the side-effects of this change?
>
> Thanks.
>
> Brian
>
> On Jul 22, 2014, at 11:28 AM, Ted Yu <[email protected]> wrote:
>
> > Here is a code snippet from StochasticLoadBalancer
> > w.r.t. TableSkewCostFunction:
> >
> >   private static final String TABLE_SKEW_COST_KEY =
> >       "hbase.master.balancer.stochastic.tableSkewCost";
> >
> >   private static final float DEFAULT_TABLE_SKEW_COST = 35;
> >
> >   TableSkewCostFunction(Configuration conf) {
> >     super(conf);
> >     this.setMultiplier(conf.getFloat(TABLE_SKEW_COST_KEY,
> >         DEFAULT_TABLE_SKEW_COST));
> >
> > You can try increasing the value for
> > "hbase.master.balancer.stochastic.tableSkewCost"
> >
> > Cheers
> >
> > On Tue, Jul 22, 2014 at 6:59 AM, Brian Jeltema <
> > [email protected]> wrote:
> >
> >> I don't understand the logging output, but I do see a strange pattern.
> >> I'll try to summarize.
> >>
> >> There are 5 RegionServers, call them rs1 through rs5. There are a total
> >> of 174 regions for the table in question, with 69 in rs1. In the log
> >> output I see lines (greatly simplified) like the following:
> >>
> >>   AssignmentManager: Assigning fooTable, …. to rs2
> >>   AssignmentManager: Assigning fooTable, …. to rs3
> >>   AssignmentManager: Assigning fooTable, …. to rs4
> >>   AssignmentManager: Assigning fooTable, ….
> >>   to rs5
> >>
> >> There are 106 such lines, none logging an assignment to rs1.
> >>
> >> I also see 105 lines like:
> >>
> >>   AssignmentManager: Using pre-existing plan for fooTable … src=rs1 … dest=rs2
> >>   AssignmentManager: Using pre-existing plan for fooTable … src=rs1 … dest=rs3
> >>   …
> >>
> >> where src=rs1 in every case, and dest=rs1 never occurs.
> >>
> >> I don't see any exceptions or log output that reports a problem.
> >>
> >> On Jul 22, 2014, at 9:18 AM, Ted Yu <[email protected]> wrote:
> >>
> >>> The load balancer in 0.98 considers many factors when making balancing
> >>> decisions.
> >>>
> >>> Can you take a look at the master log and look for balancer-related
> >>> lines? That would give you some clue.
> >>>
> >>> Cheers
> >>>
> >>> On Jul 22, 2014, at 5:03 AM, Brian Jeltema <
> >>> [email protected]> wrote:
> >>>
> >>>> I ran the balancer from the hbase shell, but don't see any change. Is
> >>>> there a way to balance a specific table?
> >>>>
> >>>>> bq. One RegionServer has 69 regions
> >>>>>
> >>>>> Can you run the load balancer so that your regions are better balanced?
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> On Mon, Jul 21, 2014 at 6:56 AM, Brian Jeltema <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> There are 174 regions, not well balanced. One RegionServer has 69
> >>>>>> regions. That RegionServer generates a series of log entries
> >>>>>> (modified and shown below), one for each region, at roughly 1 to 2
> >>>>>> second intervals. The timeout period expires when it reaches region 36.
> >>>>>>
> >>>>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Creating references for hfiles
> >>>>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Adding snapshot references for
> >>>>>> [hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2] hfiles
> >>>>>> 2014-07-21 07:49:44,503 regionserver.HRegion: Creating reference for file (1/1) :
> >>>>>> hdfs://xxx.digitalenvoy.net:8020/apps/hbase/data/data/default/hosts/31e2a098e9e311c4ddcfd3d8da28dfb6/p/3749b6df36c749508fe9c6f54ca425f2
> >>>>>> 2014-07-21 07:49:45,136 snapshot.FlushSnapshotSubprocedure: ... Flush Snapshotting region
> >>>>>> hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6. completed.
> >>>>>> 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Closing region operation on
> >>>>>> hosts,\x00\x03|\xBF!,1400600029600.31e2a098e9e311c4ddcfd3d8da28dfb6.
> >>>>>> 2014-07-21 07:49:45,137 DEBUG [rs(xxx.digitalenvoy.net,60020,1405943192177)-snapshot-pool3-thread-1]
> >>>>>> snapshot.FlushSnapshotSubprocedure: Starting region operation on
> >>>>>> hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729.
> >>>>>> 2014-07-21 07:49:45,137 DEBUG [member: 'xxx.digitalenvoy.net,60020,1405943192177' subprocedure-pool1-thread-2]
> >>>>>> snapshot.RegionServerSnapshotManager: Completed 1/174 local region snapshots.
> >>>>>> 2014-07-21 07:49:45,137 snapshot.FlushSnapshotSubprocedure: Flush Snapshotting region
> >>>>>> hosts,\x00\x8A\x90\xD6\x08,1400659179080.a74402fcbd9a96a7c92b250721095729. started...
> >>>>>> 2014-07-21 07:49:45,137 regionserver.HRegion: Storing region-info for snapshot.
> >>>>>>
> >>>>>> On Jul 21, 2014, at 9:21 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >>>>>>
> >>>>>>> Can you also tell us more about your table? How many regions on how
> >>>>>>> many region servers?
> >>>>>>>
> >>>>>>> 2014-07-21 8:23 GMT-04:00 Ted Yu <[email protected]>:
> >>>>>>>
> >>>>>>>> Normally such a timeout is caused by one region server which is slow
> >>>>>>>> in completing its part of the snapshot procedure.
> >>>>>>>>
> >>>>>>>> Have you looked at the region server logs?
> >>>>>>>> Feel free to pastebin the relevant portion.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> On Jul 21, 2014, at 4:03 AM, Brian Jeltema <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> I'm running HBase 0.98. I'm trying to snapshot a table, but it's
> >>>>>>>>> timing out after 60 seconds. I increased the value of
> >>>>>>>>> hbase.snapshot.master.timeoutMillis and restarted HBase, but the
> >>>>>>>>> timeout still happens after 60 seconds. Any suggestions?
> >>>>>>>>>
> >>>>>>>>> Brian
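For what it's worth, the way those multipliers interact can be sketched like this. This is a toy illustration, not the actual HBase code: the cost names and the 0.x values are made up, and only the 35 default and your 100 come from the thread. The balancer scores a candidate cluster layout as a weighted sum of normalized costs, so raising one multiplier makes that factor dominate the total.

```java
// Toy sketch of how StochasticLoadBalancer's weighted cost functions
// combine (simplified; not the real HBase source). Each cost function
// returns a value normalized to [0, 1]; the balancer minimizes the
// weighted sum across candidate region moves.
public class BalancerCostSketch {

    // Weighted sum of normalized per-factor costs.
    static double totalCost(double[] costs, double[] multipliers) {
        double sum = 0;
        for (int i = 0; i < costs.length; i++) {
            sum += costs[i] * multipliers[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical normalized costs: [regionCount, tableSkew, locality]
        double[] costs = {0.2, 0.8, 0.1};

        // With the default tableSkewCost multiplier of 35:
        double withDefault = totalCost(costs, new double[]{500, 35, 25});
        // After bumping hbase.master.balancer.stochastic.tableSkewCost to 100:
        double withBumped = totalCost(costs, new double[]{500, 100, 25});

        // Table skew now contributes far more to the total, so moves
        // that reduce skew look much more attractive to the balancer.
        System.out.println(withDefault); // 130.5
        System.out.println(withBumped);  // 182.5
    }
}
```

The flip side is visible in the same arithmetic: the larger the skew multiplier, the less the other factors (region count, locality, etc.) influence the outcome, which is why it's worth reviewing all the CostFunctions before settling on a value.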
