Thanks Suraj Varma,I have put the log into the pastebin.com.
master log: http://pastebin.com/QWv3K9HQ
regionserver log:http://pastebin.com/LM27ui72
Because there is a lot of "region is not online" in the regionserver log,so
I have filter this in the regionserver log.
The following is the count of "Region is not online:" log,start 23:16,there
is a lot of access fail because the region is not online.
--------------------------d70285c1a12dec9289ce9290c9349a79
1 23:16
103 23:36
142 23:37
169 23:38
94 23:39
120 23:40
39 23:41
110 23:42
104 23:43
114 23:44
90 23:45
121 23:46
104 23:47
74 23:48
96 23:49
100 23:50
125 23:51
59 23:52
113 23:53
134 23:54
127 23:55
131 23:56
119 23:57
82 23:58
165 23:59
and the region "d70285c1a12dec9289ce9290c9349a79" is move between two
regionserver again and again by balancer.Start 23:36,the region is move
from regionserver 192.168.18.40 to 192.168.18.40 and fail.
2012/7/19 Suraj Varma <[email protected]>
> You can use pastebin.com or similar services to cut/paste your logs.
> --S
>
> On Tue, Jul 17, 2012 at 7:11 PM, Howard <[email protected]> wrote:
> > this problem just only once,Because it happens two day before,I remember
> I
> > check the master-status and only always see regions is "pending open" in
> > Regions in Transition,not see there was two regionservers in the same
> > server.
> >
> > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what
> > does "60020,1342017399608" mean?Is there some document can help to read
> > the source code?
> > If still need to upload the log,how to upload the log?
> > sorry I am a freshman with HBase.
> >
> > 2012/7/17 Ted Yu <[email protected]>
> >
> >> Howard:
> >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out
> >> yesterday ?
> >> I guess you have noticed the following toward the end of log snippet:
> >>
> >> 2012-07-16 00:17:50,774 DEBUG
> >> org.apache.hadoop.hbase.
> >> master.handler.OpenedRegionHandler: Handling OPENED
> >> event for
> >>
> >>
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956.
> >> from 192.168.1.2,60020,1342017399608; deleting unassigned node
> >>
> >> As Ram pointed out, there might be two region server processes running
> on
> >> 192.168.1.2
> >>
> >> Please verify whether that was the case.
> >>
> >> Cheers
> >>
> >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan <
> >> [email protected]> wrote:
> >>
> >> > From the logs I can see that though the server's are same their start
> >> code
> >> > is different.
> >> > Need to analyse the previous logs also. Pls file a JIRA, if possible
> >> > attach
> >> > the logs to that.
> >> >
> >> > Thanks Howard.
> >> >
> >> > Regards
> >> > Ram
> >> >
> >> > > -----Original Message-----
> >> > > From: Howard [mailto:[email protected]]
> >> > > Sent: Tuesday, July 17, 2012 7:32 PM
> >> > > To: [email protected]
> >> > > Subject: Re: Load balancer repeatedly close and open region in the
> same
> >> > > regionserver.
> >> > >
> >> > > *hi,Ted Yu,thanks for your reply:*
> >> > > the hbase and hadoop version is
> >> > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1,
> >> > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638
> >> > > the following is a detail log about the same region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956,
> >> > > and it repeats again and again.:
> >> > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster:
> >> > > balance
> >> > >
> hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
> >> > > e1b7e956.,
> >> > > src=192.168.1.2,60020,1342017399608,
> >> > > dest=192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:12:49,843 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> unassignment
> >> > > of
> >> > > region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > (offlining)
> >> > > 2012-07-16 00:12:49,843 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Creating unassigned node for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state
> >> > > 2012-07-16 00:12:49,845 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> >> > > 192.168.1.2,60020,1342017399608 for region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > 2012-07-16 00:12:50,555 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_CLOSED,
> server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,555 DEBUG
> >> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling
> >> > > CLOSED
> >> > > event for 93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,555 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> >> > >
> was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
> >> > > e1b7e956.
> >> > > state=CLOSED, ts=1342368770556,
> server=192.168.1.2,60020,1342017399608
> >> > > 2012-07-16 00:12:50,555 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned
> node
> >> > > for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state
> >> > > 2012-07-16 00:12:50,558 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=M_ZK_REGION_OFFLINE,
> server=10.75.18.34,60000,1342017369575,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,558 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing
> >> > > plan
> >> > > for
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > destination server is 192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:12:50,558 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing
> >> > > plan
> >> > > for region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.;
> >> > >
> plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462
> >> > > 5e160e1b7e956.,
> >> > > src=192.168.1.2,60020,1342017399608,
> >> > > dest=192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:12:50,558 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > to 192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:12:50,574 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENING,
> >> > > server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,635 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENING,
> >> > > server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,639 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENED,
> server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:12:50,639 DEBUG
> >> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> >> > > OPENED
> >> > > event for
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > from 192.168.1.2,60020,1342017399608; deleting unassigned node
> >> > > 2012-07-16 00:12:50,640 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state
> >> > > RS_ZK_REGION_OPENED
> >> > > 2012-07-16 00:12:50,641 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of
> region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > has been deleted.
> >> > > 2012-07-16 00:12:50,641 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned node
> for
> >> > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state
> >> > > RS_ZK_REGION_OPENED
> >> > > 2012-07-16 00:12:50,641 INFO
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: The master has
> opened
> >> > > the
> >> > > region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > that was online on 192.168.1.2,60020,1342017399608
> >> > > 2012-07-16 00:17:49,870 INFO org.apache.hadoop.hbase.master.HMaster:
> >> > > balance
> >> > >
> hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
> >> > > e1b7e956.,
> >> > > src=192.168.1.2,60020,1342017399608,
> >> > > dest=192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:17:49,870 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> unassignment
> >> > > of
> >> > > region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > (offlining)
> >> > > 2012-07-16 00:17:49,870 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Creating unassigned node for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state
> >> > > 2012-07-16 00:17:49,872 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> >> > > 192.168.1.2,60020,1342017399608 for region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > 2012-07-16 00:17:50,464 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_CLOSED,
> server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,464 DEBUG
> >> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling
> >> > > CLOSED
> >> > > event for 93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,464 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> >> > >
> was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
> >> > > e1b7e956.
> >> > > state=CLOSED, ts=1342369070465,
> server=192.168.1.2,60020,1342017399608
> >> > > 2012-07-16 00:17:50,464 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned
> node
> >> > > for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state
> >> > > 2012-07-16 00:17:50,467 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=M_ZK_REGION_OFFLINE,
> server=10.75.18.34,60000,1342017369575,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,467 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing
> >> > > plan
> >> > > for
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > destination server is 192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:17:50,467 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing
> >> > > plan
> >> > > for region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.;
> >> > >
> plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462
> >> > > 5e160e1b7e956.,
> >> > > src=192.168.1.2,60020,1342017399608,
> >> > > dest=192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:17:50,467 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > to 192.168.1.2,60020,1342002082592
> >> > > 2012-07-16 00:17:50,509 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENING,
> >> > > server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,761 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENING,
> >> > > server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,774 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> >> > > transition=RS_ZK_REGION_OPENED,
> server=192.168.1.2,60020,1342017399608,
> >> > > region=93caf5147d40f5dd4625e160e1b7e956
> >> > > 2012-07-16 00:17:50,774 DEBUG
> >> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> >> > > OPENED
> >> > > event for
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > from 192.168.1.2,60020,1342017399608; deleting unassigned node
> >> > > 2012-07-16 00:17:50,774 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node for
> >> > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state
> >> > > RS_ZK_REGION_OPENED
> >> > > 2012-07-16 00:17:50,775 DEBUG
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of
> region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > has been deleted.
> >> > > 2012-07-16 00:17:50,775 DEBUG
> >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> >> > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned node
> for
> >> > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state
> >> > > RS_ZK_REGION_OPENED
> >> > > 2012-07-16 00:17:50,775 INFO
> >> > > org.apache.hadoop.hbase.master.AssignmentManager: The master has
> opened
> >> > > the
> >> > > region
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > that was online on 192.168.1.2,60020,1342017399608
> >> > > 2012-07-16 00:22:49,916 INFO org.apache.hadoop.hbase.master.HMaster:
> >> > > balance
> >> > >
> hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160
> >> > > e1b7e956.,
> >> > > src=192.168.1.2,60020,1342017399608,
> >> > > dest=192.168.1.2,60020,1342002082592
> >> > > ....
> >> > >
> >> > > 2012/7/17 Ted Yu <[email protected]>
> >> > >
> >> > > > Which HBase version were you using ?
> >> > > >
> >> > > > The log messages below were in very short time period. Can you
> >> > > provide more
> >> > > > logs ?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > On Tue, Jul 17, 2012 at 5:09 AM, Howard <[email protected]>
> wrote:
> >> > > >
> >> > > > > Through the master and regionserver log,I find load balancer
> >> > > repeatedly
> >> > > > > close and open region in the same regionserver(period in
> >> > > > > hbase.balancer.period ).
> >> > > > > Does this is a bug in load balancer and how can I dig into or
> avoid
> >> > > this?
> >> > > > > The log like this:
> >> > > > > 2012-07-16 00:12:50,055 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zOwMwdi,1342294447440.7608037379153932a30f4bdd6d7e
> >> > > df5d.
> >> > > > > 2012-07-16 00:12:50,069 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_comment_list,zO7vqv1,1342192347676.0c9036e183055ff2df93eca348a
> >> > > eeada.
> >> > > > > 2012-07-16 00:12:50,206 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zOzYDL9,1342294447440.47eb162f6aecffcddfa897fdbb70
> >> > > 472b.
> >> > > > > 2012-07-16 00:12:50,301 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zOM1HZP,1342293292587.9af5de3db0bad0470d9b8fe6b9ea
> >> > > df14.
> >> > > > > 2012-07-16 00:12:50,428 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zOPKLDi,1342293292587.f89dbfc1dad366c991eba3c24822
> >> > > 8637.
> >> > > > > 2012-07-16 00:12:50,562 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7
> >> > > e956.
> >> > > > > 2012-07-16 00:12:50,685 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_status_list,zOIR8Tw,1342283152067.6284e9f69042266cde88a7b19071
> >> > > d701.
> >> > > > > 2012-07-16 00:12:50,791 INFO
> >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received
> >> > > request to
> >> > > > > open region:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> trackurl_comment_list,zOqzaNw,1342195266309.bd8e19c1063f4f2984f9ca63365
> >> > > c5573.
> >> > > > >
> >> > > >
> >> >
> >> >
> >>
>