bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40 Have you checked whether there were two region server processes running on 192.168.18.40 ?
Cheers On Fri, Jul 27, 2012 at 2:43 AM, Howard <[email protected]> wrote: > Thanks Suraj Varma,I have put the log into the pastebin.com. > > master log: http://pastebin.com/QWv3K9HQ > regionserver log:http://pastebin.com/LM27ui72 > > Because there is a lot of "region is not online" in the regionserver log,so > I have filter this in the regionserver log. > The following is the count of "Region is not online:" log,start 23:16,there > is a lot of access fail because the region is not online. > --------------------------d70285c1a12dec9289ce9290c9349a79 > 1 23:16 > 103 23:36 > 142 23:37 > 169 23:38 > 94 23:39 > 120 23:40 > 39 23:41 > 110 23:42 > 104 23:43 > 114 23:44 > 90 23:45 > 121 23:46 > 104 23:47 > 74 23:48 > 96 23:49 > 100 23:50 > 125 23:51 > 59 23:52 > 113 23:53 > 134 23:54 > 127 23:55 > 131 23:56 > 119 23:57 > 82 23:58 > 165 23:59 > > and the region "d70285c1a12dec9289ce9290c9349a79" is move between two > regionserver again and again by balancer.Start 23:36,the region is move > from regionserver 192.168.18.40 to 192.168.18.40 and fail. > > > 2012/7/19 Suraj Varma <[email protected]> > > > You can use pastebin.com or similar services to cut/paste your logs. > > --S > > > > On Tue, Jul 17, 2012 at 7:11 PM, Howard <[email protected]> wrote: > > > this problem just only once,Because it happens two day before,I > remember > > I > > > check the master-status and only always see regions is "pending open" > in > > > Regions in Transition,not see there was two regionservers in the same > > > server. > > > > > > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > > > does "60020,1342017399608" mean?Is there some document can help to > read > > > the source code? > > > If still need to upload the log,how to upload the log? > > > sorry I am a freshman with HBase. > > > > > > 2012/7/17 Ted Yu <[email protected]> > > > > > >> Howard: > > >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > > >> yesterday ? > > >> I guess you have noticed the following toward the end of log snippet: > > >> > > >> 2012-07-16 00:17:50,774 DEBUG > > >> org.apache.hadoop.hbase. > > >> master.handler.OpenedRegionHandler: Handling OPENED > > >> event for > > >> > > >> > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > > >> from 192.168.1.2,60020,1342017399608; deleting unassigned node > > >> > > >> As Ram pointed out, there might be two region server processes running > > on > > >> 192.168.1.2 > > >> > > >> Please verify whether that was the case. > > >> > > >> Cheers > > >> > > >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > > >> [email protected]> wrote: > > >> > > >> > From the logs I can see that though the server's are same their > start > > >> code > > >> > is different. > > >> > Need to analyse the previous logs also. Pls file a JIRA, if > possible > > >> > attach > > >> > the logs to that. > > >> > > > >> > Thanks Howard. > > >> > > > >> > Regards > > >> > Ram > > >> > > > >> > > -----Original Message----- > > >> > > From: Howard [mailto:[email protected]] > > >> > > Sent: Tuesday, July 17, 2012 7:32 PM > > >> > > To: [email protected] > > >> > > Subject: Re: Load balancer repeatedly close and open region in the > > same > > >> > > regionserver. > > >> > > > > >> > > *hi,Ted Yu,thanks for your reply:* > > >> > > the hbase and hadoop version is > > >> > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > > >> > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > > >> > > the following is a detail log about the same region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956, > > >> > > and it repeats again and again.: > > >> > > 2012-07-16 00:12:49,843 INFO > org.apache.hadoop.hbase.master.HMaster: > > >> > > balance > > >> > > > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > >> > > e1b7e956., > > >> > > src=192.168.1.2,60020,1342017399608, > > >> > > dest=192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:12:49,843 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting > > unassignment > > >> > > of > > >> > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > (offlining) > > >> > > 2012-07-16 00:12:49,843 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > >> > > 2012-07-16 00:12:49,845 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > >> > > 192.168.1.2,60020,1342017399608 for region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > 2012-07-16 00:12:50,555 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_CLOSED, > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,555 DEBUG > > >> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: > Handling > > >> > > CLOSED > > >> > > event for 93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,555 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > >> > > > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > >> > > e1b7e956. > > >> > > state=CLOSED, ts=1342368770556, > > server=192.168.1.2,60020,1342017399608 > > >> > > 2012-07-16 00:12:50,555 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned > > node > > >> > > for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > >> > > 2012-07-16 00:12:50,558 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=M_ZK_REGION_OFFLINE, > > server=10.75.18.34,60000,1342017369575, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,558 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Found an > existing > > >> > > plan > > >> > > for > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > destination server is 192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:12:50,558 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Using > pre-existing > > >> > > plan > > >> > > for region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956.; > > >> > > > > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 > > >> > > 5e160e1b7e956., > > >> > > src=192.168.1.2,60020,1342017399608, > > >> > > dest=192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:12:50,558 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > to 192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:12:50,574 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENING, > > >> > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,635 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENING, > > >> > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,639 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENED, > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:12:50,639 DEBUG > > >> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > Handling > > >> > > OPENED > > >> > > event for > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > from 192.168.1.2,60020,1342017399608; deleting unassigned node > > >> > > 2012-07-16 00:12:50,640 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node > for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state > > >> > > RS_ZK_REGION_OPENED > > >> > > 2012-07-16 00:12:50,641 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > has been deleted. > > >> > > 2012-07-16 00:12:50,641 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned > node > > for > > >> > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state > > >> > > RS_ZK_REGION_OPENED > > >> > > 2012-07-16 00:12:50,641 INFO > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: The master has > > opened > > >> > > the > > >> > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > that was online on 192.168.1.2,60020,1342017399608 > > >> > > 2012-07-16 00:17:49,870 INFO > org.apache.hadoop.hbase.master.HMaster: > > >> > > balance > > >> > > > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > >> > > e1b7e956., > > >> > > src=192.168.1.2,60020,1342017399608, > > >> > > dest=192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:17:49,870 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting > > unassignment > > >> > > of > > >> > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > (offlining) > > >> > > 2012-07-16 00:17:49,870 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > >> > > 2012-07-16 00:17:49,872 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > >> > > 192.168.1.2,60020,1342017399608 for region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > 2012-07-16 00:17:50,464 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_CLOSED, > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,464 DEBUG > > >> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: > Handling > > >> > > CLOSED > > >> > > event for 93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,464 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > >> > > > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > >> > > e1b7e956. > > >> > > state=CLOSED, ts=1342369070465, > > server=192.168.1.2,60020,1342017399608 > > >> > > 2012-07-16 00:17:50,464 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned > > node > > >> > > for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > >> > > 2012-07-16 00:17:50,467 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=M_ZK_REGION_OFFLINE, > > server=10.75.18.34,60000,1342017369575, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,467 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Found an > existing > > >> > > plan > > >> > > for > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > destination server is 192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:17:50,467 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Using > pre-existing > > >> > > plan > > >> > > for region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956.; > > >> > > > > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 > > >> > > 5e160e1b7e956., > > >> > > src=192.168.1.2,60020,1342017399608, > > >> > > dest=192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:17:50,467 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > to 192.168.1.2,60020,1342002082592 > > >> > > 2012-07-16 00:17:50,509 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENING, > > >> > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,761 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENING, > > >> > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,774 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > >> > > transition=RS_ZK_REGION_OPENED, > > server=192.168.1.2,60020,1342017399608, > > >> > > region=93caf5147d40f5dd4625e160e1b7e956 > > >> > > 2012-07-16 00:17:50,774 DEBUG > > >> > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > Handling > > >> > > OPENED > > >> > > event for > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > from 192.168.1.2,60020,1342017399608; deleting unassigned node > > >> > > 2012-07-16 00:17:50,774 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Deleting existing unassigned node > for > > >> > > 93caf5147d40f5dd4625e160e1b7e956 that is in expected state > > >> > > RS_ZK_REGION_OPENED > > >> > > 2012-07-16 00:17:50,775 DEBUG > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: The znode of > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > has been deleted. > > >> > > 2012-07-16 00:17:50,775 DEBUG > > >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > >> > > master:60000-0x4384d0a47f40068 Successfully deleted unassigned > node > > for > > >> > > region 93caf5147d40f5dd4625e160e1b7e956 in expected state > > >> > > RS_ZK_REGION_OPENED > > >> > > 2012-07-16 00:17:50,775 INFO > > >> > > org.apache.hadoop.hbase.master.AssignmentManager: The master has > > opened > > >> > > the > > >> > > region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > that was online on 192.168.1.2,60020,1342017399608 > > >> > > 2012-07-16 00:22:49,916 INFO > org.apache.hadoop.hbase.master.HMaster: > > >> > > balance > > >> > > > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > >> > > e1b7e956., > > >> > > src=192.168.1.2,60020,1342017399608, > > >> > > dest=192.168.1.2,60020,1342002082592 > > >> > > .... > > >> > > > > >> > > 2012/7/17 Ted Yu <[email protected]> > > >> > > > > >> > > > Which HBase version were you using ? > > >> > > > > > >> > > > The log messages below were in very short time period. Can you > > >> > > provide more > > >> > > > logs ? > > >> > > > > > >> > > > Thanks > > >> > > > > > >> > > > On Tue, Jul 17, 2012 at 5:09 AM, Howard <[email protected]> > > wrote: > > >> > > > > > >> > > > > Through the master and regionserver log,I find load balancer > > >> > > repeatedly > > >> > > > > close and open region in the same regionserver(period in > > >> > > > > hbase.balancer.period ). > > >> > > > > Does this is a bug in load balancer and how can I dig into or > > avoid > > >> > > this? > > >> > > > > The log like this: > > >> > > > > 2012-07-16 00:12:50,055 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zOwMwdi,1342294447440.7608037379153932a30f4bdd6d7e > > >> > > df5d. > > >> > > > > 2012-07-16 00:12:50,069 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_comment_list,zO7vqv1,1342192347676.0c9036e183055ff2df93eca348a > > >> > > eeada. > > >> > > > > 2012-07-16 00:12:50,206 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zOzYDL9,1342294447440.47eb162f6aecffcddfa897fdbb70 > > >> > > 472b. > > >> > > > > 2012-07-16 00:12:50,301 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zOM1HZP,1342293292587.9af5de3db0bad0470d9b8fe6b9ea > > >> > > df14. > > >> > > > > 2012-07-16 00:12:50,428 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zOPKLDi,1342293292587.f89dbfc1dad366c991eba3c24822 > > >> > > 8637. > > >> > > > > 2012-07-16 00:12:50,562 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956. > > >> > > > > 2012-07-16 00:12:50,685 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_status_list,zOIR8Tw,1342283152067.6284e9f69042266cde88a7b19071 > > >> > > d701. > > >> > > > > 2012-07-16 00:12:50,791 INFO > > >> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > >> > > request to > > >> > > > > open region: > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > trackurl_comment_list,zOqzaNw,1342195266309.bd8e19c1063f4f2984f9ca63365 > > >> > > c5573. > > >> > > > > > > >> > > > > > >> > > > >> > > > >> > > >
