img645.prod.imageshack.us and img645.imageshack.us both point to the same IP.
-Jack

On Tue, May 24, 2011 at 3:50 PM, Jack Levin <[email protected]> wrote:
> looks like our balancer is on:
>
> hbase(main):001:0> balance_switch true
> true
> 0 row(s) in 0.3700 seconds
>
> I simply kill PID for RS, and it stays on the list with regions
> assigned, and master does not know about it.
>
> So it still does not work.
>
> -Jack
>
> On Tue, May 24, 2011 at 3:43 PM, Dave Latham <[email protected]> wrote:
>> Are you using the graceful_stop script?
>>
>> In 0.90.3 the bin/graceful_stop.sh script was updated to disable the
>> master's balancer. However, it doesn't seem that anything re-enables it, so
>> if you're using it you need to re-enable it on your own. See the book for
>> more details:
>> http://hbase.apache.org/book.html#decommission
>>
>> Dave
>>
>> On Tue, May 24, 2011 at 3:33 PM, Jack Levin <[email protected]> wrote:
>>
>>> just put new hbase version on our test cluster. and been testing it...
>>> so far if I shutdown an RS, master does not reassign its regions, and
>>> we remain inconsistent forever, likewise when new RS is up, it does
>>> not get regions assigned to it, this is the master log:
>>>
>>>
>>> 2011-05-24 15:30:57,724 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>> Event, type=NodeDeleted, state=SyncConnected,
>>> path=/hbase/rs/img645.prod.imageshack.com,60020,1306276075768
>>> 2011-05-24 15:30:57,724 INFO
>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
>>> ephemeral node deleted, processing expiration
>>> [img645.prod.imageshack.com,60020,1306276075768]
>>> 2011-05-24 15:30:57,724 INFO
>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: No HServerInfo
>>> found for img645.prod.imageshack.com,60020,1306276075768
>>> 2011-05-24 15:30:57,726 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>> 2011-05-24 15:31:03,330 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>> 2011-05-24 15:31:03,338 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>> master:60000-0x1302094818900a4-0x1302094818900a4 Retrieved 32 byte(s)
>>> of data from znode
>>> /hbase/rs/img645.prod.imageshack.com,60020,1306276262774 and set
>>> watcher; img645.prod.imageshack.com:60020
>>> 2011-05-24 15:31:03,350 INFO
>>> org.apache.hadoop.hbase.master.ServerManager: Server start rejected;
>>> we already have img645.imageshack.us:60020 registered;
>>> existingServer=serverName=img645.imageshack.us,60020,1306276075768,
>>> load=(requests=0, regions=0, usedHeap=40, maxHeap=3995),
>>> newServer=serverName=img645.imageshack.us,60020,1306276262774,
>>> load=(requests=0, regions=0, usedHeap=23, maxHeap=3995)
>>> 2011-05-24 15:31:03,350 INFO
>>> org.apache.hadoop.hbase.master.ServerManager: Triggering server
>>> recovery; existingServer img645.imageshack.us,60020,1306276075768
>>> looks stale
>>> 2011-05-24 15:31:03,353 DEBUG
>>> org.apache.hadoop.hbase.master.ServerManager:
>>> Added=img645.imageshack.us,60020,1306276075768 to dead servers,
>>> submitted shutdown handler to be executed, root=false, meta=false
>>> 2011-05-24 15:31:03,353 INFO
>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>> Splitting logs for img645.imageshack.us,60020,1306276075768
>>> 2011-05-24 15:31:04,348 INFO
>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>> Reassigning 0 region(s) that img645.imageshack.us,60020,1306276075768
>>> was carrying (skipping 0 regions(s) that are already in transition)
>>> 2011-05-24 15:31:04,348 INFO
>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished
>>> processing of shutdown of img645.imageshack.us,60020,1306276075768
>>> 2011-05-24 15:31:06,333 DEBUG
>>> org.apache.hadoop.hbase.master.ServerManager: Server
>>> img645.imageshack.us,60020,1306276262774 came back up, removed it from
>>> the dead servers list
>>> 2011-05-24 15:31:06,333 INFO
>>> org.apache.hadoop.hbase.master.ServerManager: Registering
>>> server=img645.imageshack.us,60020,1306276262774, regionCount=0,
>>> userLoad=false
>>> 2011-05-24 15:31:49,890 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection opening
>>> connection to ZooKeeper with ensemble (img648:2181)
>>> 2011-05-24 15:31:49,890 INFO org.apache.zookeeper.ZooKeeper:
>>> Initiating client connection, connectString=img648:2181
>>> sessionTimeout=180000 watcher=hconnection
>>> 2011-05-24 15:31:49,891 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server img648/38.99.76.205:2181
>>> 2011-05-24 15:31:49,892 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to img648/38.99.76.205:2181, initiating session
>>> 2011-05-24 15:31:49,893 INFO org.apache.zookeeper.ClientCnxn: Session
>>> establishment complete on server img648/38.99.76.205:2181, sessionid =
>>> 0x13024216e690004, negotiated timeout = 180000
>>> 2011-05-24 15:31:49,894 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection
>>> Received ZooKeeper Event, type=None, state=SyncConnected, path=null
>>> 2011-05-24 15:31:49,895 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>> hconnection-0x13024216e690004 connected
>>> 2011-05-24 15:31:49,896 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>> /hbase/master
>>> 2011-05-24 15:31:49,896 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>> hconnection-0x13024216e690004 Retrieved 32 byte(s) of data from znode
>>> /hbase/master and set watcher; img648.prod.imageshack.com:60000
>>> 2011-05-24 15:31:49,897 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>> /hbase/root-region-server
>>> 2011-05-24 15:31:49,897 DEBUG
>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>> hconnection-0x13024216e690004 Retrieved 26 byte(s) of data from znode
>>> /hbase/root-region-server and set watcher; img731.imageshack.us:60020
>>> 2011-05-24 15:31:49,900 DEBUG
>>> org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
>>> at row= for max=2147483647 rows
>>> 2011-05-24 15:31:49,900 DEBUG
>>>
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> Lookedup root region location,
>>>
>>> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@26f50154
>>> ;
>>> hsa=img731.imageshack.us:60020
>>> 2011-05-24 15:31:49,913 DEBUG
>>>
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> Cached location for .META.,,1.1028785192 is img654.imageshack.us:60020
>>> 2011-05-24 15:31:50,061 INFO
>>>
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> Closed zookeeper sessionid=0x13024216e690004
>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ZooKeeper: Session:
>>> 0x13024216e690004 closed
>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ClientCnxn:
>>> EventThread shut down
>>>
>>> Please help :)
>>>
>>> -Jack
>>>
>>
>
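In the quoted master log, the same region server shows up under two hostnames: the ZooKeeper znode uses img645.prod.imageshack.com while ServerManager registers img645.imageshack.us, both with port 60020 and startcode 1306276075768. A minimal sketch for spotting such aliases in a log, assuming server names have the hostname,port,startcode form seen above; the function name hostname_aliases and the regex are illustrative and not part of HBase. It only surfaces the mismatch and does not by itself show that this is the cause of the missing reassignment.

```python
import re
from collections import defaultdict

# Assumed server-name shape, taken from the log above: hostname,port,startcode
SERVER_NAME = re.compile(r"([A-Za-z0-9.-]+),(\d+),(\d+)")


def hostname_aliases(log_text):
    """Group hostnames by (port, startcode) and keep only the groups
    where more than one distinct hostname was seen."""
    seen = defaultdict(set)
    for host, port, startcode in SERVER_NAME.findall(log_text):
        seen[(port, startcode)].add(host)
    return {key: sorted(hosts) for key, hosts in seen.items() if len(hosts) > 1}


# Three fragments copied from the master log in this thread.
sample = """\
path=/hbase/rs/img645.prod.imageshack.com,60020,1306276075768
existingServer=serverName=img645.imageshack.us,60020,1306276075768,
newServer=serverName=img645.imageshack.us,60020,1306276262774,
"""

for (port, startcode), hosts in hostname_aliases(sample).items():
    print(port, startcode, hosts)
```

To run it against a real log, pass the file contents (for example open(path).read(), with path being wherever the master log lives) instead of the sample string.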
