ZooKeeper doesn't query addresses; it's all done in HBase, which in turn stores it in ZK.
Also http://hbase.apache.org/book.html#dns

J-D

On Tue, May 24, 2011 at 4:37 PM, Jack Levin <[email protected]> wrote:
> figured it out... the /etc/hosts file has ip to name, was used by
> zookeeper was *.prod.imageshack.com, while hostname was
> imgXX.imageshack.us... used by Regionserver/Master - Ideally, all
> three components should source hostnames from same place, whether it's
> hostname or /etc/hosts (or dns), etc... it gotta be consistent,
> otherwise aliases end up screwing things up and people will end up
> guessing why things don't work.
>
> -Jack
>
> On Tue, May 24, 2011 at 4:04 PM, Jack Levin <[email protected]> wrote:
>> img645.prod.imageshack.us and img645.imageshack.us both point to
>> the same IP.
>>
>> -Jack
>>
>> On Tue, May 24, 2011 at 3:50 PM, Jack Levin <[email protected]> wrote:
>>> looks like our balancer is on:
>>>
>>> hbase(main):001:0> balance_switch true
>>> true
>>> 0 row(s) in 0.3700 seconds
>>>
>>> I simply kill PID for RS, and it stays on the list with regions
>>> assigned, and master does not know about it.
>>>
>>> So it still does not work.
>>>
>>> -Jack
>>>
>>> On Tue, May 24, 2011 at 3:43 PM, Dave Latham <[email protected]> wrote:
>>>> Are you using the graceful_stop script?
>>>>
>>>> In 0.90.3 the bin/graceful_stop.sh script was updated to disable the
>>>> master's balancer. However, it doesn't seem that anything re-enables it, so
>>>> if you're using it you need to re-enable it on your own. See the book for
>>>> more details:
>>>> http://hbase.apache.org/book.html#decommission
>>>>
>>>> Dave
>>>>
>>>> On Tue, May 24, 2011 at 3:33 PM, Jack Levin <[email protected]> wrote:
>>>>
>>>>> just put new hbase version on our test cluster. and been testing it...
>>>>> so far if I shutdown an RS, master does not reassign its regions, and
>>>>> we remain inconsistent forever, likewise when new RS is up, it does
>>>>> not get regions assigned to it, this is the master log:
>>>>>
>>>>>
>>>>> 2011-05-24 15:30:57,724 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>>> Event, type=NodeDeleted, state=SyncConnected,
>>>>> path=/hbase/rs/img645.prod.imageshack.com,60020,1306276075768
>>>>> 2011-05-24 15:30:57,724 INFO
>>>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
>>>>> ephemeral node deleted, processing expiration
>>>>> [img645.prod.imageshack.com,60020,1306276075768]
>>>>> 2011-05-24 15:30:57,724 INFO
>>>>> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: No HServerInfo
>>>>> found for img645.prod.imageshack.com,60020,1306276075768
>>>>> 2011-05-24 15:30:57,726 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>>>> 2011-05-24 15:31:03,330 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Received ZooKeeper
>>>>> Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/rs
>>>>> 2011-05-24 15:31:03,338 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>> master:60000-0x1302094818900a4-0x1302094818900a4 Retrieved 32 byte(s)
>>>>> of data from znode
>>>>> /hbase/rs/img645.prod.imageshack.com,60020,1306276262774 and set
>>>>> watcher; img645.prod.imageshack.com:60020
>>>>> 2011-05-24 15:31:03,350 INFO
>>>>> org.apache.hadoop.hbase.master.ServerManager: Server start rejected;
>>>>> we already have img645.imageshack.us:60020 registered;
>>>>> existingServer=serverName=img645.imageshack.us,60020,1306276075768,
>>>>> load=(requests=0, regions=0, usedHeap=40, maxHeap=3995),
>>>>> newServer=serverName=img645.imageshack.us,60020,1306276262774,
>>>>> load=(requests=0, regions=0, usedHeap=23, maxHeap=3995)
>>>>> 2011-05-24 15:31:03,350 INFO
>>>>> org.apache.hadoop.hbase.master.ServerManager: Triggering server
>>>>> recovery; existingServer img645.imageshack.us,60020,1306276075768
>>>>> looks stale
>>>>> 2011-05-24 15:31:03,353 DEBUG
>>>>> org.apache.hadoop.hbase.master.ServerManager:
>>>>> Added=img645.imageshack.us,60020,1306276075768 to dead servers,
>>>>> submitted shutdown handler to be executed, root=false, meta=false
>>>>> 2011-05-24 15:31:03,353 INFO
>>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>>>> Splitting logs for img645.imageshack.us,60020,1306276075768
>>>>> 2011-05-24 15:31:04,348 INFO
>>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>>>>> Reassigning 0 region(s) that img645.imageshack.us,60020,1306276075768
>>>>> was carrying (skipping 0 regions(s) that are already in transition)
>>>>> 2011-05-24 15:31:04,348 INFO
>>>>> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished
>>>>> processing of shutdown of img645.imageshack.us,60020,1306276075768
>>>>> 2011-05-24 15:31:06,333 DEBUG
>>>>> org.apache.hadoop.hbase.master.ServerManager: Server
>>>>> img645.imageshack.us,60020,1306276262774 came back up, removed it from
>>>>> the dead servers list
>>>>> 2011-05-24 15:31:06,333 INFO
>>>>> org.apache.hadoop.hbase.master.ServerManager: Registering
>>>>> server=img645.imageshack.us,60020,1306276262774, regionCount=0,
>>>>> userLoad=false
>>>>> 2011-05-24 15:31:49,890 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection opening
>>>>> connection to ZooKeeper with ensemble (img648:2181)
>>>>> 2011-05-24 15:31:49,890 INFO org.apache.zookeeper.ZooKeeper:
>>>>> Initiating client connection, connectString=img648:2181
>>>>> sessionTimeout=180000 watcher=hconnection
>>>>> 2011-05-24 15:31:49,891 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>> socket connection to server img648/38.99.76.205:2181
>>>>> 2011-05-24 15:31:49,892 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>>> connection established to img648/38.99.76.205:2181, initiating session
>>>>> 2011-05-24 15:31:49,893 INFO org.apache.zookeeper.ClientCnxn: Session
>>>>> establishment complete on server img648/38.99.76.205:2181, sessionid =
>>>>> 0x13024216e690004, negotiated timeout = 180000
>>>>> 2011-05-24 15:31:49,894 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection
>>>>> Received ZooKeeper Event, type=None, state=SyncConnected, path=null
>>>>> 2011-05-24 15:31:49,895 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>>> hconnection-0x13024216e690004 connected
>>>>> 2011-05-24 15:31:49,896 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>>>> /hbase/master
>>>>> 2011-05-24 15:31:49,896 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>> hconnection-0x13024216e690004 Retrieved 32 byte(s) of data from znode
>>>>> /hbase/master and set watcher; img648.prod.imageshack.com:60000
>>>>> 2011-05-24 15:31:49,897 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>> hconnection-0x13024216e690004 Set watcher on existing znode
>>>>> /hbase/root-region-server
>>>>> 2011-05-24 15:31:49,897 DEBUG
>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>> hconnection-0x13024216e690004 Retrieved 26 byte(s) of data from znode
>>>>> /hbase/root-region-server and set watcher; img731.imageshack.us:60020
>>>>> 2011-05-24 15:31:49,900 DEBUG
>>>>> org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting
>>>>> at row= for max=2147483647 rows
>>>>> 2011-05-24 15:31:49,900 DEBUG
>>>>>
>>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>>> Lookedup root region location,
>>>>>
>>>>> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@26f50154
>>>>> ;
>>>>> hsa=img731.imageshack.us:60020
>>>>> 2011-05-24 15:31:49,913 DEBUG
>>>>>
>>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>>> Cached location for .META.,,1.1028785192 is img654.imageshack.us:60020
>>>>> 2011-05-24 15:31:50,061 INFO
>>>>>
>>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>>> Closed zookeeper sessionid=0x13024216e690004
>>>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>> 0x13024216e690004 closed
>>>>> 2011-05-24 15:31:50,063 INFO org.apache.zookeeper.ClientCnxn:
>>>>> EventThread shut down
>>>>>
>>>>> Please help :)
>>>>>
>>>>> -Jack
>>>>>
>>>>
>>>
>>
>
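
For anyone hitting the same symptom, here is a rough sketch (not part of the original thread) of how one might check whether a host reports the same name from every source. The function names local_names and mismatched_sources are made up for illustration, and the sketch only uses Python's standard socket module. It assumes that comparing the kernel hostname, the FQDN, and a forward-then-reverse lookup is a fair stand-in for what the daemons see, which may not hold on every setup.

```python
import socket


def local_names():
    """Collect the names this machine reports for itself (hypothetical helper)."""
    names = {"hostname": socket.gethostname(), "fqdn": socket.getfqdn()}
    try:
        # Forward-resolve the hostname, then reverse-resolve the address.
        # /etc/hosts or DNS answers this, depending on the resolver order.
        ip = socket.gethostbyname(names["hostname"])
        names["reverse"] = socket.gethostbyaddr(ip)[0]
    except OSError:
        # Lookup failed; None will show up as a mismatch below.
        names["reverse"] = None
    return names


def mismatched_sources(names):
    """Return {} if every source agrees, else the normalized source -> name map."""
    normalized = {
        src: (name.lower().rstrip(".") if name else None)
        for src, name in names.items()
    }
    if len(set(normalized.values())) <= 1:
        return {}
    return normalized
```

Calling mismatched_sources(local_names()) on each node gives an empty dict when the sources agree. Feeding it the two names from the log above, img645.prod.imageshack.com and img645.imageshack.us, returns both entries, which is the kind of split described in the thread.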
