First of all, thanks for your reply. We are using three ZooKeeper servers, all running version 3.4.6. Below are the logs of one of the ZooKeeper servers. It seems that the connection simply drops at some point.
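For anyone who wants to look for the same pattern in their own ZooKeeper logs, here is a minimal sketch that counts "Expiring session" events per second, so a mass expiry stands out. The function name `expiries_per_second` and the regular expression are my own illustration, based only on the log line format shown in this message; it is not part of ZooKeeper or HBase.

```python
import re
from collections import Counter

# Assumed line format (taken from the ZooKeeper 3.4.6 log in this message):
# 2014-08-23 07:00:19,927 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x..., timeout of 40000ms exceeded
EXPIRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"\[myid:\d+\] - \w+\s+\[[^\]]*\] - "
    r"Expiring session (?P<sid>0x[0-9a-f]+), timeout of (?P<timeout>\d+)ms exceeded"
)


def expiries_per_second(text):
    """Count 'Expiring session' events per one-second timestamp.

    Works on raw log text, even when several entries share one line.
    """
    counts = Counter()
    for match in EXPIRY.finditer(text):
        counts[match.group("ts")] += 1
    return counts


# Hypothetical usage with two entries copied from the log below.
sample = (
    "2014-08-23 07:00:19,927 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - "
    "Expiring session 0x347fc15265a00ec, timeout of 40000ms exceeded\n"
    "2014-08-23 07:00:19,930 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - "
    "Processed session termination for sessionid: 0x347fc15265a00ec\n"
)
for second, count in sorted(expiries_per_second(sample).items()):
    print(second, count)
```

Many sessions expiring within the same second would suggest that all clients lost contact with the ensemble at once, rather than one client stalling on its own.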
In the meantime I will monitor heap dumps to check whether there is a memory leak. But my feeling is that the memory leak is not the cause but rather an effect. Do you see anything in these logs that could help us determine what is happening? Thanks!

2014-08-23 07:00:19,927 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x347fc15265a00ec, timeout of 40000ms exceeded
2014-08-23 07:00:19,927 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x147fc1616d200be, timeout of 40000ms exceeded
2014-08-23 07:00:19,927 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x247fc16c80500d6, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x147fc1616d200bb, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x347fc15265a00eb, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x147fc1616d200bd, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x347fc15265a00ea, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x347fc15265a00ef, timeout of 40000ms exceeded
2014-08-23 07:00:19,928 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x247ffe833880000, timeout of 40000ms exceeded
2014-08-23 07:00:19,929 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x247ffe833880001, timeout of 40000ms exceeded
2014-08-23 07:00:19,929 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x247fc16c80500d2, timeout of 40000ms exceeded
2014-08-23 07:00:19,929 [myid:3] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x247fc16c80500d5, timeout of 40000ms exceeded
2014-08-23 07:00:19,930
[myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fc15265a00ec 2014-08-23 07:00:19,930 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x147fc1616d200be 2014-08-23 07:00:19,930 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x247fc16c80500d6 2014-08-23 07:00:19,930 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x147fc1616d200bb 2014-08-23 07:00:19,931 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fc15265a00eb 2014-08-23 07:00:19,931 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x147fc1616d200bd 2014-08-23 07:00:19,931 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fc15265a00ea 2014-08-23 07:00:19,931 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fc15265a00ef 2014-08-23 07:00:19,935 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x247ffe833880000 2014-08-23 07:00:19,936 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x247ffe833880001 2014-08-23 07:00:19,936 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x247fc16c80500d2 2014-08-23 07:00:19,937 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x247fc16c80500d5 2014-08-23 07:00:19,984 [myid:3] - INFO 
[CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.152:52656 which had sessionid 0x347fc15265a00ec 2014-08-23 07:00:19,985 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.152:52651 which had sessionid 0x147fc1616d200be 2014-08-23 07:00:19,986 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.150:44649 which had sessionid 0x247fc16c80500d6 2014-08-23 07:00:19,990 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.153:60445 which had sessionid 0x147fc1616d200bb 2014-08-23 07:00:19,993 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.154:47997 which had sessionid 0x347fc15265a00eb 2014-08-23 07:00:19,995 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.154:47999 which had sessionid 0x147fc1616d200bd 2014-08-23 07:00:19,997 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.151:34773 which had sessionid 0x347fc15265a00ea 2014-08-23 07:00:19,998 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.153:60444 which had sessionid 0x347fc15265a00ef 2014-08-23 07:00:20,000 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.102:37154 which had sessionid 0x247ffe833880000 2014-08-23 07:00:20,002 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.102:37155 which had sessionid 0x247ffe833880001 2014-08-23 07:00:20,005 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.150:48086 which had sessionid 0x247fc16c80500d2 2014-08-23 07:00:20,006 [myid:3] - WARN 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@218] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:187) at java.lang.Thread.run(Thread.java:744) 2014-08-23 07:00:20,008 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.151:38066 which had sessionid 0x247fc16c80500d5 2014-08-23 07:00:20,008 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@218] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:187) at java.lang.Thread.run(Thread.java:744) 2014-08-23 07:00:34,185 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /141.105.120.102:37423 2014-08-23 07:00:34,185 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /141.105.120.102:37423 2014-08-23 07:00:34,189 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130001 with negotiated timeout 40000 for client /141.105.120.102:37423 2014-08-23 07:00:34,196 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x347fcb5a0130001 type:delete cxid:0x5 zxid:0x6c0000001f txntype:-1 reqpath:n/a Error Path:/hbase/backup-masters/vps2008.directvps.nl,60000,1408691163492 Error:KeeperErrorCode = NoNode for /hbase/backup-masters/vps2008.directvps.nl,60000,1408691163492 2014-08-23 07:00:34,409 
[myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x347fcb5a0130001 type:create cxid:0x12 zxid:0x6c00000021 txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2014-08-23 07:00:45,466 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0001 2014-08-23 07:00:45,875 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58692 2014-08-23 07:00:45,875 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58692; will be dropped if server is in r-o mode 2014-08-23 07:00:45,875 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58692 2014-08-23 07:00:45,884 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130002 with negotiated timeout 5000 for client /178.21.116.224:58692 2014-08-23 07:00:45,887 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130002 2014-08-23 07:00:45,891 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58692 which had sessionid 0x347fcb5a0130002 2014-08-23 07:00:46,494 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58694 2014-08-23 07:00:46,494 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58694; will be dropped if server is in r-o mode 2014-08-23 07:00:46,494 [myid:3] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58694 2014-08-23 07:00:46,498 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130003 with negotiated timeout 5000 for client /178.21.116.224:58694 2014-08-23 07:00:46,501 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130003 2014-08-23 07:00:46,503 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58694 which had sessionid 0x347fcb5a0130003 2014-08-23 07:00:47,314 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58696 2014-08-23 07:00:47,314 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58696; will be dropped if server is in r-o mode 2014-08-23 07:00:47,315 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58696 2014-08-23 07:00:47,318 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130004 with negotiated timeout 5000 for client /178.21.116.224:58696 2014-08-23 07:00:47,321 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130004 2014-08-23 07:00:47,324 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58696 which had sessionid 0x347fcb5a0130004 2014-08-23 07:00:48,334 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58698 2014-08-23 07:00:48,335 [myid:3] - WARN 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58698; will be dropped if server is in r-o mode 2014-08-23 07:00:48,335 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58698 2014-08-23 07:00:48,338 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130005 with negotiated timeout 5000 for client /178.21.116.224:58698 2014-08-23 07:00:48,342 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130005 2014-08-23 07:00:48,345 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58698 which had sessionid 0x347fcb5a0130005 2014-08-23 07:00:49,399 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x148010689410000 2014-08-23 07:00:50,459 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0002 2014-08-23 07:00:51,557 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0003 2014-08-23 07:00:51,563 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58706 2014-08-23 07:00:51,563 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58706; will be dropped if server is in r-o mode 2014-08-23 07:00:51,563 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58706 2014-08-23 07:00:51,568 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - 
Established session 0x347fcb5a0130006 with negotiated timeout 5000 for client /178.21.116.224:58706 2014-08-23 07:00:51,571 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130006 2014-08-23 07:00:51,575 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58706 which had sessionid 0x347fcb5a0130006 2014-08-23 07:00:52,855 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0004 2014-08-23 07:00:52,870 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0005 2014-08-23 07:00:54,914 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58712 2014-08-23 07:00:54,914 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58712; will be dropped if server is in r-o mode 2014-08-23 07:00:54,915 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58712 2014-08-23 07:00:54,918 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130007 with negotiated timeout 5000 for client /178.21.116.224:58712 2014-08-23 07:00:54,921 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130007 2014-08-23 07:00:54,925 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58712 which had sessionid 0x347fcb5a0130007 2014-08-23 07:00:55,357 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session 
termination for sessionid: 0x148010689410001 2014-08-23 07:00:55,997 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x148010689410002 2014-08-23 07:00:56,018 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0006 2014-08-23 07:00:57,047 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0007 2014-08-23 07:00:58,583 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x148010689410003 2014-08-23 07:00:59,642 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0008 2014-08-23 07:01:00,737 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x148010689410004 2014-08-23 07:01:01,904 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff0009 2014-08-23 07:01:08,186 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:58730 2014-08-23 07:01:08,187 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:58730; will be dropped if server is in r-o mode 2014-08-23 07:01:08,187 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:58730 2014-08-23 07:01:08,191 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x347fcb5a0130008 with negotiated timeout 5000 for client /178.21.116.224:58730 2014-08-23 07:01:08,194 [myid:3] - INFO [ProcessThread(sid:3 
cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x347fcb5a0130008
2014-08-23 07:01:08,197 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:58730 which had sessionid 0x347fcb5a0130008
2014-08-23 07:01:09,735 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff000a
2014-08-23 07:01:11,799 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x24801068aff000b

-----Original message-----
> From: Ted Yu <[email protected]>
> Sent: Monday 25 August 2014 16:45
> To: [email protected]
> Subject: Re: Could not get distributed Hbase stable, stopping every ~24
> hours.
>
> Have you checked zookeeper logs?
> What zookeeper release are you using?
>
> bq. Could there be a memory leakage?
>
> You can use jmap to capture heap memory details:
>
> http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
>
> Cheers
>
>
> On Mon, Aug 25, 2014 at 4:51 AM, Ron van der Vegt <[email protected]> wrote:
>
> > Hi all!
> >
> > I have set up one master and 5 regionservers to collect log data. But every
> > ~24 hours, at random times, the regionservers generate a fatal error and
> > all stop one by one. Eventually the master stops as well. I also see some
> > weird characters before the server names in the logs. Seems like some
> > encoding issue.
> >
> > I have read in the documentation that if garbage collection takes
> > too long, you will also get the session expired message. But I have logged
> > the GC on the master, and it seems OK. Could someone help me figure out
> > why this is happening?
> >
> > Furthermore, I am currently monitoring the memory usage of the master with
> > JMX. I notice that the heap size is slowly growing. Could there be a memory
> > leakage?
> > Xmx is set to 1 GB.
> >
> > Setup:
> > hbase 0.94.20
> > hadoop 1.2.1
> > debian wheezy
> >
> > Thanks in advance,
> >
> > Ron
> >
> > Logs of master:
> > ===============
> >
> > 2014-08-23 07:00:20,104 WARN
> > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> > ZooKeeper exception:
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss for /hbase/unassigned/70236052
> > 2014-08-23 07:00:20,406 ERROR org.apache.hadoop.hbase.master.HMaster:
> > Region server vps2060.directvps.nl,60020,1408691165501 reported a fatal
> > error:
> > ABORTING region server vps2060.directvps.nl,60020,1408691165501:
> > regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb
> > regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb
> > received expired from ZooKeeper, aborting
> > Cause:
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> > at
> > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > at
> > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > at
> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> >
> > 2014-08-23 07:00:20,911 ERROR org.apache.hadoop.hbase.master.HMaster:
> > Region server vps2057.directvps.nl,60020,1408691165499 reported a fatal
> > error:
> > ABORTING region server vps2057.directvps.nl,60020,1408691165499:
> > regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea
> > regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea
> > received expired from ZooKeeper,
aborting > > Cause: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > > > 2014-08-23 07:00:21,001 ERROR org.apache.hadoop.hbase.master.HMaster: > > Region server vps2059.directvps.nl,60020,1408691165851 reported a fatal > > error: > > ABORTING region server vps2059.directvps.nl,60020,1408691165851: > > regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb > > regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb > > received expired from ZooKeeper, aborting > > Cause: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > > > 2014-08-23 07:00:21,056 ERROR org.apache.hadoop.hbase.master.HMaster: > > Region server vps2058.directvps.nl,60020,1408691165675 reported a fatal > > error: > > ABORTING region server vps2058.directvps.nl,60020,1408691165675: > > regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec > > regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec > > received expired from ZooKeeper, aborting > > Cause: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired > > at 
> > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > > > 2014-08-23 07:00:22,140 WARN > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient > > ZooKeeper exception: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > 2014-08-23 07:00:26,141 WARN > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient > > ZooKeeper exception: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > 2014-08-23 07:00:34,114 ERROR org.apache.hadoop.hbase.master.HMaster: > > Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal > > error: > > ABORTING region server vps2056.directvps.nl,60020,1408691165439: > > Unexpected exception handling nodeDeleted event > > Cause: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/master > > at > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) > > at > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:420) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318) > > at > > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > > > 2014-08-23 07:00:34,118 ERROR org.apache.hadoop.hbase.master.HMaster: > > Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal > > error: > > ABORTING region server vps2056.directvps.nl,60020,1408691165439: > > regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2 > > regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2 > > received expired from ZooKeeper, aborting > > Cause: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > > > 2014-08-23 07:00:34,141 WARN > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient > > ZooKeeper exception: > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > 2014-08-23 07:00:34,142 ERROR > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData > > failed after 3 retries > > 2014-08-23 07:00:34,152 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: > > master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001 > > Unable to get data of znode /hbase/unassigned/70236052 > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > at > > 
org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) > > at > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685) > > at > > org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255) > > at > > org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382) > > at > > org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > 2014-08-23 07:00:34,152 ERROR > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: > > master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001 > > Received unexpected KeeperException, re-throwing exception > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > at > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) > > at > > 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685) > > at > > org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255) > > at > > org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382) > > at > > org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > 2014-08-23 07:00:34,163 FATAL org.apache.hadoop.hbase.master.HMaster: > > Master server abort: loaded coprocessors are: [] > > 2014-08-23 07:00:34,215 WARN > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node > > /hbase/backup-masters/vps2008.directvps.nl,60000,1408691163492 already > > deleted, and this is not a retry > > 2014-08-23 07:05:34,165 WARN > > org.apache.hadoop.hbase.master.SplitLogManager: Interrupted while waiting > > for log splits to be completed > > 2014-08-23 07:05:34,179 FATAL org.apache.hadoop.hbase.master.HMaster: > > Unexpected ZK exception reading unassigned node for region=70236052 > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > KeeperErrorCode = Session expired for /hbase/unassigned/70236052 > > at > > org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > at 
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151) > > at > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709) > > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685) > > at > > org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274) > > at > > org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255) > > at > > org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382) > > at > > org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122) > > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318) > > at > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) > > 2014-08-23 07:05:34,179 WARN > > org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs > > in [hdfs:// > > namenode.openindex.io:8020/hbase/.logs/vps2058.directvps.nl,60020,1408691165675-splitting] > > installed = 2 but only 0 done > > 2014-08-23 07:05:34,184 INFO org.apache.hadoop.ipc.HBaseServer: Stopping > > server on 60000 > > 2014-08-23 07:05:34,185 WARN > > org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table > > java.io.IOException: Giving up after tries=1 > > at > > org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:210) > > at > > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:188) > > at > > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:82) > > at > > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:67) > > at > > 
org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:126)
> >     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:137)
> >     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:93)
> >     at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.InterruptedException: sleep interrupted
> >     at java.lang.Thread.sleep(Native Method)
> >     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:207)
> >     ... 8 more
> > 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
> > 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
> > 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
> > 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
> > 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
> > 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
> > 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
> > 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
> > 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
> > 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
> > 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
> > 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> > 2014-08-23 07:05:34,214 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
> > 2014-08-23 07:05:34,212 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
> > 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
> > 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
> > 2014-08-23 07:05:34,256 INFO org.mortbay.log: Stopped [email protected]:60010
> > 2014-08-23 07:05:34,259 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> > 2014-08-23 07:05:34,260 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 received expired from ZooKeeper, aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> >     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> >     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> > 2014-08-23 07:05:34,414 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> > java.lang.RuntimeException: HMaster Aborted
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> >     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)
> >
> > GC log:
> > =======
> >
> > 0.185: Application time: 0.1304320 seconds
> > 0.185: [GC0.185: [ParNew: 4288K->511K(4800K), 0.0120520 secs] 4288K->1008K(15424K), 0.0121600 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
> > 0.197: Total time for which application threads were stopped: 0.0126240 seconds
> > Heap
> >  par new generation total 4800K, used 3580K [0x00000000b7200000, 0x00000000b7730000, 0x00000000c1860000)
> >   eden space 4288K, 71% used [0x00000000b7200000, 0x00000000b74ff328, 0x00000000b7630000)
> >   from space 512K, 99% used [0x00000000b76b0000, 0x00000000b772fff8, 0x00000000b7730000)
> >   to space 512K, 0% used [0x00000000b7630000, 0x00000000b7630000, 0x00000000b76b0000)
> >  concurrent mark-sweep generation total 10624K, used 496K [0x00000000c1860000, 0x00000000c22c0000, 0x00000000f5a00000)
> >  concurrent-mark-sweep perm gen total 21248K, used 6688K [0x00000000f5a00000, 0x00000000f6ec0000, 0x0000000100000000)
> > 0.370: Application time: 0.1728650 seconds
