First of all, thanks a lot for lending a helping hand. Here are my answers to the questions you asked:
Q: How many ZooKeeper servers do you have? Or what is the number of clients you have running per host?
A: I have only one Linux box, i.e. a single-node system. HBase is installed on that single system.

Q: What is the configured value of maxClientCnxns in the ZooKeeper servers?
A: We are using the default configuration. We have not set any new value in hbase-site.xml.

Q: Is the issue impacting clients only, or is it also impacting the RegionServers?
A: In this case the RegionServer, the master, and the client are all the same machine, because we have installed HBase on a single system.

Q: Have you looked into why the ZooKeeper server is no longer accepting connections?
A: I have now checked the HBase logs at the moment my application broke. To me it looks like the JVM went into a garbage-collection pause and never came back, which resulted in the exception. Is my interpretation correct? Kindly let me know.

Here is the complete log:

2015-06-01 19:59:53,808 INFO [pool-55-thread-1] master.HMaster: Master has completed initialization
2015-06-01 19:59:53,808 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-06-01 20:00:46,431 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 6885ms GC pool 'ParNew' had collection(s): count=1 time=7383ms
2015-06-01 20:00:46,431 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 6886ms GC pool 'ParNew' had collection(s): count=1 time=7383ms
2015-06-01 20:00:47,032 WARN [M:0;hadoop2:35923.oldLogCleaner] cleaner.CleanerChore: A file cleanerM:0;hadoop2:35923.oldLogCleaner is stopped, won't delete any more files in:file:/home/hadoop/hbaseDataDir/oldWALs
2015-06-01 20:02:05,148 WARN [M:0;hadoop2:35923.oldLogCleaner] util.Sleeper: We slept 78116ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,148 WARN [M:0;hadoop2:35923.archivedHFileCleaner] util.Sleeper: We slept 78122ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,149 WARN [hadoop2,35923,1432909409923-ClusterStatusChore] util.Sleeper: We slept 78128ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,149 WARN [RS:0;hadoop2:40129] util.Sleeper: We slept 39687ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,151 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 39206ms GC pool 'ParNew' had collection(s): count=1 time=39328ms
2015-06-01 20:02:05,151 WARN [M:0;hadoop2:35923] util.Sleeper: We slept 39345ms instead of 100ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,151 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 39205ms GC pool 'ParNew' had collection(s): count=1 time=39328ms
2015-06-01 20:02:05,151 INFO [SessionTracker] server.ZooKeeperServer: Expiring session 0x14da00e69e00001, timeout of 40000ms exceeded
2015-06-01 20:02:05,151 INFO [RS:0;hadoop2:40129-SendThread(hadoop2:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 52055ms for sessionid 0x14da00e69e00001, closing socket connection and attempting reconnect
2015-06-01 20:02:05,151 INFO [RS:0;hadoop2:40129-SendThread(hadoop2:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 52053ms for sessionid 0x14da00e69e00004, closing socket connection and attempting reconnect
2015-06-01 20:02:05,151 WARN [hadoop2,35923,1432909409923.splitLogManagerTimeoutMonitor] util.Sleeper: We slept 39713ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-06-01 20:02:05,155 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14da00e69e00001, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)

On Tue, Jun 2, 2015 at 12:45 AM, jeevi tesh <[email protected]> wrote:

> Hi,
> I'm running into this issue repeatedly but am still not able to resolve
> it; kindly help me in this regard.
> I have written a crawler which keeps running for several days. After 4
> days of continuous interaction between the database and my application,
> the database fails to respond. I'm not able to figure out where things
> can all of a sudden go wrong after 4 days of proper running.
> My configuration: HBase 0.96.2, single server.
> jdk 1.7
>
> The issue is the following error:
> WARN [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
> (ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
> unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> If this exception happens, the only solution I have is to restart HBase,
> which is not viable because it will corrupt my system data.
>
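One detail from the log worth calling out: the session expired because the ~39 s GC pause exceeded the 40000ms session timeout. That 40000 ms cap matches ZooKeeper's default maxSessionTimeout of 20 * tickTime (20 * 2000 ms). As a stopgap (it does not fix the pauses themselves), the timeout can be raised in hbase-site.xml. The values below are purely illustrative, not a tuned recommendation; with HBase-managed ZooKeeper, zoo.cfg properties are set through the hbase.zookeeper.property.* prefix:

```xml
<!-- hbase-site.xml: illustrative values, not a tuned recommendation -->

<!-- The session timeout the HBase client requests -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>

<!-- ZooKeeper caps the negotiated timeout at maxSessionTimeout
     (default 20 * tickTime = 40000 ms, matching the "timeout of
     40000ms exceeded" line in the log), so the server-side cap
     must be raised as well: -->
<property>
  <name>hbase.zookeeper.property.maxSessionTimeout</name>
  <value>120000</value>
</property>
```

Note this only buys headroom: if the GC pauses keep growing, the session will still eventually expire.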
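The more important question is why a single ParNew (young-generation) collection took 39 s: on a single shared box that is more often a symptom of the JVM being swapped out than of a genuinely huge heap. A first diagnostic step is to turn on GC logging; this is a minimal sketch for JDK 7, assuming the default conf/hbase-env.sh layout, and /tmp/hbase-gc.log is just a placeholder path:

```
# conf/hbase-env.sh -- illustrative diagnostic options, JDK 7 syntax.
# A 39 s ParNew collection usually points at swapping rather than heap
# size; the GC log timestamps help confirm which it is.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/hbase-gc.log"
```

If the GC log shows real (not wall-clock-gap) collection times that long, the heap settings need attention; if the log simply goes silent across the pause, look at swap usage and vm.swappiness on the box instead.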
