[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900725#comment-14900725 ]

Yu Li commented on HBASE-14431:
---

From the HadoopQA report of HBASE-14448, I found that TestFastFail failed with the log below:
{noformat}
2015-09-21 11:42:58,768 WARN [AsyncRpcChannel-pool2-t17] logging.Slf4JLogger(151): An exception was thrown by org.apache.hadoop.hbase.ipc.AsyncRpcChannel$2.operationComplete()
java.lang.NullPointerException
    at org.apache.hadoop.hbase.ipc.AsyncRpcClient.removeConnection(AsyncRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.close(AsyncRpcChannel.java:537)
    at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.retryOrClose(AsyncRpcChannel.java:300)
    at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.access$200(AsyncRpcChannel.java:82)
{noformat}
Checking line 406 of AsyncRpcClient.java, we can find the following changes made in this JIRA:
{noformat}
-int connectionHashCode = connection.getConnectionHashCode();
+int connectionHashCode = connection.hashCode();
 synchronized (connections) {
   // we use address as cache key, so we should check here to prevent removing the
   // wrong connection
   AsyncRpcChannel connectionInPool = this.connections.get(connectionHashCode);
-  if (connectionInPool == connection) {
+  if (connectionInPool.equals(connection)) {
{noformat}
And line 406 is
{code}
if (connectionInPool.equals(connection)) {
{code}
I think we lack a null pointer check here; attached is a straightforward addendum.

> AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
>
> Key: HBASE-14431
> URL: https://issues.apache.org/jira/browse/HBASE-14431
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 2.0.0, 1.0.2, 1.1.2
> Reporter: Samir Ahmic
> Assignee: Samir Ahmic
> Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14431-v2.patch, HBASE-14431.patch
>
>
> I was playing with the master branch in distributed mode (3 rs + master + backup_master) and noticed strange behavior when testing this sequence of events on a single rs: /kill/start/run_balancer while a client was writing data to the cluster (LoadTestTool).
> I noticed that LTT fails with the following:
> {code}
> 2015-09-09 11:05:58,364 INFO [main] client.AsyncProcess: #2, waiting for some tasks to finish. Expected max=0, tasksInProgress=35
> Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: BindException: 1 time,
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208)
>     at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697)
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211)
> {code}
> After some digging and adding more logging to the code, I noticed that the following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel connection){code} is never true:
> {code}
> if (connectionInPool == connection) {
> {code}
> so the {code}AsyncRpcChannel{code} connection is never removed from the {code}connections{code} pool when an rs fails.
> After changing this condition to:
> {code}
> if (connectionInPool.address.equals(connection.address)) {
> {code}
> the issue was resolved and the client removed the failed server from the connections pool.
> I will attach a patch after running some more tests.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
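
The symptom reported in the issue description, a reference comparison that is never true, can be reproduced in isolation. The following is a minimal sketch and not HBase code: `ChannelStub`, `IdentityVsEquality`, the host name and the port are hypothetical, and it assumes the pool holds a different object for the same server address than the one passed to `removeConnection()`, which the issue text does not state explicitly.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for AsyncRpcChannel; only the address field matters here.
final class ChannelStub {
  final InetSocketAddress address;

  ChannelStub(InetSocketAddress address) {
    this.address = address;
  }

  // Builds a stub without triggering a DNS lookup.
  static ChannelStub forHost(String host, int port) {
    return new ChannelStub(InetSocketAddress.createUnresolved(host, port));
  }
}

final class IdentityVsEquality {
  // Mirrors the original condition: true only for the very same object.
  static boolean sameInstance(ChannelStub a, ChannelStub b) {
    return a == b;
  }

  // Mirrors the condition proposed in the description: compares addresses by value.
  static boolean sameAddress(ChannelStub a, ChannelStub b) {
    return a.address.equals(b.address);
  }

  public static void main(String[] args) {
    // Two distinct objects that refer to the same (made-up) region server.
    ChannelStub pooled = ChannelStub.forHost("rs1.example.org", 16020);
    ChannelStub failed = ChannelStub.forHost("rs1.example.org", 16020);

    System.out.println(sameInstance(pooled, failed)); // false
    System.out.println(sameAddress(pooled, failed));  // true
  }
}
```

Under that assumption, a value comparison finds the pooled entry while the reference comparison does not, which is consistent with the behavior the reporter describes.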
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900932#comment-14900932 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.1 #669 (See [https://builds.apache.org/job/HBase-1.1/669/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev 9ae6cead335c5afc298bd192820ecb7af928ab2c)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900972#comment-14900972 ]

Hudson commented on HBASE-14431:

SUCCESS: Integrated in HBase-1.2-IT #158 (See [https://builds.apache.org/job/HBase-1.2-IT/158/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev 78c8c772db84d88dcecaafd6ba9c7f7e611cc091)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900995#comment-14900995 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.2 #187 (See [https://builds.apache.org/job/HBase-1.2/187/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev 78c8c772db84d88dcecaafd6ba9c7f7e611cc091)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900977#comment-14900977 ]

Hudson commented on HBASE-14431:

SUCCESS: Integrated in HBase-1.3-IT #169 (See [https://builds.apache.org/job/HBase-1.3-IT/169/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev ca6c7f0a6857a5ac16be6a13c461e2aae0b51821)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900742#comment-14900742 ]

Ted Yu commented on HBASE-14431:

Integrated addendum to related branches. Thanks, Yu.
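
The addendum discussed in this thread is described as checking for a null `connectionInPool`. The following is a minimal sketch of that kind of guard, not the committed patch: `ChannelSketch` and `ConnectionPoolSketch` are hypothetical stand-ins for `AsyncRpcChannel` and the pool inside `AsyncRpcClient`, and the sketch assumes the lookup returns null when no entry exists for the key, for example because the connection was already removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AsyncRpcChannel; equality is based on the address only.
final class ChannelSketch {
  final String address;

  ChannelSketch(String address) {
    this.address = address;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ChannelSketch && ((ChannelSketch) o).address.equals(address);
  }

  @Override
  public int hashCode() {
    return address.hashCode();
  }
}

// Hypothetical stand-in for the connections pool kept by AsyncRpcClient.
final class ConnectionPoolSketch {
  private final Map<Integer, ChannelSketch> connections = new HashMap<>();

  void add(ChannelSketch connection) {
    synchronized (connections) {
      connections.put(connection.hashCode(), connection);
    }
  }

  // Removes the pooled entry for this connection; returns true if something was removed.
  boolean removeConnection(ChannelSketch connection) {
    int connectionHashCode = connection.hashCode();
    synchronized (connections) {
      ChannelSketch connectionInPool = connections.get(connectionHashCode);
      // Without the null check, equals() would be called on null when the key is absent.
      if (connectionInPool != null && connectionInPool.equals(connection)) {
        connections.remove(connectionHashCode);
        return true;
      }
      return false;
    }
  }

  int size() {
    synchronized (connections) {
      return connections.size();
    }
  }

  public static void main(String[] args) {
    ConnectionPoolSketch pool = new ConnectionPoolSketch();
    ChannelSketch channel = new ChannelSketch("rs1.example.org:16020");
    pool.add(channel);

    System.out.println(pool.removeConnection(channel)); // true
    System.out.println(pool.removeConnection(channel)); // false, and no NullPointerException
  }
}
```

With the guard in place, a second removal of the same connection is a no-op instead of throwing from inside the close path.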
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901342#comment-14901342 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-TRUNK #6824 (See [https://builds.apache.org/job/HBase-TRUNK/6824/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev 86cf14889462b6947f921c41401a8f925fe2b3b6)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901221#comment-14901221 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.3 #188 (See [https://builds.apache.org/job/HBase-1.3/188/])
HBASE-14431 Addendum checks for null connectionInPool (Yu Li) (tedyu: rev ca6c7f0a6857a5ac16be6a13c461e2aae0b51821)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877053#comment-14877053 ]

Ted Yu commented on HBASE-14431:

+1
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877141#comment-14877141 ]

Hadoop QA commented on HBASE-14431:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12761281/HBASE-14431-v2.patch
  against master branch at commit b0f52332651ecbb8af11557df5af3189c7283212.
  ATTACHMENT ID: 12761281

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15641//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15641//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15641//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15641//console

This message is automatically generated.
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877218#comment-14877218 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.1 #667 (See [https://builds.apache.org/job/HBase-1.1/667/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 911c4342ae66447d51ec05e25eeb3b6c4d348a22)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877222#comment-14877222 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.3 #185 (See [https://builds.apache.org/job/HBase-1.3/185/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 88adccd553e4f70a0e5362d5ab5158f45d57d201)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877211#comment-14877211 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-TRUNK #6821 (See [https://builds.apache.org/job/HBase-TRUNK/6821/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 1545e1ed8d68b780dca49084cf5d8173481f72c0)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877206#comment-14877206 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.2-IT #156 (See [https://builds.apache.org/job/HBase-1.2-IT/156/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 388e948dfedab59cfe8fe8cf42001fec0eb32cd3)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877204#comment-14877204 ]

Hudson commented on HBASE-14431:

SUCCESS: Integrated in HBase-1.3-IT #167 (See [https://builds.apache.org/job/HBase-1.3-IT/167/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 88adccd553e4f70a0e5362d5ab5158f45d57d201)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877214#comment-14877214 ]

Hudson commented on HBASE-14431:

FAILURE: Integrated in HBase-1.2 #185 (See [https://builds.apache.org/job/HBase-1.2/185/])
HBASE-14431 AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails (Samir Ahmic) (tedyu: rev 388e948dfedab59cfe8fe8cf42001fec0eb32cd3)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcClient.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/AsyncRpcChannel.java
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790514#comment-14790514 ]

Ted Yu commented on HBASE-14431:

lgtm

nit: connection.hashCode() is computed twice. You can save the return value in a local variable.
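The nit above can be sketched as follows. This is a hedged illustration, not the patch under review: HashOnceSketch, its nested Channel class, the hashCalls counter and the map passed as a parameter are all hypothetical names introduced here. The point is only that the hash code is read once into a local variable and then reused for both the lookup and the removal.

```java
import java.util.HashMap;
import java.util.Map;

final class HashOnceSketch {
  // Hypothetical stand-in for AsyncRpcChannel; the counter exists only for demonstration.
  static final class Channel {
    final String address;
    int hashCalls = 0; // counts hashCode() invocations on this instance

    Channel(String address) {
      this.address = address;
    }

    @Override
    public int hashCode() {
      hashCalls++;
      return address.hashCode();
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof Channel && address.equals(((Channel) other).address);
    }
  }

  // The pool is keyed by Integer hash codes, so the map itself never calls
  // Channel.hashCode(); only the explicit call below does.
  static boolean removeConnection(Map<Integer, Channel> connections, Channel connection) {
    int connectionHashCode = connection.hashCode(); // computed once, reused below
    synchronized (connections) {
      Channel inPool = connections.get(connectionHashCode);
      if (inPool != null && inPool.equals(connection)) {
        connections.remove(connectionHashCode);
        return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    Map<Integer, Channel> connections = new HashMap<>();
    Channel pooled = new Channel("rs1.example.com:16020");
    connections.put(pooled.hashCode(), pooled);
    Channel failed = new Channel("rs1.example.com:16020");
    System.out.println(removeConnection(connections, failed)); // true
    System.out.println(failed.hashCalls);                      // 1
  }
}
```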
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790813#comment-14790813 ]

Hadoop QA commented on HBASE-14431:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12756275/HBASE-14431.patch
against master branch at commit d2e338181800ae3cef55ddca491901b65259dc7f.
ATTACHMENT ID: 12756275

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFastFail

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//console

This message is automatically generated.
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791053#comment-14791053 ]

Samir Ahmic commented on HBASE-14431:

This is interesting. I have run TestFastFail several times on two different machines and the test never fails. I was using Java 1.7.0_80 and 1.7.0_71.
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745146#comment-14745146 ]

Heng Chen commented on HBASE-14431:

Is it a better choice to override {{hashCode}} method in {{AsyncRpcChannel}}?
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746000#comment-14746000 ]

Samir Ahmic commented on HBASE-14431:

Looks like a good idea, [~chenheng]. Thanks for the review. I will include it in the patch after some more testing.
[~stack], thanks for pushing [HBASE-13337|https://issues.apache.org/jira/browse/HBASE-13337] to the master branch.
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745518#comment-14745518 ]

Heng Chen commented on HBASE-14431:

{quote}
Is it a better choice to override hashCode method in AsyncRpcChannel ?
I don't see that method. Can you elaborate ?
{quote}
In class {{AsyncRpcChannel}}, we override {{hashCode}} method just like
{code}
@Override
public int hashCode() {
  return getConnectionHashCode();
}
{code}
Any concerns?
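As a hedged sketch of the suggestion above: in Java, hashCode is normally overridden together with equals so that two objects considered equal also share a hash code. The class below is a hypothetical stand-in, not AsyncRpcChannel itself. The fields serviceName and address, the body of getConnectionHashCode() and the example values are assumptions made for illustration only.

```java
import java.util.Objects;

final class HashCodeSketch {
  // Hypothetical, simplified stand-in for AsyncRpcChannel; the fields are assumptions.
  static final class Channel {
    final String serviceName;
    final String address;

    Channel(String serviceName, String address) {
      this.serviceName = serviceName;
      this.address = address;
    }

    // Assumed shape of getConnectionHashCode(): derived from the fields that
    // identify a connection, not from object identity.
    int getConnectionHashCode() {
      return Objects.hash(serviceName, address);
    }

    // The override proposed in the comment above.
    @Override
    public int hashCode() {
      return getConnectionHashCode();
    }

    // Overridden alongside hashCode so that equal channels share a hash code.
    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof Channel)) {
        return false;
      }
      Channel that = (Channel) other;
      return serviceName.equals(that.serviceName) && address.equals(that.address);
    }
  }

  public static void main(String[] args) {
    Channel pooled = new Channel("ClientService", "rs1.example.com:16020");
    Channel failed = new Channel("ClientService", "rs1.example.com:16020");
    System.out.println(pooled == failed);                       // false
    System.out.println(pooled.equals(failed));                  // true
    System.out.println(pooled.hashCode() == failed.hashCode()); // true
  }
}
```

With both methods overridden, two distinct objects describing the same connection compare as equal and produce the same pool key, which an identity comparison cannot provide.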
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745379#comment-14745379 ]

Samir Ahmic commented on HBASE-14431:
---

bq. Is it a better choice to override hashCode method in AsyncRpcChannel ?

I don't see that method. Can you elaborate?

BTW, is there a reason why [HBASE-13337|https://issues.apache.org/jira/browse/HBASE-13337] is not committed to the master branch? Without it, any test that includes restarting servers will hit issues, and only in the master branch is AsyncRpcClient the default client implementation.
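For readers following the hashCode question, here is a rough sketch of what overriding {{equals}} and {{hashCode}} on an address-keyed channel could look like. {{AddressKeyedChannel}} is a hypothetical class written for illustration, not the actual AsyncRpcChannel, and defining equality purely by remote address is an assumption.

```java
import java.net.InetSocketAddress;
import java.util.Objects;

/** Hypothetical channel type whose equality is defined by its remote address. */
public final class AddressKeyedChannel {
  private final InetSocketAddress address;

  public AddressKeyedChannel(InetSocketAddress address) {
    this.address = Objects.requireNonNull(address, "address");
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof AddressKeyedChannel)) {
      return false;
    }
    // Two channels are equal when they point at the same remote address.
    return address.equals(((AddressKeyedChannel) other).address);
  }

  @Override
  public int hashCode() {
    // Must agree with equals: equal addresses yield equal hash codes.
    return address.hashCode();
  }
}
```

With both methods overridden together, two channel objects created for the same server compare equal and hash to the same value, so a lookup keyed on {{hashCode()}} followed by an {{equals}} check can find the pooled entry.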
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745457#comment-14745457 ]

stack commented on HBASE-14431:
---

bq. BTW is there a reason why HBASE-13337 is not committed to master branch?

None. Mistake on my part. Fixed. Thanks for noticing, [~asamir].
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743966#comment-14743966 ]

stack commented on HBASE-14431:
---

[~asamir] Nice debugging.