cassandra git commit: List clients by protocol versions `nodetool clientstats --by-protocol`

2018-05-04 Thread jasobrown
Repository: cassandra
Updated Branches:
  refs/heads/trunk 78ca3447c -> a9ec46a61


List clients by protocol versions `nodetool clientstats --by-protocol`

patch by Dinesh Joshi; reviewed by jasobrown for CASSANDRA-14335


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a9ec46a6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a9ec46a6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a9ec46a6

Branch: refs/heads/trunk
Commit: a9ec46a613ae5602ced004935c9954638e83e735
Parents: 78ca344
Author: Dinesh A. Joshi 
Authored: Mon Apr 30 17:18:44 2018 -0700
Committer: Jason Brown 
Committed: Fri May 4 11:19:44 2018 -0700

--
 CHANGES.txt |   3 +-
 .../cassandra/service/CassandraDaemon.java  |   6 +
 .../service/NativeTransportService.java |  17 +++
 .../cassandra/service/StorageService.java   |   7 ++
 .../cassandra/service/StorageServiceMBean.java  |   4 +
 .../org/apache/cassandra/tools/NodeProbe.java   |   6 +
 .../cassandra/tools/nodetool/ClientStats.java   |  41 +++
 .../transport/ProtocolVersionTracker.java   | 109 ++
 .../org/apache/cassandra/transport/Server.java  |  32 ++
 .../transport/ProtocolVersionTrackerTest.java   | 115 +++
 10 files changed, 339 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9ec46a6/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 628d0af..25c237f 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,9 +1,10 @@
 4.0
  * Replace deprecated junit.framework.Assert usages with org.junit.Assert 
(CASSANDRA-14431)
  * cassandra-stress throws NPE if insert section isn't specified in user 
profile (CASSSANDRA-14426)
+ * List clients by protocol versions `nodetool clientstats --by-protocol` 
(CASSANDRA-14335)
  * Improve LatencyMetrics performance by reducing write path processing 
(CASSANDRA-14281)
  * Add network authz (CASSANDRA-13985)
- * Use the correct IP/Port for Streaming when localAddress is left unbound 
(CASSANDAR-14389)
+ * Use the correct IP/Port for Streaming when localAddress is left unbound 
(CASSANDRA-14389)
  * nodetool listsnapshots is missing local system keyspace snapshots 
(CASSANDRA-14381)
  * Remove StreamCoordinator.streamExecutor thread pool (CASSANDRA-14402)
  * Rename nodetool --with-port to --print-port to disambiguate from --port 
(CASSANDRA-14392)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9ec46a6/src/java/org/apache/cassandra/service/CassandraDaemon.java
--
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java 
b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index 5ca3844..7e7649d 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -679,6 +679,10 @@ public class CassandraDaemon
 instance.activate();
 }
 
+public void clearConnectionHistory() {
+nativeTransportService.clearConnectionHistory();
+}
+
 private void exitOrFail(int code, String message)
 {
 exitOrFail(code, message, null);
@@ -731,5 +735,7 @@ public class CassandraDaemon
  * Returns whether the server is currently running.
  */
 public boolean isRunning();
+
+public void clearConnectionHistory();
 }
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a9ec46a6/src/java/org/apache/cassandra/service/NativeTransportService.java
--
diff --git a/src/java/org/apache/cassandra/service/NativeTransportService.java 
b/src/java/org/apache/cassandra/service/NativeTransportService.java
index d70e56e..39b334e 100644
--- a/src/java/org/apache/cassandra/service/NativeTransportService.java
+++ b/src/java/org/apache/cassandra/service/NativeTransportService.java
@@ -148,6 +148,17 @@ public class NativeTransportService
 }
 return result;
 });
+
+ClientMetrics.instance.addGauge("clientsByProtocolVersion", () ->
+{
+List> result = new ArrayList<>();
+for (Server server : servers)
+{
+result.addAll(server.getClientsByProtocolVersion());
+}
+return result;
+});
+
 AuthMetrics.init();
 
 initialized = true;
@@ -225,4 +236,10 @@ public class NativeTransportService
 {
 return servers;
 }
+
+public void clearConnectionHistory()
+{
+for (Server server : servers)
+
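
ProtocolVersionTracker.java itself is not reproduced above. As a rough sketch only (the class shape below is an assumption for illustration, not the committed code), a tracker of this kind amounts to remembering which protocol versions each client address has connected with, so the server can report them through the clientsByProtocolVersion gauge and clear them via clearConnectionHistory():

{noformat}
// Rough sketch only -- not the committed ProtocolVersionTracker class.
// Assumed behaviour: remember which protocol versions each client address
// has connected with, so they can be reported and cleared via nodetool/JMX.
import java.net.InetAddress;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ProtocolVersionTrackerSketch
{
    // protocol version -> client addresses seen connecting with it
    private final Map<Integer, Set<InetAddress>> clientsByVersion = new ConcurrentHashMap<>();

    void addConnection(InetAddress client, int protocolVersion)
    {
        clientsByVersion.computeIfAbsent(protocolVersion, v -> ConcurrentHashMap.newKeySet())
                        .add(client);
    }

    Map<Integer, Set<InetAddress>> clientsByProtocolVersion()
    {
        return clientsByVersion;
    }

    void clearHistory()
    {
        clientsByVersion.clear();
    }
}
{noformat}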

[jira] [Updated] (CASSANDRA-14335) C* nodetool should report the lowest of the highest CQL protocol version supported by all clients connecting to it

2018-05-04 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-14335:

   Resolution: Fixed
Fix Version/s: 4.0
   Status: Resolved  (was: Patch Available)

+1, and committed with a few minor spacing/naming changes as sha 
{{a9ec46a613ae5602ced004935c9954638e83e735}}

> C* nodetool should report the lowest of the highest CQL protocol version 
> supported by all clients connecting to it
> --
>
> Key: CASSANDRA-14335
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14335
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 4.0
>
>
> While upgrading C*, it is hard to tell whether any client will be affected 
> by the upgrade. C* should internally store the highest protocol version each 
> client has connected with. The lowest of those versions will help determine 
> whether any client will be adversely affected by the upgrade.
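
As a minimal illustration of that calculation (the names below are invented for this sketch, not Cassandra's API): given the highest protocol version each connected client has negotiated, the value that matters before an upgrade is the minimum across clients.

{noformat}
// Illustrative only: given each client's highest negotiated protocol version,
// the operator cares about the minimum of those maxima before an upgrade.
import java.util.Collection;
import java.util.Map;
import java.util.OptionalInt;

final class LowestHighestVersionSketch
{
    // clientAddress -> highest CQL protocol version that client has negotiated
    static OptionalInt lowestOfHighest(Map<String, Integer> highestVersionByClient)
    {
        Collection<Integer> maxima = highestVersionByClient.values();
        // Empty when no clients are connected, hence OptionalInt.
        return maxima.stream().mapToInt(Integer::intValue).min();
    }
}
{noformat}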






[jira] [Commented] (CASSANDRA-7622) Implement virtual tables

2018-05-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464082#comment-16464082
 ] 

Aleksey Yeschenko commented on CASSANDRA-7622:
--

Pushed another commit that untangles things further, with the eventual goal 
that nothing knows about virtual tables except {{Schema}} and some CQL 
statements. It also streamlines metadata initialisation of the virtual tables - 
there was no need to reinvent {{TableMetadata.Builder}}, so the tables now just 
use it directly.

Part 2 will complete the metadata segregation; CQL statement integration comes 
next.

> Implement virtual tables
> 
>
> Key: CASSANDRA-7622
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7622
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Tupshin Harper
>Assignee: Chris Lohfink
>Priority: Major
> Fix For: 4.x
>
> Attachments: screenshot-1.png
>
>
> There are a variety of reasons to want virtual tables - a virtual table being 
> any table backed by an API rather than by data explicitly managed and stored 
> as sstables.
> One possible use case would be to expose JMX data through CQL as a 
> resurrection of CASSANDRA-3527.
> Another is a more general framework to implement the ability to expose yaml 
> configuration information. So it would be an alternate approach to 
> CASSANDRA-7370.
> A possible implementation would be in terms of CASSANDRA-7443, but I am not 
> presupposing.
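
As a concept-only sketch (the interface below is made up for illustration and is not the API being designed in this ticket), a virtual table is simply a table whose rows are computed from some backing API at read time rather than read from sstables:

{noformat}
// Concept sketch only -- not the Cassandra API being designed in this ticket.
// A "virtual" table computes its rows from an in-process source (JMX, config,
// system properties, ...) at read time instead of reading sstables.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface VirtualTableSketch
{
    String name();

    // Rows are produced on demand from a backing API rather than stored.
    List<Map<String, Object>> read();
}

final class SystemPropertiesTableSketch implements VirtualTableSketch
{
    public String name()
    {
        return "system_properties";
    }

    public List<Map<String, Object>> read()
    {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String key : System.getProperties().stringPropertyNames())
        {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("name", key);
            row.put("value", System.getProperty(key));
            rows.add(row);
        }
        return rows;
    }
}
{noformat}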






[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:25 AM:
--

[~jasobrown] {{.88}} was down for about 10 minutes, and was shut down via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.


was (Author: alienth):
{{.88}} was down for about 10 minutes, and was shut down via `nodetool drain`, 
followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

{{.88}} was down for about 10 minutes, and was shut down via `nodetool drain`, 
followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464495#comment-16464495
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/4/18 11:43 PM:
---

Captured the socket state for before, during, and after the restart, from the 
POV of a node which the restarted node sees as down:

Before, when nothing has been done:
{{Netid  State  Recv-Q Send-Q Local Address:Port   Peer 
Address:Port}}
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

After 10.0.107.88 has been restarted. Note the {{1}} in the recv-q:
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min14sec,0) uid:115 
ino:5913893 sk:216e -->}}

When 10.0.107.88 comes back up and 10.0.161.40 tries to respond to the 
EchoMessage, using the previous socket which has been in CLOSE-WAIT. You can 
see the outbounds piling up in the send-q:
{{tcpCLOSE-WAIT 1  36527  10.0.161.40:59739  
10.0.107.88:7000timer:(on,1.932ms,4) uid:115 ino:5913893 
sk:216e -->}}


was (Author: alienth):
Captured the socket state for before, during, and after the restart, from the 
POV of a node which the restarted node sees as down:

Before, when nothing has been done:
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

After 10.0.107.88 has been restarted. Note the {{1}} in the recv-q:
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min14sec,0) uid:115 
ino:5913893 sk:216e -->}}

When 10.0.107.88 comes back up and 10.0.161.40 tries to respond to the 
EchoMessage, using the previous socket which has been in CLOSE-WAIT:
{{tcpCLOSE-WAIT 1  36527  10.0.161.40:59739  
10.0.107.88:7000timer:(on,1.932ms,4) uid:115 ino:5913893 
sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464622#comment-16464622
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 4:40 AM:
--

Why on earth didn't the failure detector on {{.40}} see anything for 20 
minutes? Possibly due to blocking on an Outbound forever?


was (Author: alienth):
Why on earth didn't the failure detector in {{.40}} see anything wrong for 20 
minutes? Possibly due to blocking on an Outbound forever?

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
>  which conveniently calls connect with a default timeout of "forever". We 
> could hang here forever until the OS kills us.
>  # If we continue, we get to 
> 
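
To make that theory concrete, the sketch below uses plain JSSE calls (it is not the Cassandra patch; trunk's Netty refactor addresses this differently) to show what a bounded SSL connect looks like: an explicit connect timeout plus SO_TIMEOUT, which the report says the 2.1.x/3.0.x/3.11.x SSL internode path does not set.

{noformat}
// Sketch of a bounded SSL connect, per the theory above.
// Not the Cassandra fix; just standard JSSE calls.
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

final class BoundedSslConnectSketch
{
    static SSLSocket connect(String host, int port, int timeoutMillis) throws Exception
    {
        // 1. Bound the TCP connect; a plain new Socket(host, port) or a
        //    factory createSocket(host, port) connects with no timeout.
        Socket raw = new Socket();
        raw.connect(new InetSocketAddress(host, port), timeoutMillis);

        // 2. Bound blocking reads (including the SSL handshake) so a dead
        //    peer surfaces as a SocketTimeoutException instead of a hang.
        raw.setSoTimeout(timeoutMillis);

        // 3. Layer TLS over the already-connected socket and handshake.
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket ssl = (SSLSocket) factory.createSocket(raw, host, port, true);
        ssl.startHandshake();
        return ssl;
    }
}
{noformat}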

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464622#comment-16464622
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

Why on earth didn't the failure detector in {{.40}} see anything wrong for 20 
minutes? Possibly due to blocking on an Outbound forever?

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
>  which conveniently calls connect with a default timeout of "forever". We 
> could hang here forever until the OS kills us.
>  # If we continue, we get to 
> [writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
>  which eventually calls 
> 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464604#comment-16464604
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket to {{.40}}:

{{Fri May  4 16:02:15 PDT 2018 }}
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May  4 16:02:20 PDT 2018}}
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min54sec,0) uid:115 
ino:5913893 sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
> 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464604#comment-16464604
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:34 AM:
--

Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket to {{.40}}:

{{Fri May  4 16:02:15 PDT 201}}
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May  4 16:02:20 PDT 2018}}
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min54sec,0) uid:115 
ino:5913893 sk:216e -->}}


was (Author: alienth):
Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket to {{.40}}:

{{Fri May  4 16:02:15 PDT 2018 }}
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May  4 16:02:20 PDT 2018}}
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min54sec,0) uid:115 
ino:5913893 sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464495#comment-16464495
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

Captured the socket state for before, during, and after the restart, from the 
POV of a node which the restarted node sees as down:

Before, when nothing has been done:
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

After 10.0.107.88 has been restarted. Note the {{1}} in the recv-q:
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min14sec,0) uid:115 
ino:5913893 sk:216e -->}}

When 10.0.107.88 comes back up and 10.0.161.40 tries to respond to the 
EchoMessage, using the previous socket which has been in CLOSE-WAIT:
{{tcpCLOSE-WAIT 1  36527  10.0.161.40:59739  
10.0.107.88:7000timer:(on,1.932ms,4) uid:115 ino:5913893 
sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464604#comment-16464604
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:36 AM:
--

Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket from {{.40}} to {{.88}}:

{{Fri May 4 16:02:15 PDT 201}}
 {{tcp ESTAB 0 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min59sec,0) uid:115 ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May 4 16:02:20 PDT 2018}}
 {{tcp CLOSE-WAIT 1 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min54sec,0) uid:115 ino:5913893 sk:216e -->}}


was (Author: alienth):
Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket to {{.40}}:

{{Fri May  4 16:02:15 PDT 201}}
{{tcpESTAB  0  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min59sec,0) uid:115 
ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May  4 16:02:20 PDT 2018}}
{{tcpCLOSE-WAIT 1  0  10.0.161.40:59739  
10.0.107.88:7000timer:(keepalive,4min54sec,0) uid:115 
ino:5913893 sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464604#comment-16464604
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:37 AM:
--

Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket from {{.40}} to {{.88}}:

{{Fri May 4 16:02:15 PDT 2018}}
 {{tcp ESTAB 0 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min59sec,0) uid:115 ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May 4 16:02:20 PDT 2018}}
 {{tcp CLOSE-WAIT 1 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min54sec,0) uid:115 ino:5913893 sk:216e -->}}


was (Author: alienth):
Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket from {{.40}} to {{.88}}:

{{Fri May 4 16:02:15 PDT 201}}
 {{tcp ESTAB 0 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min59sec,0) uid:115 ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May 4 16:02:20 PDT 2018}}
 {{tcp CLOSE-WAIT 1 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min54sec,0) uid:115 ino:5913893 sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 4:24 AM:
--

[~jasobrown] {{.88}} was down for about 10 minutes, and was shut down via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until the socket to 
{{.88}} is killed about 20 minutes later. {{.88}} was shut down at 1602 and 
came back up at 1612:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

{{INFO  [main] 2018-05-04 16:12:39,746 StorageService.java:1449 - JOINING: 
Finish joining ring}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.


was (Author: alienth):
[~jasobrown] {{.88}} was down for about 10 minutes, and was shutdown via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until the socket to 
{{.88}} is killed about 20 minutes later. {{.88}} was shut down at 1602 and 
came back up at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

{{INFO  [main] 2018-05-04 16:12:39,746 StorageService.java:1449 - JOINING: 
Finish joining ring}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:27 AM:
--

[~jasobrown] {{.88}} was down for about 10 minutes, and was shut down via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until the socket to 
{{.88}} is killed about 20 minutes later. {{.88}} was shut down at 1602 and 
came back up at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

{{INFO  [main] 2018-05-04 16:12:39,746 StorageService.java:1449 - JOINING: 
Finish joining ring}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.


was (Author: alienth):
[~jasobrown] {{.88}} was down for about 20 minutes, and was shutdown via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:26 AM:
--

[~jasobrown] {{.88}} was down for about 10 minutes, and was shut down via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.


was (Author: alienth):
[~jasobrown] {{.88}} was down for about 10 minutes, and was shutdown via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464596#comment-16464596
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:26 AM:
--

[~jasobrown] {{.88}} was down for about 20 minutes, and was shut down via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.


was (Author: alienth):
[~jasobrown] {{.88}} was down for about 10 minutes, and was shutdown via 
`nodetool drain`, followed by stopping the service gracefully.


{{.40}} has no gossiper logs showing {{.88}} going down, until {{.88}} comes 
back 20 minutes later. {{.88}} was shut down at 1602:

{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,055 
StorageService.java:1449 - DRAINING: starting drain process}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,057 
HintsService.java:220 - Paused hints dispatch}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,058 
ThriftServer.java:139 - Stop listening to thrift clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,077 
Server.java:176 - Stop listening for CQL clients}}
{{INFO  [RMI TCP Connection(22)-127.0.0.1] 2018-05-04 16:02:14,078 
Gossiper.java:1540 - Announcing shutdown}}

And {{.40}} has no logs of this happening, and seemingly doesn't see it as down 
until ~20 minutes later:

{{INFO  [GossipStage:1] 2018-05-01 16:20:36,680 Gossiper.java:1034 - 
InetAddress /10.0.107.88 is now DOWN}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,693 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [HANDSHAKE-/10.0.107.88] 2018-05-01 16:20:36,822 
OutboundTcpConnection.java:560 - Handshaking version with /10.0.107.88}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,323 Gossiper.java:1053 - Node 
/10.0.107.88 has restarted, now UP}}
{{INFO  [GossipStage:1] 2018-05-01 16:20:56,325 StorageService.java:2292 - Node 
/10.0.107.88 state jump to NORMAL}}

That is.. bizarre.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464598#comment-16464598
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

One extra thing to note: as expected, turning down `tcp_retries2` greatly 
alleviates this issue. The `CLOSE-WAIT` socket is nuked by the kernel much 
sooner, so the partition resolves quickly.
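
For anyone who wants to experiment, lowering that sysctl looks roughly like the 
sketch below. The value 5 is purely illustrative, the actual give-up time depends 
on the kernel's exponential RTO backoff, and the setting affects every TCP 
connection on the host, so treat it as a mitigation rather than a fix:

{noformat}
# illustrative only: cap TCP retransmissions so a wedged connection is torn
# down much sooner than the ~15-30 minutes implied by the default of 15
echo 5 > /proc/sys/net/ipv4/tcp_retries2    # as root, takes effect immediately
# or persist it: add "net.ipv4.tcp_retries2 = 5" to /etc/sysctl.conf, then sysctl -p
{noformat}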

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
>  which conveniently calls connect with a default timeout of "forever". We 
> could hang here forever until the OS kills us.
>  # If we continue, we get to 
> [writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
>  which eventually calls 
> 
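
As a minimal, hypothetical sketch of what the missing timeouts described above 
would look like if added (editorial illustration only, not Cassandra's actual 
SSLFactory/OutboundTcpConnection code): create the SSL socket unconnected, bound 
the {{connect}} with an explicit timeout, and set {{SO_TIMEOUT}} so the handshake 
and subsequent reads cannot block indefinitely.

{noformat}
import java.net.InetSocketAddress;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public final class TimedSslConnectSketch
{
    // Hypothetical helper: open an outbound SSL socket with bounded connect,
    // handshake and read times instead of the default "forever".
    public static SSLSocket open(String host, int port, int connectTimeoutMillis, int soTimeoutMillis) throws Exception
    {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket();                   // unconnected socket
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis); // connect() can no longer hang forever
        socket.setSoTimeout(soTimeoutMillis);                                    // bounds blocking reads, including handshake reads
        socket.startHandshake();                                                 // fails fast against a dead peer
        return socket;
    }
}
{noformat}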

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464604#comment-16464604
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/5/18 3:47 AM:
--

Extra note: At the same second that {{.88}}'s gossiper announced shutdown, 
there was a newly established socket from {{.40}} to {{.88}}:

{{Fri May 4 16:02:15 PDT 2018}}
 {{tcp ESTAB 0 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min59sec,0) uid:115 ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May 4 16:02:20 PDT 2018}}
 {{tcp CLOSE-WAIT 1 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min54sec,0) uid:115 ino:5913893 sk:216e -->}}
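
(For reference, socket state like the above can be watched with iproute2's 
{{ss}}; an illustrative invocation, assuming the storage port 7000, is 
{{ss -tneo '( sport = :7000 or dport = :7000 )'}}, optionally with 
{{state close-wait}} inserted before the filter to show only the stuck sockets.)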


was (Author: alienth):
Extra note: One second after {{.88}}'s gossiper announced shutdown, there was a 
newly established socket from {{.40}} to {{.88}}:

{{Fri May 4 16:02:15 PDT 2018}}
 {{tcp ESTAB 0 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min59sec,0) uid:115 ino:5913893 sk:216e <->}}

This is the same socket that would become CLOSE-WAIT only seconds later, and 
remain that way for the following 18 minutes:

{{Fri May 4 16:02:20 PDT 2018}}
 {{tcp CLOSE-WAIT 1 0 10.0.161.40:59739 10.0.107.88:7000 
timer:(keepalive,4min54sec,0) uid:115 ino:5913893 sk:216e -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464368#comment-16464368
 ] 

Jason Harvey commented on CASSANDRA-14358:
--

Reproduced this behaviour several times now, and in all cases the socket that 
Cassandra is trying to send outbound on has been stuck in `CLOSE_WAIT`.

I think that suggests the more common case might be AWS no longer tracking the 
flow for a FIN'd connection after an ACK was dropped somewhere.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
>  which conveniently calls connect with a default timeout of "forever". We 
> could hang here forever until the OS kills us.
>  # If we continue, we get to 
> [writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
>  which eventually 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464368#comment-16464368
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/4/18 8:44 PM:
--

Reproduced this behaviour several times now, and in all cases the socket that 
Cassandra is trying to send outbound on has been stuck in `CLOSE_WAIT`.

I think that suggests the more common case might be AWS no longer tracking the 
flow for a FIN'd connection, with an ACK being dropped somewhere so that a node 
never fully closes the connection.


{{tcp CLOSE-WAIT 1 322683 10.0.161.40:18254 10.0.109.39:7000 timer:(on,1min54sec,13) uid:115 ino:5837280 sk:1a1 -->}}


was (Author: alienth):
Reproduced this behaviour several times now,and in all cases the socket that 
cassandra is trying to Outbound on has been in stuck in `CLOSE_WAIT`.

I think that suggests that the more common case might be where AWS stops 
tracking the flow for a FINd connection having a dropped ACK somewhere.


{{tcpCLOSE-WAIT 1  322683 10.0.161.40:18254  
10.0.109.39:7000timer:(on,1min54sec,13) uid:115 ino:5837280 
sk:1a1 -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> 

cassandra git commit: ninja commit the minor updates for CASSANDRA-14335

2018-05-04 Thread jasobrown
Repository: cassandra
Updated Branches:
  refs/heads/trunk a9ec46a61 -> 68605cf03


ninja commit the minor updates for CASSANDRA-14335


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/68605cf0
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/68605cf0
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/68605cf0

Branch: refs/heads/trunk
Commit: 68605cf03bdfecb11cd69c6d5260a773e4e87300
Parents: a9ec46a
Author: Jason Brown 
Authored: Fri May 4 12:57:07 2018 -0700
Committer: Jason Brown 
Committed: Fri May 4 12:57:07 2018 -0700

--
 .../apache/cassandra/service/CassandraDaemon.java |  3 ++-
 .../cassandra/tools/nodetool/ClientStats.java | 10 +-
 .../transport/ProtocolVersionTracker.java | 18 ++
 .../org/apache/cassandra/transport/Server.java| 11 ++-
 4 files changed, 23 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/68605cf0/src/java/org/apache/cassandra/service/CassandraDaemon.java
--
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java 
b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index 7e7649d..80b8b7b 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -679,7 +679,8 @@ public class CassandraDaemon
 instance.activate();
 }
 
-public void clearConnectionHistory() {
+public void clearConnectionHistory()
+{
 nativeTransportService.clearConnectionHistory();
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/68605cf0/src/java/org/apache/cassandra/tools/nodetool/ClientStats.java
--
diff --git a/src/java/org/apache/cassandra/tools/nodetool/ClientStats.java 
b/src/java/org/apache/cassandra/tools/nodetool/ClientStats.java
index 9b4ada8..5bd5da1 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/ClientStats.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/ClientStats.java
@@ -36,8 +36,8 @@ public class ClientStats extends NodeToolCmd
 @Option(title = "list_connections", name = "--all", description = "Lists 
all connections")
 private boolean listConnections = false;
 
-@Option(title = "by_protocol", name = "--by-protocol", description = 
"Lists last 100 client connections with oldest protocol version")
-private boolean oldestProtocolConnections = false;
+@Option(title = "by_protocol", name = "--by-protocol", description = 
"Lists most recent client connections by protocol version")
+private boolean connectionsByProtocolVersion = false;
 
 @Option(title = "clear_history", name = "--clear-history", description = 
"Clear the history of connected clients")
 private boolean clearConnectionHistory = false;
@@ -47,17 +47,17 @@ public class ClientStats extends NodeToolCmd
 {
 if (clearConnectionHistory)
 {
-System.out.println("Clearing history");
+System.out.println("Clearing connection history");
 probe.clearConnectionHistory();
 return;
 }
 
-if (oldestProtocolConnections)
+if (connectionsByProtocolVersion)
 {
 SimpleDateFormat sdf = new SimpleDateFormat("MMM dd,  
HH:mm:ss");
 
 System.out.println("Clients by protocol version");
-System.out.println("");
+System.out.println();
 
 List> clients = (List>) 
probe.getClientMetric("clientsByProtocolVersion");
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/68605cf0/src/java/org/apache/cassandra/transport/ProtocolVersionTracker.java
--
diff --git 
a/src/java/org/apache/cassandra/transport/ProtocolVersionTracker.java 
b/src/java/org/apache/cassandra/transport/ProtocolVersionTracker.java
index a2468bd..848917b 100644
--- a/src/java/org/apache/cassandra/transport/ProtocolVersionTracker.java
+++ b/src/java/org/apache/cassandra/transport/ProtocolVersionTracker.java
@@ -40,7 +40,7 @@ public class ProtocolVersionTracker
 public static final int DEFAULT_MAX_CAPACITY = 100;
 
 @VisibleForTesting
-final EnumMap> 
clientsByProto;
+final EnumMap> 
clientsByProtocolVersion;
 
 public ProtocolVersionTracker()
 {
@@ -49,12 +49,12 @@ public class ProtocolVersionTracker
 
 public ProtocolVersionTracker(final int capacity)
 {
-this.clientsByProto = new 
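
For reference, the three {{nodetool clientstats}} options touched by this commit 
are exercised as follows (illustrative invocations only; output omitted):

{noformat}
nodetool clientstats --all
nodetool clientstats --by-protocol
nodetool clientstats --clear-history
{noformat}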

[jira] [Commented] (CASSANDRA-14335) C* nodetool should report the lowest of the highest CQL protocol version supported by all clients connecting to it

2018-05-04 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464329#comment-16464329
 ] 

Jason Brown commented on CASSANDRA-14335:
-

Ughh, forgot to commit my minor changes locally before pushing up. Committed 
those changes as {{68605cf03bdfecb11cd69c6d5260a773e4e87300}}.

> C* nodetool should report the lowest of the highest CQL protocol version 
> supported by all clients connecting to it
> --
>
> Key: CASSANDRA-14335
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14335
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 4.0
>
>
> While upgrading C*, it is hard to tell whether any client will be affected. 
> C* should internally store the highest protocol version of each client 
> connecting to it. The lowest of these versions will help determine whether 
> any client will be adversely affected by the upgrade.
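
As a hedged illustration of the idea (an editorial sketch only, not the 
ProtocolVersionTracker class committed for this ticket): keep a bounded, 
most-recent list of client addresses per protocol version, so the history can 
be reported and cleared without growing without limit.

{noformat}
import java.net.InetAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;

// Assumption: stand-in for the real ProtocolVersion enum.
enum ProtoVersion { V3, V4, V5 }

final class VersionTrackerSketch
{
    private static final int MAX_PER_VERSION = 100;   // bounded history per protocol version
    private final EnumMap<ProtoVersion, Deque<InetAddress>> recent = new EnumMap<>(ProtoVersion.class);

    synchronized void addConnection(InetAddress client, ProtoVersion version)
    {
        Deque<InetAddress> clients = recent.computeIfAbsent(version, v -> new ArrayDeque<>());
        if (clients.size() == MAX_PER_VERSION)
            clients.removeFirst();                     // evict the oldest entry once at capacity
        clients.addLast(client);
    }

    synchronized void clearHistory()
    {
        recent.clear();
    }
}
{noformat}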






[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464368#comment-16464368
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/4/18 8:42 PM:
--

Reproduced this behaviour several times now, and in all cases the socket that 
Cassandra is trying to send outbound on has been stuck in `CLOSE_WAIT`.

I think that suggests the more common case might be AWS no longer tracking the 
flow for a FIN'd connection after an ACK was dropped somewhere.


{{tcp CLOSE-WAIT 1 322683 10.0.161.40:18254 10.0.109.39:7000 timer:(on,1min54sec,13) uid:115 ino:5837280 sk:1a1 -->}}


was (Author: alienth):
Reproduced this behaviour several times now,and in all cases the socket that 
cassandra is trying to Outbound on has been in stuck in `CLOSE_WAIT`.

I think that suggests that the more common case might be where AWS stops 
tracking the flow for a FINd connection having a dropped ACK somewhere.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464368#comment-16464368
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/4/18 8:53 PM:
--

Reproduced this behaviour several times now, and in all cases the socket that 
Cassandra is trying to send outbound on has been stuck in `CLOSE_WAIT`.

I think that suggests the more common case might be AWS no longer tracking the 
flow for a FIN'd connection, with an ACK dropped somewhere so that a node never 
fully closes the connection?

Note that I'm seeing this on Cassandra 3.11.2, which has the patches from 
CASSANDRA-9630.

{{tcp CLOSE-WAIT 1 322683 10.0.161.40:18254 10.0.109.39:7000 timer:(on,1min54sec,13) uid:115 ino:5837280 sk:1a1 -->}}


was (Author: alienth):
Reproduced this behaviour several times now,and in all cases the socket that 
cassandra is trying to Outbound on has been in stuck in `CLOSE_WAIT`.

I think that suggests that the more common case might be where AWS stops 
tracking the flow for a FINd connection, and an ACK being dropped somewhere 
resulting in a node not fully closing the connection.

Note that I'm seeing this on cassandra 3.11.2, which has the patches from 
CASSANDRA-9630

{{tcpCLOSE-WAIT 1  322683 10.0.161.40:18254  
10.0.109.39:7000timer:(on,1min54sec,13) uid:115 ino:5837280 
sk:1a1 -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully 

[jira] [Comment Edited] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464368#comment-16464368
 ] 

Jason Harvey edited comment on CASSANDRA-14358 at 5/4/18 8:52 PM:
--

Reproduced this behaviour several times now, and in all cases the socket that 
Cassandra is trying to send outbound on has been stuck in `CLOSE_WAIT`.

I think that suggests the more common case might be AWS no longer tracking the 
flow for a FIN'd connection, with an ACK dropped somewhere so that a node never 
fully closes the connection.

Note that I'm seeing this on Cassandra 3.11.2, which has the patches from 
CASSANDRA-9630.

{{tcp CLOSE-WAIT 1 322683 10.0.161.40:18254 10.0.109.39:7000 timer:(on,1min54sec,13) uid:115 ino:5837280 sk:1a1 -->}}


was (Author: alienth):
Reproduced this behaviour several times now,and in all cases the socket that 
cassandra is trying to Outbound on has been in stuck in `CLOSE_WAIT`.

I think that suggests that the more common case might be where AWS stops 
tracking the flow for a FINd connection, and an ACK being dropped somewhere 
resulting in a node not fully closing the connection.


{{tcpCLOSE-WAIT 1  322683 10.0.161.40:18254  
10.0.109.39:7000timer:(on,1min54sec,13) uid:115 ino:5837280 
sk:1a1 -->}}

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. Smallest cluster I've seen the problem on is 
> 12 nodes, reproduces more reliably on 40+ and 300 node clusters consistently 
> reproduce on at least one node in the cluster.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node 
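
To make the missing-timeouts point above concrete, here is a minimal, generic 
sketch (not the Cassandra code path, and not the trunk Netty refactor) of 
bounding the TCP connect, the SSL handshake, and subsequent blocking reads 
with explicit timeouts; the host, port and timeout values are illustrative.

{noformat}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.net.InetSocketAddress;
import java.net.Socket;

// Generic sketch: bound connect(), the SSL handshake, and reads so a peer
// that silently disappears cannot block the sending thread indefinitely.
public class BoundedSslConnect
{
    public static SSLSocket open(String host, int port) throws Exception
    {
        Socket raw = new Socket();
        raw.connect(new InetSocketAddress(host, port), 2_000); // connect timeout
        raw.setSoTimeout(10_000);                              // SO_TIMEOUT for reads

        SSLSocketFactory factory = SSLContext.getDefault().getSocketFactory();
        // Wrap the already-connected socket; autoClose = true.
        SSLSocket ssl = (SSLSocket) factory.createSocket(raw, host, port, true);
        ssl.setSoTimeout(10_000);   // also bounds the handshake's blocking reads
        ssl.startHandshake();       // fails fast instead of hanging forever
        return ssl;
    }
}
{noformat}

With both a connect timeout and {{SO_TIMEOUT}} set, a peer that silently 
disappears surfaces as a {{SocketTimeoutException}} rather than a thread 
stuck in {{connect}} or {{recv}}.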

[jira] [Commented] (CASSANDRA-14335) C* nodetool should report the lowest of the highest CQL protocol version supported by all clients connecting to it

2018-05-04 Thread Dinesh Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464426#comment-16464426
 ] 

Dinesh Joshi commented on CASSANDRA-14335:
--

Thanks [~jasobrown]

> C* nodetool should report the lowest of the highest CQL protocol version 
> supported by all clients connecting to it
> --
>
> Key: CASSANDRA-14335
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14335
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 4.0
>
>
> While upgrading C*, it is hard to tell whether any client will be affected 
> by the upgrade. C* should internally store the highest protocol version used 
> by each client connecting to it. The lowest of these versions will help 
> determine whether any client will be adversely affected by the upgrade.
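
A minimal sketch of the bookkeeping the description asks for (illustrative 
only, with hypothetical class and method names, not the patch that landed): 
record the highest protocol version seen per client, and report the lowest of 
those maxima before an upgrade.

{noformat}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: track, for each client address, the highest protocol
// version it has connected with; the minimum of those maxima is the value
// an operator would check before upgrading.
public class ClientVersionTracker
{
    private final Map<InetAddress, Integer> highestVersionByClient = new ConcurrentHashMap<>();

    public void onConnect(InetAddress client, int protocolVersion)
    {
        highestVersionByClient.merge(client, protocolVersion, Integer::max);
    }

    public int lowestOfHighestVersions()
    {
        return highestVersionByClient.values().stream()
                                     .mapToInt(Integer::intValue)
                                     .min()
                                     .orElse(-1); // no clients seen yet
    }
}
{noformat}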



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13981) Enable Cassandra for Persistent Memory

2018-05-04 Thread Michael Burman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463929#comment-16463929
 ] 

Michael Burman commented on CASSANDRA-13981:


Just my 2 cents, but the approach seems slightly odd if one wants to take 
advantage of persistent memory. It should also override the whole CommitLog 
path: instead of writing first to the WAL and then to the memtable, it should, 
in my opinion, skip both the memtable and the commitlog and basically rewrite 
the whole storage path. The same applies to reads, compactions, etc., as it 
should never create SSTables. Likewise, the recovery process at Cassandra 
startup would need to change.

I would actually redesign query parsing, query execution and other parts as 
well, since these do not fit the model of persistent memory too well either. 
I have no idea how this would affect streaming, as suddenly the tables are not 
necessarily immutable anymore (LSM isn't designed for persistent memory). I'd 
say this is quite a difficult approach to integrate into Cassandra if it's 
anything more than a "half-baked exercise". Just replacing the data model 
isn't enough, as the whole approach of having a single type of memory is so 
fundamentally different from what Cassandra was designed for.

> Enable Cassandra for Persistent Memory 
> ---
>
> Key: CASSANDRA-13981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13981
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Preetika Tyagi
>Assignee: Preetika Tyagi
>Priority: Major
> Fix For: 4.0
>
> Attachments: in-mem-cassandra-1.0.patch, readme.txt
>
>
> Currently, Cassandra relies on disks for data storage and hence it needs data 
> serialization, compaction, bloom filters and partition summary/index for 
> speedy access to the data. However, with persistent memory, data can be 
> stored directly in the form of Java objects and collections, which can 
> greatly simplify the retrieval mechanism of the data. What we are proposing 
> is to make use of faster and scalable B+ tree-based data collections built 
> for persistent memory in Java (PCJ: https://github.com/pmem/pcj) and enable a 
> complete in-memory version of Cassandra, while still keeping the data 
> persistent.
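
As a rough, conceptual sketch of the proposal (not the attached patch and not 
the real PCJ API): rows live directly in an ordered collection keyed by 
partition and clustering key, with a PCJ persistent B+ tree-style collection 
standing in for the plain JDK map used below to make the data durable across 
restarts.

{noformat}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Conceptual sketch only: rows are kept directly in an ordered map keyed by
// (partition key, clustering key), so no serialization to SSTables, no
// compaction, and no bloom filter / partition index are needed for reads.
// With PCJ, a persistent B+ tree-style collection would replace this map
// so the contents survive restarts.
public class PersistentMemoryStore
{
    private final ConcurrentNavigableMap<ByteBuffer, ConcurrentNavigableMap<ByteBuffer, ByteBuffer>> partitions =
        new ConcurrentSkipListMap<>();

    public void write(ByteBuffer partitionKey, ByteBuffer clustering, ByteBuffer row)
    {
        partitions.computeIfAbsent(partitionKey, k -> new ConcurrentSkipListMap<>())
                  .put(clustering, row);
    }

    public ByteBuffer read(ByteBuffer partitionKey, ByteBuffer clustering)
    {
        ConcurrentNavigableMap<ByteBuffer, ByteBuffer> partition = partitions.get(partitionKey);
        return partition == null ? null : partition.get(clustering);
    }
}
{noformat}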



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14424) Gossip EchoMessages not being handled somewhere after node restart

2018-05-04 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463323#comment-16463323
 ] 

Jason Harvey edited comment on CASSANDRA-14424 at 5/4/18 6:45 AM:
--

[~jolynch] I was able to reproduce it again and I've confirmed that this is 
indeed due to the same thing you're detailing in CASSANDRA-14358 . I'm not 
using internode TLS at all, but of course the fundamental firewall timeout 
still exists.

I'm OK with this being marked as a duplicate of CASSANDRA-14358
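
As an illustrative aside on the firewall-timeout point (a generic mitigation 
sketch, not the project's fix): enabling TCP keepalive on otherwise idle 
internode sockets generates periodic probes, so a stateful firewall either 
keeps tracking the flow or the dead peer gets detected, instead of the 
connection hanging silently.

{noformat}
import java.net.InetSocketAddress;
import java.net.Socket;

// Generic sketch, not Cassandra code: keepalive probes keep an idle flow
// visible to stateful firewalls / conntrack, or surface a dead peer.
public class KeepAliveExample
{
    public static Socket open(String host, int port) throws Exception
    {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), 2_000);
        socket.setKeepAlive(true); // probe cadence comes from the OS tcp_keepalive_* sysctls
        return socket;
    }
}
{noformat}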


was (Author: alienth):
[~jolynch] I was able to reproduce it again and I've confirmed that this is 
indeed due to the same thing you're detailing in CASSANDRA-14538 . I'm not 
using internode TLS at all, but of course the fundamental firewall timeout 
still exists.

I'm OK with this being marked as a duplicate of CASSANDRA-14538

> Gossip EchoMessages not being handled somewhere after node restart
> --
>
> Key: CASSANDRA-14424
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14424
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 3.11.2 - brand new ring - 18 nodes.
> ubuntu 16.04
> AWS - cross AZ, with GossipingPropertyFileSnitch setting the rack to the AZs.
>Reporter: Jason Harvey
>Priority: Major
> Fix For: 3.11.x, 4.x
>
>
> Noticing this behaviour on a brand new 3.11.2 ring:
>  # Restart a random node in the ring.
>  # When that node comes back up, around 30% of the time it sees a single 
> other node down. No other node in the ring sees that node is down.
>  # After 10-20 minutes, the DOWN node suddenly appears UP to the restarted 
> node.
>  
> After digging through tracing logs, here's what I know:
>  
> The node seen as DOWN has not gone down, but simply hasn't been seen as UP 
> yet. The restarted node is attempting to `markAlive()` the target node. 
> Relevant logs from the restarted node's POV:
>  
> {{INFO [GossipStage:1] 2018-04-27 14:03:50,950 Gossiper.java:1053 - Node 
> /10.0.225.147 has restarted, now UP}}
>  {{INFO [GossipStage:1] 2018-04-27 14:03:50,969 StorageService.java:2292 - 
> Node /10.0.225.147 state jump to NORMAL}}
>  {{INFO [HANDSHAKE-/10.0.225.147] 2018-04-27 14:03:50,976 
> OutboundTcpConnection.java:560 - Handshaking version with /10.0.225.147}}
>  {{INFO [GossipStage:1] 2018-04-27 14:03:50,977 TokenMetadata.java:479 - 
> Updating topology for /10.0.225.147}}
>  {{INFO [GossipStage:1] 2018-04-27 14:03:50,977 TokenMetadata.java:479 - 
> Updating topology for /10.0.225.147}}
>  
> (note that despite the Gossip seeing the DOWN node as 'UP', nodetool status 
> still shows it as 'DOWN', as markAlive has not completed, and will not 
> actually be seen as 'UP' for 20 more minutes)
>  
> The restarted node is repeatedly sending Echo messages to the DOWN node as 
> part of the `markAlive()` call. The DOWN node is receiving those, and claims 
> to be sending a response. However, the restarted node is not marking the DOWN 
> node as UP even after the DOWN node sends the Echo response.
>  
> Relevant logs from the restarted node's POV:
>  
> {{TRACE [GossipStage:1] 2018-04-27 14:11:28,792 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 99248@/10.0.225.147}}
> {{TRACE [GossipTasks:1] 2018-04-27 14:11:29,792 MessagingService.java:945 - 
> 10.0.103.45 sending GOSSIP_DIGEST_SYN to 99631@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:29,792 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 99632@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:29,793 MessagingService.java:945 - 
> 10.0.103.45 sending GOSSIP_DIGEST_ACK2 to 99633@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:29,793 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 99635@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:31,794 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 100348@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:33,750 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 101157@/10.0.225.147}}
> {{TRACE [GossipStage:1] 2018-04-27 14:11:35,412 MessagingService.java:945 - 
> 10.0.103.45 sending ECHO to 101753@/10.0.225.147}}
>  
>  
> Relevant logs from the DOWN node's POV:
>  
> {{TRACE [GossipStage:1] 2018-04-27 14:18:16,500 EchoVerbHandler.java:39 - 
> Sending a EchoMessage reply /10.0.103.45}}
>  {{TRACE [GossipStage:1] 2018-04-27 14:18:16,500 MessagingService.java:945 - 
> 10.0.225.147 sending REQUEST_RESPONSE to 328389@/10.0.103.45}}
> {{TRACE [GossipStage:1] 2018-04-27 14:18:17,679 EchoVerbHandler.java:39 - 
> Sending a EchoMessage reply /10.0.103.45}}
>  {{TRACE [GossipStage:1] 2018-04-27 14:18:17,679 MessagingService.java:945 - 
> 10.0.225.147 sending REQUEST_RESPONSE to 329412@/10.0.103.45}}
> {{TRACE [GossipStage:1] 2018-04-27