[jira] [Comment Edited] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2018-01-02 Thread SathishKumar Alwar (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308966#comment-16308966
 ] 

SathishKumar Alwar edited comment on CASSANDRA-9630 at 1/3/18 1:25 AM:
---

Is there a plan to fix this issue? We are observing the same behavior in 
version 3.9. We have a 3-node Cassandra cluster running in 3 VMs. When we 
reboot one of the nodes (say VM1), the socket connections on the other nodes 
(say VM2 and VM3) remain in the CLOSE_WAIT state, so when we start Cassandra 
on the rebooted node it is not able to join the cluster. We observed nodetool 
status returning "UN" for itself and "DN" for the other 2 nodes. After 5-20 
minutes, however, a "Connection Timeout" exception appears in debug.log on 
the other 2 nodes (VM2 and VM3), new socket connections are established, and 
the nodes are able to join the cluster.
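
The CLOSE_WAIT state mentioned above is generic TCP behavior: a socket enters it once the remote end has closed the connection (sent FIN) and stays there until the local application closes its own end. The Python sketch below reproduces that on the loopback interface. It only illustrates the TCP state and is not Cassandra's connection-handling code; the helper names `peer_has_closed` and `demo` are made up for this example.

```python
import socket
from typing import Tuple


def peer_has_closed(sock: socket.socket, timeout: float = 1.0) -> bool:
    """Return True if the remote end has closed (recv yields b'')."""
    sock.settimeout(timeout)
    try:
        # MSG_PEEK looks at pending data without consuming it.
        data = sock.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        return False  # nothing to read yet: the peer has not closed
    return data == b""


def demo() -> Tuple[bool, bool]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    accepted, _ = listener.accept()

    before = peer_has_closed(client, timeout=0.2)
    accepted.close()  # the "killed" side goes away and sends FIN
    after = peer_has_closed(client, timeout=1.0)

    # At this point the client socket is in CLOSE_WAIT; it leaves that
    # state only when the local application closes it.
    client.close()
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

A process that never notices the end-of-stream, or never calls `close()` after noticing it, keeps the socket in CLOSE_WAIT, which matches the symptom reported on the surviving nodes.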



was (Author: sathish_alwar):
Is there a plan to fix this issue, we are observing the same behavior.

> Killing cassandra process results in unclosed connections
> -
>
> Key: CASSANDRA-9630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9630
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata, Streaming and Messaging
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Fix For: 3.11.x
>
> Attachments: apache-cassandra-3.0.8-SNAPSHOT.jar
>
>
> After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a 
> Cassandra process (with SIGTERM), some other nodes maintained a connection 
> with the killed node in the CLOSE_WAIT state on port 7000 for about 5-20 
> minutes.
> So, when we started the killed node again, other nodes could not establish a 
> handshake because of the connections in the CLOSE_WAIT state, so they 
> remained in the DOWN state to each other until the initial connection expired.
> The problem did not happen if I ran nodetool disablegossip before killing 
> the node.
> I was able to fix this issue by reverting the CASSANDRA-8336 commits 
> (including CASSANDRA-9238). After reverting them, Cassandra closes 
> connections correctly when killed with -TERM, but leaves connections in the 
> CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
> I did not try to reproduce the problem in a clean environment.
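
One way to check for the condition described above is to count TCP states for connections on the storage port (7000) in `netstat -tn` output. This is a minimal Python sketch, assuming the usual Linux netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name `count_states` is hypothetical, and the sample lines are taken from netstat output reported on this issue.

```python
from collections import Counter
from typing import Iterable


def count_states(netstat_lines: Iterable[str], port: int = 7000) -> Counter:
    """Count TCP states of connections whose local or remote port is `port`."""
    counts: Counter = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers, separators and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            counts[state] += 1
    return counts


sample = """\
tcp        0      0 10.4.54.176:42460    10.4.68.222:7000     ESTABLISHED
tcp        1 303403 10.4.54.176:43697    10.4.68.222:7000     CLOSE_WAIT
"""

if __name__ == "__main__":
    print(count_states(sample.splitlines()))
```

A non-zero CLOSE_WAIT count that persists after the remote node has been stopped would point to the stale connections described in this report.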



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2016-07-27 Thread Farzad Panahi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396756#comment-15396756
 ] 

Farzad Panahi edited comment on CASSANDRA-9630 at 7/28/16 1:14 AM:
---

I am experiencing a similar problem. 

Cassandra version: 3.0.8
Environment: Amazon EC2

Error Case:
When I restart the Cassandra service on a node, after the node comes up it sees 
some or all of the other nodes as DN, even though the other nodes see this node as UN. 

Here is the output of netstat and nodetool status for this error case:

1. right after stopping cassandra service on node 10.4.68.222:
{code}
--
ip-10-4-54-176
tcp        0      0 10.4.54.176:51268    10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.54.176:56135    10.4.68.222:7000     TIME_WAIT
tcp        1      0 10.4.54.176:43697    10.4.68.222:7000     CLOSE_WAIT
tcp        0      0 10.4.54.176:52372    10.4.68.222:7000     TIME_WAIT
--
--
ip-10-4-54-177
tcp        0      0 10.4.54.177:56960    10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.54.177:54539    10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.54.177:32823    10.4.68.222:7000     TIME_WAIT
tcp        1      0 10.4.54.177:48985    10.4.68.222:7000     CLOSE_WAIT
--
--
ip-10-4-68-222
tcp        0      0 10.4.68.222:7000     10.4.54.176:43697    FIN_WAIT2
tcp        0      0 10.4.68.222:7000     10.4.54.177:48985    FIN_WAIT2
tcp        0      0 10.4.68.222:7000     10.4.68.222:54419    TIME_WAIT
tcp        0      0 10.4.68.222:7000     10.4.43.65:43197     FIN_WAIT2
tcp        0      0 10.4.68.222:7000     10.4.68.221:44149    FIN_WAIT2
tcp        0      0 10.4.68.222:7000     10.4.68.222:41302    TIME_WAIT
tcp        0      0 10.4.68.222:7000     10.4.43.66:54321     FIN_WAIT2
--
--
ip-10-4-68-221
tcp        0      0 10.4.68.221:49599    10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.68.221:55033    10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.68.221:51628    10.4.68.222:7000     TIME_WAIT
tcp        1      0 10.4.68.221:44149    10.4.68.222:7000     CLOSE_WAIT
--
--
ip-10-4-43-66
tcp        0      0 10.4.43.66:55930     10.4.68.222:7000     TIME_WAIT
tcp        1      0 10.4.43.66:54321     10.4.68.222:7000     CLOSE_WAIT
tcp        0      0 10.4.43.66:60968     10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.43.66:49087     10.4.68.222:7000     TIME_WAIT
--
--
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197     10.4.68.222:7000     CLOSE_WAIT
tcp        0      0 10.4.43.65:36467     10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.43.65:53317     10.4.68.222:7000     TIME_WAIT
tcp        0      0 10.4.43.65:54897     10.4.68.222:7000     TIME_WAIT
--
{code}

2. a bit after stopping cassandra service on node 10.4.68.222:
{code}
--
ip-10-4-54-176
tcp        1      0 10.4.54.176:43697    10.4.68.222:7000     CLOSE_WAIT
--
--
ip-10-4-54-177
--
--
ip-10-4-68-222
--
--
ip-10-4-68-221
tcp        1      0 10.4.68.221:44149    10.4.68.222:7000     CLOSE_WAIT
--
--
ip-10-4-43-66
tcp        1      0 10.4.43.66:54321     10.4.68.222:7000     CLOSE_WAIT
--
--
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197     10.4.68.222:7000     CLOSE_WAIT
--
{code}

3. after starting cassandra service on node 10.4.68.222: 
{code}
--
ip-10-4-54-176
tcp        0      0 10.4.54.176:42460    10.4.68.222:7000     ESTABLISHED
tcp        1 303403 10.4.54.176:43697    10.4.68.222:7000     CLOSE_WAIT
tcp0