[jira] [Comment Edited] (CASSANDRA-9630) Killing cassandra process results in unclosed connections
[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308966#comment-16308966 ]

SathishKumar Alwar edited comment on CASSANDRA-9630 at 1/3/18 1:25 AM:
---

Is there a plan to fix this issue? We are observing the same behavior in version 3.9.

We have a 3-node Cassandra cluster running on 3 VMs. When we reboot one of the nodes (say VM1), socket connections on the other nodes (say VM2 and VM3) remain in the CLOSE_WAIT state, so when we start Cassandra on the rebooted node it is not able to join the cluster.

On the rebooted node, nodetool status returns "UN" for itself and "DN" for the other 2 nodes. After 5-20 minutes, however, a "Connection Timeout" exception appears in debug.log on the other 2 nodes (VM2 and VM3), new socket connections are established, and the nodes are able to join the cluster.

was (Author: sathish_alwar):
Is there a plan to fix this issue, we are observing the same behavior.

> Killing cassandra process results in unclosed connections
> ---------------------------------------------------------
>
> Key: CASSANDRA-9630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9630
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata, Streaming and Messaging
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Fix For: 3.11.x
>
> Attachments: apache-cassandra-3.0.8-SNAPSHOT.jar
>
>
> After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a
> cassandra process (with SIGTERM), some other nodes maintained a connection
> with the killed node in the CLOSE_WAIT state on port 7000 for about 5-20
> minutes.
> So, when we started the killed node again, other nodes could not establish a
> handshake because of the connections in the CLOSE_WAIT state, so they
> remained in the DOWN state to each other until the initial connection expired.
> The problem did not happen if I ran a nodetool disablegossip before killing
> the node.
> I was able to fix this issue by reverting the CASSANDRA-8336 commits
> (including CASSANDRA-9238). After reverting this, cassandra now closes
> connections correctly when killed with -TERM, but leaves connections in the
> CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
> I did not try to reproduce the problem in a clean environment.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
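
Editorial note: the reports in this thread rely on spotting connections to the storage port (7000) that stay in CLOSE_WAIT on the surviving nodes. A minimal sketch of how such a check could be scripted follows. It assumes the plain six-column `netstat -tn` layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `count_states` and the `sample` variable are hypothetical and not part of Cassandra or any tool mentioned here.

```python
from collections import Counter


def count_states(netstat_text, port=7000):
    """Count TCP states for connections whose local or foreign port is `port`.

    Assumes whitespace-separated `netstat -tn` rows with exactly six columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State.
    Rows that do not match this layout are skipped.
    """
    counts = Counter()
    wanted = str(port)
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # header, separator, or unexpected layout
        local, foreign, state = fields[3], fields[4], fields[5]
        # The port is whatever follows the last ':' in an address.
        if wanted in (local.rsplit(":", 1)[-1], foreign.rsplit(":", 1)[-1]):
            counts[state] += 1
    return counts


# Rows for ip-10-4-54-176 as reported in this thread, right after the stop.
sample = """\
tcp        0      0 10.4.54.176:51268       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.176:56135       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.54.176:52372       10.4.68.222:7000        TIME_WAIT
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(sample).items()):
        print(state, n)
```

A CLOSE_WAIT count that stays non-zero on a peer well after the remote node has stopped would match the symptom described in the issue; TIME_WAIT entries are expected to clear on their own.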
[jira] [Comment Edited] (CASSANDRA-9630) Killing cassandra process results in unclosed connections
[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396756#comment-15396756 ]

Farzad Panahi edited comment on CASSANDRA-9630 at 7/28/16 1:14 AM:
---

I am experiencing a similar problem.

Cassandra version: 3.0.8
Environment: Amazon EC2
Error Case: When I restart the Cassandra service on a node, the node comes back up and sees some or all of the other nodes as DN, even though the other nodes see this node as UN.

Here is the output of netstat and nodetool status for this error case:

1. right after stopping cassandra service on node 10.4.68.222:
{code}
-- ip-10-4-54-176
tcp        0      0 10.4.54.176:51268       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.176:56135       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.54.176:52372       10.4.68.222:7000        TIME_WAIT
--
-- ip-10-4-54-177
tcp        0      0 10.4.54.177:56960       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.177:54539       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.177:32823       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.54.177:48985       10.4.68.222:7000        CLOSE_WAIT
--
-- ip-10-4-68-222
tcp        0      0 10.4.68.222:7000        10.4.54.176:43697       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.54.177:48985       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.222:54419       TIME_WAIT
tcp        0      0 10.4.68.222:7000        10.4.43.65:43197        FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.221:44149       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.222:41302       TIME_WAIT
tcp        0      0 10.4.68.222:7000        10.4.43.66:54321        FIN_WAIT2
--
-- ip-10-4-68-221
tcp        0      0 10.4.68.221:49599       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.68.221:55033       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.68.221:51628       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.68.221:44149       10.4.68.222:7000        CLOSE_WAIT
--
-- ip-10-4-43-66
tcp        0      0 10.4.43.66:55930        10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.43.66:54321        10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.43.66:60968        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.66:49087        10.4.68.222:7000        TIME_WAIT
--
-- ip-10-4-43-65
tcp        1      0 10.4.43.65:43197        10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.43.65:36467        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.65:53317        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.65:54897        10.4.68.222:7000        TIME_WAIT
--
{code}

2. a bit after stopping cassandra service on node 10.4.68.222:
{code}
-- ip-10-4-54-176
tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
--
-- ip-10-4-54-177
--
-- ip-10-4-68-222
--
-- ip-10-4-68-221
tcp        1      0 10.4.68.221:44149       10.4.68.222:7000        CLOSE_WAIT
--
-- ip-10-4-43-66
tcp        1      0 10.4.43.66:54321        10.4.68.222:7000        CLOSE_WAIT
--
-- ip-10-4-43-65
tcp        1      0 10.4.43.65:43197        10.4.68.222:7000        CLOSE_WAIT
--
{code}

3. after starting cassandra service on node 10.4.68.222:
{code}
-- ip-10-4-54-176
tcp        0      0 10.4.54.176:42460       10.4.68.222:7000        ESTABLISHED
tcp        1 303403 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
tcp        0