[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2018-01-12 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323944#comment-16323944 ]

Robert Stupp commented on CASSANDRA-9630:
-

+1, it should fix the issue.

> Killing cassandra process results in unclosed connections
> -
>
> Key: CASSANDRA-9630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9630
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata, Streaming and Messaging
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Fix For: 3.11.x
>
> Attachments: apache-cassandra-3.0.8-SNAPSHOT.jar
>
>
> After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a
> cassandra process (with SIGTERM), some other nodes kept a connection
> to the killed node in the CLOSE_WAIT state on port 7000 for about 5-20
> minutes.
> So, when we started the killed node again, other nodes could not establish a
> handshake because of the connections in the CLOSE_WAIT state, and the nodes
> remained DOWN to each other until the initial connection expired.
> The problem did not happen if I ran nodetool disablegossip before killing
> the node.
> I was able to fix this issue by reverting the CASSANDRA-8336 commits
> (including CASSANDRA-9238). After reverting them, cassandra closes
> connections correctly when killed with -TERM, but leaves connections in the
> CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
> I did not try to reproduce the problem in a clean environment.
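
To check a node for the symptom described above (peers keeping connections to the killed node's port 7000 in {{CLOSE_WAIT}}), filtering netstat output is usually enough. A minimal sketch, assuming the column layout of Linux {{netstat -tn}} (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); {{CloseWaitFilter}} and {{closeWaitTo}} are hypothetical names, not part of Cassandra or nodetool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CloseWaitFilter {

    /**
     * Returns the local address of every socket in CLOSE_WAIT whose remote
     * end is {@code peer} (for example "10.4.68.222:7000").
     * Assumes netstat -tn columns: Proto Recv-Q Send-Q Local Foreign State.
     */
    static List<String> closeWaitTo(List<String> netstatLines, String peer) {
        List<String> matches = new ArrayList<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, host-name separators and truncated rows.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            if (cols[4].equals(peer) && cols[5].equals("CLOSE_WAIT")) {
                matches.add(cols[3]);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "tcp        0      0 10.4.54.176:51268       10.4.68.222:7000        TIME_WAIT",
                "tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT");
        System.out.println(closeWaitTo(sample, "10.4.68.222:7000"));
    }
}
```

With the two sample rows in {{main}}, it prints {{[10.4.54.176:43697]}}: only the {{CLOSE_WAIT}} socket is reported, not the {{TIME_WAIT}} one.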



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2018-01-10 Thread Paulo Motta (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320733#comment-16320733 ]

Paulo Motta commented on CASSANDRA-9630:


Even though we didn't hear back from anyone who tested the patch, I'm quite
confident this will fix the hanging sockets problem, so I will set this to
Patch Available.

Would you mind having a look [~snazy]? Patch
[here|https://github.com/pauloricardomg/cassandra/tree/3.0-9630]. Submitted to CI;
will update with the results.




[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2018-01-02 Thread SathishKumar Alwar (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308966#comment-16308966 ]

SathishKumar Alwar commented on CASSANDRA-9630:
---

Is there a plan to fix this issue? We are observing the same behavior.




[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2016-07-28 Thread Paulo Motta (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398338#comment-15398338 ]

Paulo Motta commented on CASSANDRA-9630:


I noticed we're not closing the socket [if there is an exception while
connecting|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L496]
on {{OutboundTcpConnection}}, so a race during a node's shutdown might cause a
failed connection attempt to that node to remain in the {{CLOSE_WAIT}} state until
the next GC, which could explain this.
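
A minimal sketch of the close-on-failure idea, assuming a plain {{java.net.Socket}} rather than Cassandra's actual connection code: if {{connect}} throws, the socket is closed right away instead of being left for the garbage collector to release. The class {{ConnectOrClose}} and method {{connectOrClose}} are hypothetical names, not taken from {{OutboundTcpConnection}} or from the patch referenced in this comment.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectOrClose {

    /**
     * Connects the given socket, closing it if the attempt fails so the
     * descriptor is released immediately rather than whenever the
     * unreachable Socket object is eventually collected.
     */
    static Socket connectOrClose(Socket socket, InetSocketAddress address, int timeoutMillis)
            throws IOException {
        try {
            socket.connect(address, timeoutMillis);
            return socket;
        } catch (IOException | RuntimeException e) {
            try {
                socket.close();
            } catch (IOException closeError) {
                // Keep the original failure; attach the close failure to it.
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Socket socket = new Socket();
        try {
            // An unresolved address makes connect() fail without touching the network.
            connectOrClose(socket, InetSocketAddress.createUnresolved("unresolved.invalid", 7000), 1000);
        } catch (IOException e) {
            System.out.println("connect failed: " + e.getClass().getSimpleName());
        }
        System.out.println("socket closed: " + socket.isClosed());
    }
}
```

Catching {{RuntimeException}} as well covers unchecked failures from {{connect}} (for example an {{IllegalArgumentException}} for an unsupported address type), so the socket is released on every failure path.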

[~farzad.panahi] Are you willing to try out [this 
patch|https://github.com/pauloricardomg/cassandra/commit/3f46d414b06afb607b6a97152661b10c53c103e6]
 to see if it fixes it? You need to replace your 
{{lib/apache-cassandra-3.0.8.jar}} with 
[apache-cassandra-3.0.8-SNAPSHOT.jar|https://issues.apache.org/jira/secure/attachment/12820814/apache-cassandra-3.0.8-SNAPSHOT.jar]
 and perform a rolling restart on some of the nodes and check whether this fixes
the issue on those nodes (if you prefer, you can generate your own jar by
cloning [this 
branch|https://github.com/pauloricardomg/cassandra/tree/3.0.6-9630] and running 
{{ant clean jar}}).

If this does not solve it, it would be nice if you could set the logging level
of the {{org.apache.cassandra.net}} package to {{TRACE}}, either via
{{nodetool setlogginglevel org.apache.cassandra.net TRACE}} or by adding a
{{logger}} element for that package with level {{TRACE}} to your
{{conf/logback.xml}}. After this, please attach the relevant excerpts from the
logs of the affected nodes to this ticket for further analysis.



[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2016-07-27 Thread Farzad Panahi (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396756#comment-15396756 ]

Farzad Panahi commented on CASSANDRA-9630:
--

I am experiencing a similar issue.

Cassandra version: 3.0.8
Environment: Amazon EC2

Error Case:
When I restart the Cassandra service on a node, after the node comes up it sees
some or all of the other nodes as DN even though the other nodes see this node as UN.

Here is the output of netstat and nodetool status for this error case:

1. Right after stopping the Cassandra service on node 10.4.68.222:
{code}
--
ip-10-4-54-176
tcp        0      0 10.4.54.176:51268       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.176:56135       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.54.176:52372       10.4.68.222:7000        TIME_WAIT
--
--
ip-10-4-54-177
tcp        0      0 10.4.54.177:56960       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.177:54539       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.54.177:32823       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.54.177:48985       10.4.68.222:7000        CLOSE_WAIT
--
--
ip-10-4-68-222
tcp        0      0 10.4.68.222:7000        10.4.54.176:43697       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.54.177:48985       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.222:54419       TIME_WAIT
tcp        0      0 10.4.68.222:7000        10.4.43.65:43197        FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.221:44149       FIN_WAIT2
tcp        0      0 10.4.68.222:7000        10.4.68.222:41302       TIME_WAIT
tcp        0      0 10.4.68.222:7000        10.4.43.66:54321        FIN_WAIT2
--
--
ip-10-4-68-221
tcp        0      0 10.4.68.221:49599       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.68.221:55033       10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.68.221:51628       10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.68.221:44149       10.4.68.222:7000        CLOSE_WAIT
--
--
ip-10-4-43-66
tcp        0      0 10.4.43.66:55930        10.4.68.222:7000        TIME_WAIT
tcp        1      0 10.4.43.66:54321        10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.43.66:60968        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.66:49087        10.4.68.222:7000        TIME_WAIT
--
--
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197        10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.43.65:36467        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.65:53317        10.4.68.222:7000        TIME_WAIT
tcp        0      0 10.4.43.65:54897        10.4.68.222:7000        TIME_WAIT
--
{code}

2. Shortly after stopping the Cassandra service on node 10.4.68.222:
{code}
--
ip-10-4-54-176
tcp        1      0 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
--
--
ip-10-4-54-177
--
--
ip-10-4-68-222
--
--
ip-10-4-68-221
tcp        1      0 10.4.68.221:44149       10.4.68.222:7000        CLOSE_WAIT
--
--
ip-10-4-43-66
tcp        1      0 10.4.43.66:54321        10.4.68.222:7000        CLOSE_WAIT
--
--
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197        10.4.68.222:7000        CLOSE_WAIT
--
{code}

3. After starting the Cassandra service on node 10.4.68.222:
{code}
--
ip-10-4-54-176
tcp        0      0 10.4.54.176:42460       10.4.68.222:7000        ESTABLISHED
tcp        1 303403 10.4.54.176:43697       10.4.68.222:7000        CLOSE_WAIT
tcp        0      0 10.4.54.176:42109       10.4.68.222:7000

[jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections

2016-05-11 Thread T Jake Luciani (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280325#comment-15280325 ]

T Jake Luciani commented on CASSANDRA-9630:
---

This is likely caused by MessagingService not setting SO_LINGER to 0 on the
sockets.
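
For context on that suggestion: enabling {{SO_LINGER}} with a timeout of 0 makes {{close()}} abort the connection with a TCP RST instead of the normal FIN exchange, so the remote end is reset rather than left half-closed in {{CLOSE_WAIT}}. A minimal sketch using only {{java.net.Socket}}; the helper {{withZeroLinger}} is hypothetical and is not taken from the patch discussed in this ticket.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

final class ZeroLinger {

    /** Enables SO_LINGER with a 0-second timeout so close() sends RST instead of FIN. */
    static Socket withZeroLinger(Socket socket) throws SocketException {
        socket.setSoLinger(true, 0);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = withZeroLinger(new Socket())) {
            // getSoLinger() returns -1 when the option is disabled, otherwise the timeout.
            System.out.println("SO_LINGER = " + socket.getSoLinger());
        }
    }
}
```

The trade-off is that any unsent data is discarded on close, so an abortive close like this only suits connections that can safely be dropped and re-established.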
