[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity

2018-12-08 Stanilovsky Evgeny (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanilovsky Evgeny updated IGNITE-10469:

Attachment: GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout 
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Assignee: Stanilovsky Evgeny
> Priority: Major
> Attachments: 2.6.0.txt,
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java,
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip
>
>
> TcpCommunicationSpi does not close TCP connections after they have been idle
> for longer than the time configured in TcpCommunicationSpi#idleConnTimeout
> (default is 10 minutes).
> There are environments where idle TCP connections become unusable:
> connections remain ESTABLISHED while the data to be sent piles up in
> Send-Q (according to netstat). As a result, the Ignite stack does not
> recognize the communication problem for a considerable amount of time
> (~10-15 minutes) and does not begin its reconnection procedure (heartbeats
> use different TCP connections, which are not idle and don't have this issue).
> I've discovered, though, that there is logic in the Ignite code to detect and
> close idle connections, but due to a bug in that code it does not work
> reliably.
> The following test _sometimes_ reproduces the problem:
> [^ignite_idle_test.zip] - full test project
> [^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - test code only
> [^2.6.0.txt] - mvn clean install logs for the test with Ignite 2.6.0
> What's the problem in the Ignite code?
> There are two loops in the Ignite code that can close idle connections:
> 1)
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
>  - this one runs every *IdleConnectionTimeout* milliseconds. It can close
> idle connections, but it typically concludes that the connection is not
> idle, because of the second loop.
> 2) 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
>  -> 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
>  - this loop executes:
> {noformat}
> filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
> // Update timestamp to avoid multiple notifications within one timeout interval.
> ses.resetSendScheduleTime(); <-- resets idle timer
> ses.bytesReceived(0);
> {noformat}
> ---
> To sum up, maybe the whole approach should be reviewed:
>  - Is it OK not to track message delivery time?
>  - Is it OK that heartbeats are not sent over the same connections as
> get/put/... commands?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity

2018-11-29 Igor Kamyshnikov (JIRA)



Igor Kamyshnikov updated IGNITE-10469:
--
Attachment: 2.6.0.txt



[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity

2018-11-29 Igor Kamyshnikov (JIRA)



Igor Kamyshnikov updated IGNITE-10469:
--
Description:
TcpCommunicationSpi does not close TCP connections after they have been idle
for longer than the time configured in TcpCommunicationSpi#idleConnTimeout
(default is 10 minutes).

There are environments where idle TCP connections become unusable: connections
remain ESTABLISHED while the data to be sent piles up in Send-Q (according
to netstat). As a result, the Ignite stack does not recognize the communication
problem for a considerable amount of time (~10-15 minutes) and does not begin
its reconnection procedure (heartbeats use different TCP connections, which
are not idle and don't have this issue).

I've discovered, though, that there is logic in the Ignite code to detect and
close idle connections, but due to a bug in that code it does not work reliably.
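The intended behavior of that logic is to close a connection once it has seen no traffic for idleConnTimeout. It can be sketched as a periodic sweep. This is a simplified, hypothetical model, not Ignite's code. `Session`, `lastIoTime` and `sweepIdle` are illustrative names. The current time is passed in explicitly so that the check can be exercised without a real clock. The key property is that the sweep only works if `lastIoTime` is updated by real I/O and nothing else.

```java
import java.util.ArrayList;
import java.util.List;

public class IdleSweepSketch {
    /** Hypothetical stand-in for a communication session (not an Ignite class). */
    static final class Session {
        final String id;
        long lastIoTime;   // virtual ms timestamp of the last real send or receive
        boolean closed;

        Session(String id, long lastIoTime) {
            this.id = id;
            this.lastIoTime = lastIoTime;
        }
    }

    /** Closes every open session idle for at least idleTimeoutMs; returns the ids it closed. */
    static List<String> sweepIdle(List<Session> sessions, long nowMs, long idleTimeoutMs) {
        List<String> closedIds = new ArrayList<>();
        for (Session s : sessions) {
            if (!s.closed && nowMs - s.lastIoTime >= idleTimeoutMs) {
                s.closed = true; // a real implementation would close the socket here
                closedIds.add(s.id);
            }
        }
        return closedIds;
    }

    public static void main(String[] args) {
        long idleTimeout = 600_000L; // 10 minutes, the default mentioned in the issue
        List<Session> sessions = new ArrayList<>();
        sessions.add(new Session("busy", 550_000L)); // traffic 50 s before the sweep
        sessions.add(new Session("idle", 0L));       // silent for the full timeout
        System.out.println(sweepIdle(sessions, 600_000L, idleTimeout)); // prints [idle]
    }
}
```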

The following test _sometimes_ reproduces the problem:
[^ignite_idle_test.zip] - full test project
[^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - test code only
[^2.6.0.txt] - mvn clean install logs for the test with Ignite 2.6.0

What's the problem in the Ignite code?

There are two loops in the Ignite code that can close idle connections:
1)
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
 - this one runs every *IdleConnectionTimeout* milliseconds. It can close idle
connections, but it typically concludes that the connection is not idle,
because of the second loop.
2) 
org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
 -> 
org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
 - this loop executes:
{noformat}
filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
// Update timestamp to avoid multiple notifications within one timeout interval.
ses.resetSendScheduleTime(); <-- resets idle timer
ses.bytesReceived(0);
{noformat}
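The interaction between the two loops described above can be illustrated with a toy simulation on a 1 ms virtual clock. It assumes that `processIdle` judges idleness from the same timestamp that `checkIdle` resets, which is what the analysis above describes. The single shared `lastActivity` field, the 2-second NIO check interval and the `simulate` method are hypothetical simplifications, not Ignite internals.

```java
public class IdleResetRaceSketch {
    static final long TIMEOUT = 600_000L;     // idleConnTimeout: 10 minutes
    static final long NIO_PERIOD = 2_000L;    // hypothetical NIO idle-check interval
    static final long HORIZON = 6 * TIMEOUT;  // simulate one hour

    /**
     * @param lastTrafficMs time of the last real send/receive on the connection
     * @param sweepOffsetMs phase of the processIdle-style sweep, which runs at
     *                      sweepOffsetMs + k * TIMEOUT
     * @param resetOnIdle   whether the NIO check resets the shared timestamp after
     *                      its non-closing idle notification, as the quoted code does
     * @return virtual time at which the sweep closes the connection, or -1 if never
     */
    static long simulate(long lastTrafficMs, long sweepOffsetMs, boolean resetOnIdle) {
        long lastActivity = lastTrafficMs; // timestamp shared by both loops
        for (long t = lastTrafficMs + 1; t <= HORIZON; t++) {
            // Loop 2 (NIO worker): notices idleness, notifies, but never closes.
            if (t % NIO_PERIOD == 0 && t - lastActivity >= TIMEOUT && resetOnIdle) {
                lastActivity = t; // models ses.resetSendScheduleTime()
            }
            // Loop 1 (CommunicationWorker-style sweep): closes only if idle right now.
            if (t >= sweepOffsetMs && (t - sweepOffsetMs) % TIMEOUT == 0
                    && t - lastActivity >= TIMEOUT) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(simulate(0, 0, true));       // -1: the NIO reset always lands first
        System.out.println(simulate(500, 1_000, true)); // 601000: the sweep hits the gap
        System.out.println(simulate(500, 0, true));     // -1: each sweep runs just too early
        System.out.println(simulate(0, 0, false));      // 600000: no reset, closed on time
    }
}
```

In this model the sweep can close the connection only when it happens to run in the short gap between idleness reaching the timeout and the next NIO check. That would be consistent with the test reproducing the problem only _sometimes_. Turning the reset off, or making the idle notification itself close the session, lets the sweep close the connection on schedule.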

---
To sum up, maybe the whole approach should be reviewed:
 - Is it OK not to track message delivery time?
 - Is it OK that heartbeats are not sent over the same connections as
get/put/... commands?
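One possible answer to the first question is sketched here under stated assumptions. For each connection, track when the oldest not-yet-acknowledged message was sent. Treat the connection as broken once that age exceeds a delivery timeout, independently of the idle timer. `DeliveryTracker`, cumulative acknowledgements by sequence number and the 30-second timeout in the example are all hypothetical. None of them is an existing Ignite API.

```java
import java.util.ArrayDeque;

public class DeliveryTrackerSketch {
    /** Hypothetical per-connection tracker of messages sent but not yet acknowledged. */
    static final class DeliveryTracker {
        private final ArrayDeque<long[]> pending = new ArrayDeque<>(); // entries: {seq, sendTimeMs}
        private long nextSeq = 1;

        /** Records a send and returns the sequence number assigned to the message. */
        long onSend(long nowMs) {
            long seq = nextSeq++;
            pending.addLast(new long[] {seq, nowMs});
            return seq;
        }

        /** Cumulative acknowledgement: everything up to and including seq was delivered. */
        void onAck(long seq) {
            while (!pending.isEmpty() && pending.peekFirst()[0] <= seq) {
                pending.pollFirst();
            }
        }

        /** True if the oldest unacknowledged message has waited at least timeoutMs. */
        boolean isStalled(long nowMs, long timeoutMs) {
            long[] oldest = pending.peekFirst();
            return oldest != null && nowMs - oldest[1] >= timeoutMs;
        }
    }

    public static void main(String[] args) {
        DeliveryTracker t = new DeliveryTracker();
        t.onSend(0);      // seq 1
        t.onSend(1_000);  // seq 2
        t.onAck(1);       // the peer confirmed seq 1 only
        System.out.println(t.isStalled(20_000, 30_000)); // false: seq 2 has waited 19 s
        System.out.println(t.isStalled(31_000, 30_000)); // true: seq 2 has waited 30 s
    }
}
```

This check looks at undelivered data rather than at inactivity. A connection whose bytes are stuck in Send-Q would therefore be flagged regardless of what the idle timestamps say. A heartbeat sent over the same connection, as the second question suggests, could be tracked the same way, so a silent but broken connection would also be detected.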

  was:
The same description without the line "[^2.6.0.txt] - mvn clean install logs for test with Ignite 2.6.0".

