[jira] [Commented] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity

2018-12-08 Thread Stanilovsky Evgeny (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713611#comment-16713611
 ] 

Stanilovsky Evgeny commented on IGNITE-10469:
-

Igor, I rechecked your test under version 2.7 and it looks like it works correctly. 
Can you recheck it?

[^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java]

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout 
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Assignee: Stanilovsky Evgeny
> Priority: Major
> Attachments: 2.6.0.txt, 
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, 
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip
>
>
> TcpCommunicationSpi does not close TCP connections after they have been idle 
> for longer than the time configured in TcpCommunicationSpi#idleConnTimeout 
> (default is 10 minutes).
> There are environments where idle TCP connections become unusable: 
> connections remain ESTABLISHED while actual data to be sent piles up in 
> Send-Q (according to netstat). For this reason the Ignite stack does not 
> recognize a communication problem for a considerable amount of time (~ 10-15 
> minutes), and it does not begin its reconnection procedure (heartbeats use 
> different TCP connections that are not idle and don't have this issue).
> I've discovered, though, that there is logic in the Ignite code to detect and 
> close idle connections. But due to a problem in the code it does not work 
> reliably.
> This is a test that _sometimes_ reproduces the problem.
> [^ignite_idle_test.zip] - full test project
> [^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code
>  [^2.6.0.txt] - mvn clean install logs for test with Ignite 2.6.0
> What's the problem in the Ignite code?
> There are two loops in the Ignite code that have a chance to close idle 
> connections:
> 1) 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
>  - this one is executed every *IdleConnectionTimeout* milliseconds. (It can 
> close idle connections, but it typically concludes that the connection is 
> not idle because of the second loop.)
> 2) 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
>  -> 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
>  - this loop executes:
> {noformat}
> filterChain.onSessionIdleTimeout(ses); // <-- does not actually close an idle connection
> // Update timestamp to avoid multiple notifications within one timeout interval.
> ses.resetSendScheduleTime(); // <--- resets idle timer
> ses.bytesReceived(0);
> {noformat}
> ---
> To wind up, maybe the whole approach should be reviewed:
>  - is it ok not to track message delivery time?
>  - is it ok not to do heartbeating using the same connections as for 
> get/put/... commands?
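
The interaction between the two loops described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not Ignite code: the class IdleTimerResetSketch, its Session type, the simulated clock, and the parameter values are all hypothetical, and the two methods merely mimic what the description says processIdle and checkIdle do. It shows that when the notifying loop runs first and resets the timer, the closing loop never sees the session as idle, while the opposite ordering closes it, which would be consistent with the test reproducing the problem only sometimes.

```java
/** Hypothetical model of the two idle loops; not taken from the Ignite code base. */
final class IdleTimerResetSketch {
    /** Stand-in for a NIO session: remembers when data was last scheduled for sending. */
    static final class Session {
        long lastSendTime;
        boolean closed;
    }

    /** Analogue of checkIdle: notifies about idleness, then resets the timer without closing. */
    static void checkIdle(Session ses, long now, long idleTimeout) {
        if (now - ses.lastSendTime >= idleTimeout) {
            // Notification only; the session stays open.
            ses.lastSendTime = now; // models ses.resetSendScheduleTime()
        }
    }

    /** Analogue of processIdle: closes the session if it looks idle. */
    static void processIdle(Session ses, long now, long idleTimeout) {
        if (now - ses.lastSendTime >= idleTimeout)
            ses.closed = true;
    }

    /**
     * Runs both loops on a simulated clock (all times in milliseconds).
     * checkIdle runs every step; processIdle runs every idleTimeout.
     * Returns true if the session ended up closed.
     */
    static boolean simulate(boolean checkIdleFirst, long idleTimeout, long step, long duration) {
        Session ses = new Session(); // lastSendTime = 0, no traffic afterwards
        for (long now = step; now <= duration && !ses.closed; now += step) {
            boolean processIdleDue = now % idleTimeout == 0;
            if (checkIdleFirst) {
                checkIdle(ses, now, idleTimeout);
                if (processIdleDue)
                    processIdle(ses, now, idleTimeout);
            } else {
                if (processIdleDue)
                    processIdle(ses, now, idleTimeout);
                if (!ses.closed)
                    checkIdle(ses, now, idleTimeout);
            }
        }
        return ses.closed;
    }

    public static void main(String[] args) {
        System.out.println("checkIdle first:   closed=" + simulate(true, 1000L, 100L, 5000L));
        System.out.println("processIdle first: closed=" + simulate(false, 1000L, 100L, 5000L));
    }
}
```

In this model the outcome depends only on which loop observes the expired timer first, so a fix along these lines would have to either close the connection in the notifying loop or stop that loop from resetting the timestamp the closing loop relies on. Whether that matches the actual Ignite fix is not stated in this thread.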



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity

2018-12-03 Thread Stanilovsky Evgeny (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707078#comment-16707078
 ] 

Stanilovsky Evgeny commented on IGNITE-10469:
-

Thanks Igor, I'll take a look as soon as possible.

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout 
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Assignee: Stanilovsky Evgeny
> Priority: Major
> Attachments: 2.6.0.txt, 
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip


