[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity
[ https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanilovsky Evgeny updated IGNITE-10469:
    Attachment: GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Assignee: Stanilovsky Evgeny
> Priority: Major
> Attachments: 2.6.0.txt,
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java,
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip
>
>
> TcpCommunicationSpi does not close TCP connections after they have been idle
> for longer than the time configured in TcpCommunicationSpi#idleConnTimeout
> (default is 10 minutes).
> There are environments where idle TCP connections become unusable:
> connections remain ESTABLISHED while actual data to be sent piles up in
> Send-Q (according to netstat). For this reason the Ignite stack does not
> recognize a communication problem for a considerable amount of time (~10-15
> minutes), and it does not begin its reconnection procedure (heartbeats use
> different TCP connections that are not idle and don't have this issue).
> I've discovered, though, that there is logic in the Ignite code to detect and
> close idle connections. But due to a problem in the code it does not work
> reliably.
> This is a test that _sometimes_ reproduces the problem.
> [^ignite_idle_test.zip] - full test project
> [^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code
> [^2.6.0.txt] - mvn clean install logs for test with Ignite 2.6.0
> What's the problem in the Ignite code?
> There are two loops in the Ignite code that have a chance to close idle
> connections:
> 1)
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
> - this one is executed every *IdleConnectionTimeout* milliseconds. (It can
> close idle connections, but it typically concludes that the connection is
> not idle, thanks to the second loop.)
> 2)
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
> ->
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
> - this loop executes:
> {noformat}
> filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
> // Update timestamp to avoid multiple notifications within one timeout interval.
> ses.resetSendScheduleTime(); <-- resets idle timer
> ses.bytesReceived(0);
> {noformat}
> ---
> To wind up, maybe the whole approach should be reviewed:
> - is it ok not to track message delivery time?
> - is it ok not to do heartbeating using the same connections as for
> get/put/... commands?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
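The interaction between the two loops described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ignite code: the class IdleTimerRace, the Session type, the methods nioCheckIdle and spiProcessIdle, and the logical clock are all hypothetical names invented for illustration. The model assumes both checks read the same last-activity timestamp, which is how the report describes the effect of the timer reset.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two idle checks from the report. Hypothetical; not Ignite code. */
class IdleTimerRace {
    /** Hypothetical stand-in for a NIO session: only a last-activity timestamp and a closed flag. */
    static final class Session {
        long lastActivity; // logical time of the last send/receive or timer reset
        boolean closed;
    }

    /** Models checkIdle as described: notifies about idleness, then resets the timer without closing. */
    static void nioCheckIdle(Session ses, long now, long idleTimeout, List<String> log) {
        if (!ses.closed && now - ses.lastActivity >= idleTimeout) {
            log.add("t=" + now + " nio: idle notification, connection left open");
            ses.lastActivity = now; // this reset hides the idleness from the other loop
        }
    }

    /** Models processIdle as described: closes the session only if it still looks idle. */
    static void spiProcessIdle(Session ses, long now, long idleTimeout, List<String> log) {
        if (!ses.closed && now - ses.lastActivity >= idleTimeout) {
            ses.closed = true;
            log.add("t=" + now + " spi: closed idle connection");
        }
    }

    /** Runs both checks on a logical clock; returns true if the idle session was ever closed. */
    static boolean simulate(long idleTimeout, long nioPeriod, long spiPeriod,
                            long duration, List<String> log) {
        Session ses = new Session(); // never sends or receives anything
        for (long now = 1; now <= duration; now++) {
            if (now % nioPeriod == 0)
                nioCheckIdle(ses, now, idleTimeout, log);
            if (now % spiPeriod == 0)
                spiProcessIdle(ses, now, idleTimeout, log);
        }
        return ses.closed;
    }

    public static void main(String[] args) {
        long timeout = 600; // arbitrary logical time units
        // NIO check ticks exactly when the timeout expires and wins the race every time.
        System.out.println("nioPeriod=2: closed=" + simulate(timeout, 2, timeout, 3000, new ArrayList<>()));
        // NIO check happens to tick just after the SPI check, so the SPI check sees the idle session.
        System.out.println("nioPeriod=7: closed=" + simulate(timeout, 7, timeout, 3000, new ArrayList<>()));
        // NIO check effectively disabled: the SPI check alone closes the session.
        System.out.println("nio off:     closed=" + simulate(timeout, Long.MAX_VALUE, timeout, 3000, new ArrayList<>()));
    }
}
```

In this model the outcome depends purely on which check observes the expired timer first, which is consistent with the report's remark that the test reproduces the problem only _sometimes_. Whether the real timestamps are shared in exactly this way is an assumption of the sketch.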
[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity
[ https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Kamyshnikov updated IGNITE-10469:
--
    Attachment: 2.6.0.txt

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Priority: Major
> Attachments: 2.6.0.txt,
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip
>
>
> TcpCommunicationSpi does not close TCP connections after they have been idle
> for longer than the time configured in TcpCommunicationSpi#idleConnTimeout
> (default is 10 minutes).
> There are environments where idle TCP connections become unusable:
> connections remain ESTABLISHED while actual data to be sent piles up in
> Send-Q (according to netstat). For this reason the Ignite stack does not
> recognize a communication problem for a considerable amount of time (~10-15
> minutes), and it does not begin its reconnection procedure (heartbeats use
> different TCP connections that are not idle and don't have this issue).
> I've discovered, though, that there is logic in the Ignite code to detect and
> close idle connections. But due to a problem in the code it does not work
> reliably.
> This is a test that _sometimes_ reproduces the problem.
> [^ignite_idle_test.zip] - full test project
> [^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code
> What's the problem in the Ignite code?
> There are two loops in the Ignite code that have a chance to close idle
> connections:
> 1)
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
> - this one is executed every *IdleConnectionTimeout* milliseconds. (It can
> close idle connections, but it typically concludes that the connection is
> not idle, thanks to the second loop.)
> 2)
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
> ->
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
> - this loop executes:
> {noformat}
> filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
> // Update timestamp to avoid multiple notifications within one timeout interval.
> ses.resetSendScheduleTime(); <-- resets idle timer
> ses.bytesReceived(0);
> {noformat}
> ---
> To wind up, maybe the whole approach should be reviewed:
> - is it ok not to track message delivery time?
> - is it ok not to do heartbeating using the same connections as for
> get/put/... commands?
[jira] [Updated] (IGNITE-10469) TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout seconds of inactivity
[ https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Kamyshnikov updated IGNITE-10469:
--
    Description:
TcpCommunicationSpi does not close TCP connections after they have been idle for longer than the time configured in TcpCommunicationSpi#idleConnTimeout (default is 10 minutes).

There are environments where idle TCP connections become unusable: connections remain ESTABLISHED while actual data to be sent piles up in Send-Q (according to netstat). For this reason the Ignite stack does not recognize a communication problem for a considerable amount of time (~10-15 minutes), and it does not begin its reconnection procedure (heartbeats use different TCP connections that are not idle and don't have this issue).

I've discovered, though, that there is logic in the Ignite code to detect and close idle connections. But due to a problem in the code it does not work reliably.

This is a test that _sometimes_ reproduces the problem.
[^ignite_idle_test.zip] - full test project
[^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code
[^2.6.0.txt] - mvn clean install logs for test with Ignite 2.6.0

What's the problem in the Ignite code?
There are two loops in the Ignite code that have a chance to close idle connections:
1) org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle - this one is executed every *IdleConnectionTimeout* milliseconds. (It can close idle connections, but it typically concludes that the connection is not idle, thanks to the second loop.)
2) org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal -> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle - this loop executes:
{noformat}
filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
// Update timestamp to avoid multiple notifications within one timeout interval.
ses.resetSendScheduleTime(); <-- resets idle timer
ses.bytesReceived(0);
{noformat}
---
To wind up, maybe the whole approach should be reviewed:
- is it ok not to track message delivery time?
- is it ok not to do heartbeating using the same connections as for get/put/... commands?

  was:
TcpCommunicationSpi does not close TCP connections after they have been idle for longer than the time configured in TcpCommunicationSpi#idleConnTimeout (default is 10 minutes).

There are environments where idle TCP connections become unusable: connections remain ESTABLISHED while actual data to be sent piles up in Send-Q (according to netstat). For this reason the Ignite stack does not recognize a communication problem for a considerable amount of time (~10-15 minutes), and it does not begin its reconnection procedure (heartbeats use different TCP connections that are not idle and don't have this issue).

I've discovered, though, that there is logic in the Ignite code to detect and close idle connections. But due to a problem in the code it does not work reliably.

This is a test that _sometimes_ reproduces the problem.
[^ignite_idle_test.zip] - full test project
[^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code

What's the problem in the Ignite code?
There are two loops in the Ignite code that have a chance to close idle connections:
1) org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle - this one is executed every *IdleConnectionTimeout* milliseconds. (It can close idle connections, but it typically concludes that the connection is not idle, thanks to the second loop.)
2) org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal -> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle - this loop executes:
{noformat}
filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
// Update timestamp to avoid multiple notifications within one timeout interval.
ses.resetSendScheduleTime(); <-- resets idle timer
ses.bytesReceived(0);
{noformat}
---
To wind up, maybe the whole approach should be reviewed:
- is it ok not to track message delivery time?
- is it ok not to do heartbeating using the same connections as for get/put/... commands?

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout
> seconds of inactivity
> ---
>
> Key: IGNITE-10469
> URL: https://issues.apache.org/jira/browse/IGNITE-10469
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.5, 2.6
> Reporter: Igor Kamyshnikov
> Priority: Major
> Attachments: 2.6.0.txt,
>