[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated
[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549259#comment-16549259 ]

ASF GitHub Bot commented on IGNITE-8922:

GitHub user asfgit closed the pull request at:

    https://github.com/apache/ignite/pull/4349

> Discovery message delivery guarantee can be violated
>
>                 Key: IGNITE-8922
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8922
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Denis Mekhanikov
>            Assignee: Denis Mekhanikov
>            Priority: Critical
>             Fix For: 2.7
>         Attachments: PendingMessageResendTest.java
>
> Under certain circumstances, discovery messages may be delivered to only
> a subset of nodes.
> This happens because pending messages are not resent due to a data
> inconsistency in the {{ServerImpl#PendingMessages}} class. If {{discardId}} or
> {{customDiscardId}} points to a message that is not present in the queue,
> then the other messages are skipped and are not resent. This may happen, for
> example, when the queue in {{PendingMessages}} overflows.
> PFA a test that reproduces this problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
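The failure mode described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual {{ServerImpl#PendingMessages}} code: the class `PendingQueueSketch`, its methods, the use of plain string IDs, and the eviction policy (drop the oldest entry on overflow) are all hypothetical and chosen to mirror the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of a bounded pending-message queue. */
class PendingQueueSketch {
    /** Assumed capacity limit; the oldest message is evicted on overflow. */
    private final int maxSize;

    /** Pending message IDs, oldest first. */
    private final Deque<String> msgs = new ArrayDeque<>();

    /** ID of the last message known to be delivered everywhere, or null. */
    private String discardId;

    PendingQueueSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    void add(String msgId) {
        msgs.addLast(msgId);

        // Overflow: drop the oldest entry, even if discardId still refers to it.
        while (msgs.size() > maxSize)
            msgs.pollFirst();
    }

    void discard(String msgId) {
        discardId = msgId;
    }

    /** Returns the messages to resend: everything after the entry matching discardId. */
    List<String> messagesToResend() {
        List<String> res = new ArrayList<>();

        boolean skip = discardId != null;

        for (String id : msgs) {
            if (skip) {
                // Skip entries up to and including the discarded one.
                if (id.equals(discardId))
                    skip = false;

                continue;
            }

            res.add(id);
        }

        // If discardId is no longer in the queue, skip never resets and res stays empty.
        return res;
    }

    public static void main(String[] args) {
        PendingQueueSketch ok = new PendingQueueSketch(10);
        ok.add("m1"); ok.add("m2"); ok.add("m3");
        ok.discard("m1");
        System.out.println(ok.messagesToResend()); // prints [m2, m3]

        PendingQueueSketch bad = new PendingQueueSketch(3);
        bad.add("m1"); bad.add("m2");
        bad.discard("m1");
        bad.add("m3"); bad.add("m4"); // evicts m1, which discardId points to
        System.out.println(bad.messagesToResend()); // prints []
    }
}
```

In the second scenario the queue still holds m2, m3 and m4, none of which has been discarded, yet nothing is returned for resending because the skip loop never finds the evicted ID.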
[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated
[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549234#comment-16549234 ]

Yakov Zhdanov commented on IGNITE-8922:

Changes look good to me. Thanks!
[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated
[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546178#comment-16546178 ]

Denis Mekhanikov commented on IGNITE-8922:

[~dkarachentsev], losing discovery messages is much more harmful, since it may leave the whole cluster stuck. On the other hand, if a node fails with an OOME, the problem affects only that one node. And for that to happen, discard messages would have to go undelivered for a really long time, which is quite unlikely. So I think that nodes should either guarantee delivery of all discovery messages that are passed to them, or die.
[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated
[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545117#comment-16545117 ]

Dmitry Karachentsev commented on IGNITE-8922:

[~dmekhanikov], in general it looks fine, but the node is now vulnerable to an OOME or IllegalStateException, because the pending message queue can grow without limit. Anyway, both cases are harmful, a node hang due to lost messages or an OOME, but the latter is less likely. I'm OK with this change so far.
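The trade-off raised in this comment can be sketched as follows. This is an illustrative model only, not the change made in the pull request: the class `UnboundedPendingQueueSketch`, its methods, and the string IDs are hypothetical. It assumes a queue with no capacity limit, where entries leave the queue only when a discard arrives for an ID that is actually present.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model: undiscarded messages are never dropped, so the queue is unbounded. */
class UnboundedPendingQueueSketch {
    /** Pending message IDs, oldest first. */
    private final Deque<String> msgs = new ArrayDeque<>();

    void add(String msgId) {
        // No eviction on add: the queue grows until a discard arrives.
        msgs.addLast(msgId);
    }

    /** Removes every entry up to and including msgId, if msgId is present. */
    void discard(String msgId) {
        // Unknown ID: keep everything, so no pending message is ever skipped.
        if (!msgs.contains(msgId))
            return;

        String head;

        do {
            head = msgs.pollFirst();
        }
        while (!msgId.equals(head));
    }

    /** Everything still queued is undiscarded and must be resent. */
    List<String> messagesToResend() {
        return new ArrayList<>(msgs);
    }

    int size() {
        return msgs.size();
    }
}
```

With this policy a discard for an unknown ID can no longer hide pending messages, at the cost that memory use is bounded only by how promptly discards arrive, which is the OOME concern discussed here.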
[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated
[ https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540020#comment-16540020 ]

ASF GitHub Bot commented on IGNITE-8922:

GitHub user dmekhanikov opened a pull request:

    https://github.com/apache/ignite/pull/4349

    IGNITE-8922 Fix delivery of pending messages

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8922

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4349

commit d08c808c4766cfc7538a8fdffe4bcf9b014bfe30
Author: Denis Mekhanikov
Date:   2018-07-04T08:44:02Z

    IGNITE-8922 add tests for pending messages delivery

commit fcd100f9da4d6ff3f86971d9988c0cb5ea963603
Author: Denis Mekhanikov
Date:   2018-07-06T12:46:53Z

    IGNITE-8922 check all ensured messages in test

commit aea7d9b2e10aaf641daa5e1a74d3d289239e063f
Author: Denis Mekhanikov
Date:   2018-07-11T08:56:07Z

    IGNITE-8922 add custom messages to pending list in singleton cluster

commit 3325960fa98d12deac5c1d2a275fefd37c80b871
Author: Denis Mekhanikov
Date:   2018-07-11T10:04:40Z

    IGNITE-8922 don't drop undiscarded messages from PendingMessages

commit b90687577d7f1b030c2609c6b0bb002b285616f0
Author: Denis Mekhanikov
Date:   2018-07-11T10:45:20Z

    IGNITE-8922 add assertion messages to tests