[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549259#comment-16549259
 ] 

ASF GitHub Bot commented on IGNITE-8922:


Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/4349


> Discovery message delivery guarantee can be violated
> ----------------------------------------------------
>
>                 Key: IGNITE-8922
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8922
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Denis Mekhanikov
>            Assignee: Denis Mekhanikov
>            Priority: Critical
>             Fix For: 2.7
>
>         Attachments: PendingMessageResendTest.java
>
>
> Under certain circumstances, discovery messages may be delivered to only a 
> subset of nodes.
> This happens because pending messages are not resent when the state of the 
> {{ServerImpl#PendingMessages}} class becomes inconsistent. If {{discardId}} or 
> {{customDiscardId}} points to a message that is not present in the queue, 
> then the other messages are skipped and are not resent. This can happen, for 
> example, when the queue in {{PendingMessages}} overflows.
> Please find attached a test that reproduces this problem.
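
The failure mode described above can be illustrated with a minimal sketch. It 
is a simplified model, not the actual {{ServerImpl#PendingMessages}} code: the 
class name PendingQueueSketch, the integer message IDs, the MAX bound of 3 and 
the toResend() method are all assumptions made for illustration. It only shows 
how a discard ID that no longer matches any queued message can cause every 
remaining message to be skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical model of a bounded pending-message queue. */
final class PendingQueueSketch {
    /** Hypothetical bound, kept small so that overflow is easy to trigger. */
    static final int MAX = 3;

    /** Pending message IDs, oldest first. */
    private final Deque<Integer> msgs = new ArrayDeque<>();

    /** ID of the last message known to be delivered everywhere, or null. */
    private Integer discardId;

    /** Adds a message and evicts the oldest ones once the bound is exceeded. */
    public void add(int id) {
        msgs.addLast(id);

        while (msgs.size() > MAX)
            msgs.pollFirst(); // May evict the very message discardId refers to.
    }

    /** Remembers the ID up to which messages no longer need to be resent. */
    public void discard(int id) {
        discardId = id;
    }

    /** Returns the messages to resend: everything queued after discardId. */
    public List<Integer> toResend() {
        List<Integer> res = new ArrayList<>();
        boolean skip = discardId != null;

        for (int id : msgs) {
            if (skip) {
                // Skip until the discarded message is found. If it was evicted,
                // this never matches and every message is skipped.
                if (id == discardId)
                    skip = false;

                continue;
            }

            res.add(id);
        }

        return res;
    }

    public static void main(String[] args) {
        PendingQueueSketch q = new PendingQueueSketch();

        q.add(1);
        q.add(2);
        q.add(3);
        q.discard(1);
        System.out.println(q.toResend()); // Consistent state: prints [2, 3]

        q.add(4); // Overflow evicts message 1, which discardId still points to.
        System.out.println(q.toResend()); // Inconsistent state: prints []
    }
}
```

In the second call, messages 2, 3 and 4 are still pending, yet none of them is 
returned for resending, which corresponds to the lost deliveries reported in 
this issue.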



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated

2018-07-19 Thread Yakov Zhdanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549234#comment-16549234
 ] 

Yakov Zhdanov commented on IGNITE-8922:
---

Changes look good to me.

Thanks!



[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated

2018-07-17 Thread Denis Mekhanikov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546178#comment-16546178
 ] 

Denis Mekhanikov commented on IGNITE-8922:
--

[~dkarachentsev], losing discovery messages is much more harmful, since it may 
lead to the whole cluster being stuck.

On the other hand, if a node fails with OOME, that is a problem for only one 
node. And for that to happen, discard messages would have to go undelivered 
for a really long time, which is quite unlikely.

So I think that nodes should either guarantee delivery of all discovery 
messages that are passed to them, or die.



[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated

2018-07-16 Thread Dmitry Karachentsev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545117#comment-16545117
 ] 

Dmitry Karachentsev commented on IGNITE-8922:
-

[~dmekhanikov] in general it looks fine, but the node is now vulnerable to OOME 
or IllegalStateException, because the pending message queue can grow without 
limit. In any case, both outcomes are harmful (a node hanging due to lost 
messages, or an OOME), but the latter is less likely. I'm OK with this change 
so far.
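
The trade-off discussed here can be sketched as follows. This is an 
illustrative model only, not the code from the pull request: the class name 
UnboundedPendingSketch, the integer IDs and the choice to ignore an unknown 
discard ID are assumptions. The sketch never evicts undiscarded messages, so 
nothing is lost, but the queue keeps growing until a discard arrives.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical pending-message queue that never drops undiscarded messages. */
final class UnboundedPendingSketch {
    /** Pending message IDs, oldest first. There is no upper bound. */
    private final Deque<Integer> msgs = new ArrayDeque<>();

    /** Adds a message without evicting anything. */
    public void add(int id) {
        msgs.addLast(id);
    }

    /** Drops messages up to and including id, but only if id is queued. */
    public void discard(int id) {
        if (!msgs.contains(id))
            return; // Unknown ID: keep everything rather than risk a loss.

        // Remove acknowledged messages from the head until id itself is removed.
        while (!msgs.isEmpty() && msgs.pollFirst() != id) {
            // Nothing to do: the polled message is already acknowledged.
        }
    }

    /** Returns all messages that still have to be resent. */
    public List<Integer> toResend() {
        return new ArrayList<>(msgs);
    }

    /** Returns the current queue size, which grows while discards are missing. */
    public int size() {
        return msgs.size();
    }
}
```

With messages 1 to 5 queued, discard(2) leaves 3, 4 and 5 to resend, and a 
discard for an ID that is not queued changes nothing. The cost is the one 
raised above: if discard messages stop arriving, size() grows without limit.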



[jira] [Commented] (IGNITE-8922) Discovery message delivery guarantee can be violated

2018-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540020#comment-16540020
 ] 

ASF GitHub Bot commented on IGNITE-8922:


GitHub user dmekhanikov opened a pull request:

https://github.com/apache/ignite/pull/4349

IGNITE-8922 Fix delivery of pending messages



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8922

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4349


commit d08c808c4766cfc7538a8fdffe4bcf9b014bfe30
Author: Denis Mekhanikov 
Date:   2018-07-04T08:44:02Z

IGNITE-8922 add tests for pending messages delivery

commit fcd100f9da4d6ff3f86971d9988c0cb5ea963603
Author: Denis Mekhanikov 
Date:   2018-07-06T12:46:53Z

IGNITE-8922 check all ensured messages in test

commit aea7d9b2e10aaf641daa5e1a74d3d289239e063f
Author: Denis Mekhanikov 
Date:   2018-07-11T08:56:07Z

IGNITE-8922 add custom messages to pending list in singleton cluster

commit 3325960fa98d12deac5c1d2a275fefd37c80b871
Author: Denis Mekhanikov 
Date:   2018-07-11T10:04:40Z

IGNITE-8922 don't drop undiscarded messages from PendingMessages

commit b90687577d7f1b030c2609c6b0bb002b285616f0
Author: Denis Mekhanikov 
Date:   2018-07-11T10:45:20Z

IGNITE-8922 add assertion messages to tests



