[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used

2019-02-17 Thread Dmitriy Govorukhin (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770647#comment-16770647 ]

Dmitriy Govorukhin commented on IGNITE-10898:
-

Merged to master.

> Exchange coordinator failover breaks in some cases when node filter is used
> ---
>
> Key: IGNITE-10898
> URL: https://issues.apache.org/jira/browse/IGNITE-10898
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Goncharuk
> Assignee: Dmitriy Govorukhin
> Priority: Critical
> Fix For: 2.8
>
> Attachments: NodeWithFilterRestartTest.java
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently, if a node does not pass the cache node filter, we do not store the
> cache's affinity on that node unless the node is the coordinator. This, however,
> may fail in the following scenario:
> 1) A node passing the node filter joins the cluster.
> 2) During the join, the coordinator fails, and a new coordinator is selected
> for which the previous exchange is already completed.
> 3) The next coordinator attempts to fetch the affinity, and the joining node
> resends its partitions single message, but there are two problems here. First,
> the exchange fast-reply does not wait for the new affinity initialization, which
> results in {{IllegalStateException}}. Second, such an attempt to fetch affinity
> may lead either to a deadlock or to incorrectly fetched affinity (basically, the
> coordinator must be in consensus with the other nodes passing the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even ones not
> passing the filter. In this case, there will be no need to fetch and
> recalculate affinity ({{initCoordinatorCaches}} will go away).
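
The idea behind the suggestion in the description can be sketched in plain Java. This is only a toy model under stated assumptions, not Ignite code: the Node class, the passesFilter flag and calculateAffinity are hypothetical names, and a simple modulo assignment stands in for the real affinity function. It shows that when the assignment is a deterministic function of the topology and the node filter, any node (including one that does not pass the filter) can compute it locally, so a newly selected coordinator would not have to fetch it from other nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy model only: none of these types are Ignite classes. */
class AffinityOnAllNodesSketch {
    /** Hypothetical stand-in for a cluster node. */
    static final class Node {
        final String id;
        final boolean passesFilter;

        Node(String id, boolean passesFilter) {
            this.id = id;
            this.passesFilter = passesFilter;
        }
    }

    /**
     * Deterministic toy assignment: partition p is owned by the (p mod n)-th
     * node among those passing the filter, ordered by id. An empty owner set
     * yields null owners.
     */
    static List<String> calculateAffinity(List<Node> topology, Predicate<Node> nodeFilter, int partitions) {
        List<String> owners = topology.stream()
            .filter(nodeFilter)
            .map(n -> n.id)
            .sorted()
            .collect(Collectors.toList());

        List<String> assignment = new ArrayList<>();

        for (int p = 0; p < partitions; p++)
            assignment.add(owners.isEmpty() ? null : owners.get(p % owners.size()));

        return assignment;
    }

    public static void main(String[] args) {
        Node a = new Node("A", false); // does not pass the filter, may still become coordinator
        Node b = new Node("B", true);
        Node c = new Node("C", true);

        Predicate<Node> nodeFilter = n -> n.passesFilter;

        // Each node sees the same topology members, possibly in a different order.
        List<String> onFilteredOutNode = calculateAffinity(Arrays.asList(a, b, c), nodeFilter, 4);
        List<String> onDataNode = calculateAffinity(Arrays.asList(c, b, a), nodeFilter, 4);

        System.out.println("Same assignment on both nodes: " + onFilteredOutNode.equals(onDataNode));
        System.out.println("Assignment: " + onFilteredOutNode);
    }
}
```

In this sketch node A never owns a partition, yet it holds the same assignment as B and C, which is the property the proposal relies on.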



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used

2019-02-17 Thread Ignite TC Bot (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770646#comment-16770646 ]

Ignite TC Bot commented on IGNITE-10898:


{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3094509&buildTypeId=IgniteTests24Java8_RunAll]



[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used

2019-02-12 Thread Ignite TC Bot (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765958#comment-16765958 ]

Ignite TC Bot commented on IGNITE-10898:


{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3060727&buildTypeId=IgniteTests24Java8_RunAll]
