[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used
[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770647#comment-16770647 ]

Dmitriy Govorukhin commented on IGNITE-10898:
---------------------------------------------

Merged to master.

> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-10898
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10898
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitriy Govorukhin
>            Priority: Critical
>             Fix For: 2.8
>
>         Attachments: NodeWithFilterRestartTest.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, if a node does not pass the cache node filter, we do not store the
> cache affinity on that node unless it is the coordinator. This, however, may
> fail in the following scenario:
> 1) A node passing the node filter joins the cluster.
> 2) During the join, the coordinator fails and a new coordinator is selected,
> for which the previous exchange is already completed.
> 3) The next coordinator attempts to fetch the affinity, and the joining node
> resends its partitions single message, but there are two problems here. First,
> the exchange fast-reply does not wait for the new affinity to be initialized,
> which results in an {{IllegalStateException}}. Second, such an attempt to fetch
> affinity may lead either to a deadlock or to incorrectly fetched affinity
> (basically, the coordinator must be in consensus with the other nodes passing
> the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even ones not
> passing the filter. In this case, there will be no need to fetch and
> recalculate affinity ({{initCoordinatorCaches}} will go away).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
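For readers of the archive, the difference between the current behaviour and the suggested one can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not Ignite code: the class {{AffinityFailoverSketch}}, the {{Node}} type, and both methods are hypothetical names, the first node in the list is assumed to be the coordinator, and the second node is assumed to take over when the coordinator fails.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AffinityFailoverSketch {
    /** Hypothetical stand-in for a server node; not an Ignite class. */
    static final class Node {
        final String id;
        final boolean passesFilter;

        Node(String id, boolean passesFilter) {
            this.id = id;
            this.passesFilter = passesFilter;
        }
    }

    /**
     * Returns, per node id, whether the node holds the cache affinity after an exchange.
     * Assumption: the first node in the list acts as coordinator.
     *
     * @param keepOnAllNodes false models the behaviour described in the issue,
     *                       true models the suggested fix.
     */
    static Map<String, Boolean> storedAffinity(List<Node> nodes, boolean keepOnAllNodes) {
        Map<String, Boolean> stored = new LinkedHashMap<>();
        for (int i = 0; i < nodes.size(); i++) {
            Node n = nodes.get(i);
            boolean isCoordinator = i == 0;
            stored.put(n.id, keepOnAllNodes || n.passesFilter || isCoordinator);
        }
        return stored;
    }

    /**
     * True if the node that takes over after the coordinator fails has no local
     * affinity and would therefore have to fetch it from other nodes.
     * Assumption: the list has at least two nodes and the second one takes over.
     */
    static boolean newCoordinatorMustFetch(List<Node> nodes, boolean keepOnAllNodes) {
        Map<String, Boolean> stored = storedAffinity(nodes, keepOnAllNodes);
        Node next = nodes.get(1);
        return !stored.get(next.id);
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("crd", false),      // old coordinator, does not pass the filter
            new Node("next", false),     // becomes coordinator after the failure
            new Node("joining", true));  // joining node that passes the filter

        System.out.println(newCoordinatorMustFetch(nodes, false)); // prints "true"
        System.out.println(newCoordinatorMustFetch(nodes, true));  // prints "false"
    }
}
```

In this model, the fetch step that the description identifies as the source of the {{IllegalStateException}} and the possible deadlock is only needed when affinity is not kept on every node. The model does not reproduce either failure; the attached NodeWithFilterRestartTest.java does that.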
[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used
[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770646#comment-16770646 ]

Ignite TC Bot commented on IGNITE-10898:
----------------------------------------

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3094509buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used
[ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765958#comment-16765958 ]

Ignite TC Bot commented on IGNITE-10898:
----------------------------------------

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3060727buildTypeId=IgniteTests24Java8_RunAll]