[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803774#comment-16803774 ]

Amelchev Nikita commented on IGNITE-11460:
--

Hi, [~amashenkov]

I have investigated the issue once more and suggest the following:

1. Fix the current incorrect behavior for the case when the coordinator has already been set to the disconnected one while the listener continues to process events. We need to check the {{ctx.clientDisconnected()}} flag and skip overriding the disconnected coordinator. I added extra synchronization for the case when the coordinator could be overridden in the moment between checking this flag and setting the new coordinator, because these two steps happen in different threads.
2. Fix the discovery logic for the case when events from the previous cluster can be processed after the {{onLocalJoin}} method has been called. I have filed an issue for this case.

I have fixed the PR and tested it. Could you take a look, please?

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
>                 Key: IGNITE-11460
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11460
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Amelchev Nikita
>            Assignee: Amelchev Nikita
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
>         Attachments: stacktraces.log
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
>     at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached a reproducer in the PR.
> The main reason is that the coordinator can be changed from the discovery event
> thread when the client has already disconnected (the disconnection is processed in
> the notifier thread, which changes the coordinator in the onDisconnected method).
> The coordinator can be changed in two places:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> Events can be processed with some delay and override the coordinator that was set
> in the notifier thread.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
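The check-then-set guard from point 1 of the comment above can be sketched as follows. This is a minimal illustration, not the code from the PR: the class {{GuardedCoordinator}}, the string-valued coordinator, and the {{DISCONNECTED}} marker are hypothetical stand-ins, and a plain boolean models the {{ctx.clientDisconnected()}} flag. The flag check and the coordinator update are done under one lock, so the notifier thread cannot slip a disconnect in between them.

```java
/** Hypothetical stand-in for the coordinator holder; not the actual MvccProcessorImpl code. */
class GuardedCoordinator {
    /** Illustrative marker for the "disconnected" coordinator. */
    static final String DISCONNECTED = "DISCONNECTED";

    /** One lock guards both the flag and the coordinator. */
    private final Object mux = new Object();

    /** Models ctx.clientDisconnected() (assumption: a plain boolean is enough here). */
    private boolean clientDisconnected;

    private String curCrd = "crd-v1";

    /** Called from the notifier thread when the client loses the cluster. */
    void onDisconnected() {
        synchronized (mux) {
            clientDisconnected = true;
            curCrd = DISCONNECTED;
        }
    }

    /** Called from the notifier thread when the client joins again. */
    void onLocalJoin(String newCrd) {
        synchronized (mux) {
            clientDisconnected = false;
            curCrd = newCrd;
        }
    }

    /**
     * Called from the event thread for a discovery event.
     * Returns false if the update was skipped because the client is disconnected.
     */
    boolean onDiscoveryEvent(String newCrd) {
        synchronized (mux) {
            if (clientDisconnected)
                return false; // Do not override the disconnected coordinator.

            curCrd = newCrd;

            return true;
        }
    }

    String coordinator() {
        synchronized (mux) {
            return curCrd;
        }
    }

    public static void main(String[] args) {
        GuardedCoordinator g = new GuardedCoordinator();

        g.onDisconnected();
        g.onDiscoveryEvent("crd-late"); // Skipped: the client is disconnected.

        System.out.println(g.coordinator());
    }
}
```

This guard only covers events processed while the flag is set. Events from the previous cluster that are processed after the local join are the separate case described in point 2.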
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803773#comment-16803773 ]

Ignite TC Bot commented on IGNITE-11460:

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3433381buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793557#comment-16793557 ]

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev],

I think there is a bug in Discovery, and the onLocalJoin semantics are broken on the client.

Discovery events should be ordered: we should not get events from the old topology after the 're-connect' event, but no events (node_left/failed) should be ignored either.

So the correct fix is to wait somehow until all discovery events from the event storage have been processed before handling onLocalJoin.

Another possible way is to rework 'event' processing so that it runs in a single thread and preserves event order.
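The second option from the comment above, running all processing in a single thread that preserves order, can be sketched as follows. This is only an illustration under assumptions: {{SerializedCoordinatorUpdates}} is a hypothetical class, coordinators are plain strings, and a JDK single-thread executor stands in for whatever worker Ignite would actually use. The idea is that disconnect, local join, and ordinary discovery events all go through one queue, so a late event can never be applied after a newer one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: every coordinator change is applied by one worker thread, in submission order. */
class SerializedCoordinatorUpdates {
    /** A single worker thread means tasks run strictly in the order they were submitted. */
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Coordinators in the order they were applied. */
    private final List<String> history = Collections.synchronizedList(new ArrayList<>());

    /** Used for discovery events as well as for disconnect and local join handling. */
    void submit(String crd) {
        worker.execute(() -> history.add(crd));
    }

    /** Stops the worker, waits for queued updates, and returns the applied order. */
    List<String> drain() {
        worker.shutdown();

        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS))
                throw new IllegalStateException("Worker did not finish in time.");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        return history;
    }

    public static void main(String[] args) {
        SerializedCoordinatorUpdates updates = new SerializedCoordinatorUpdates();

        updates.submit("crd-from-old-topology"); // Event queued before the disconnect.
        updates.submit("crd-disconnected");      // Disconnect handling.
        updates.submit("crd-after-local-join");  // Local join on the new topology.

        System.out.println(updates.drain());
    }
}
```

The trade-off is that disconnect and local join handling would then wait behind any slow listener in the queue.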
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790551#comment-16790551 ]

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov],

At first I implemented it with this flag, but it doesn't solve the problem correctly. We can get the following situation: the client's event queue holds some events that can change the coordinator. The client disconnects from the cluster and reconnects again. The notifier thread sets kernalContext.clientDisconnected=false. The client then continues processing the queued events from the previous cluster and sets the wrong coordinator.

We should guarantee that no event is processed after the disconnect +until the reconnect event has been processed+.
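The scenario from the comment above can be replayed deterministically in a single thread. The sketch below is an illustration under assumptions, not Ignite code: the {{Evt}} class, the event type strings, and the coordinator names are all hypothetical. It compares a context-level flag that the notifier thread resets on local join with a listener-side flag that is cleared only when the event thread reaches the reconnect event.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical single-threaded replay of the stale-event interleaving. */
class StaleEventDemo {
    /** Simplified discovery event: a type plus the coordinator it would install (may be null). */
    static final class Evt {
        final String type;
        final String crd;

        Evt(String type, String crd) {
            this.type = type;
            this.crd = crd;
        }
    }

    /**
     * @param listenerFlag if true, skip events using a flag that only the reconnect event clears;
     *                     if false, use a flag that the local join resets.
     * @return the coordinator left in place after the event queue has been drained.
     */
    static String replay(boolean listenerFlag) {
        Queue<Evt> evts = new ArrayDeque<>();
        String crd = "old-crd";
        boolean ctxDisconnected = false;  // Reset by the notifier thread on local join.
        boolean lsnrDisconnected = false; // Cleared only by the event thread.

        // Notifier thread: an event from the old cluster is queued but not processed yet.
        evts.add(new Evt("NODE_JOINED", "stale-crd"));

        // Notifier thread: the client disconnects.
        ctxDisconnected = true;
        lsnrDisconnected = true;
        crd = "disconnected";
        evts.add(new Evt("CLIENT_DISCONNECTED", null));

        // Notifier thread: local join on the new cluster.
        ctxDisconnected = false;
        crd = "new-crd";
        evts.add(new Evt("CLIENT_RECONNECTED", null));

        // Event thread: drains the queue only now.
        for (Evt e : evts) {
            if ("CLIENT_RECONNECTED".equals(e.type)) {
                lsnrDisconnected = false;

                continue;
            }

            boolean skip = listenerFlag ? lsnrDisconnected : ctxDisconnected;

            if (skip || e.crd == null)
                continue;

            crd = e.crd; // With the context flag this applies the stale coordinator.
        }

        return crd;
    }

    public static void main(String[] args) {
        System.out.println("context flag only:  " + replay(false));
        System.out.println("listener-side flag: " + replay(true));
    }
}
```

With the context-level flag the replay ends on {{stale-crd}}. With the listener-side flag the stale event is skipped and the coordinator stays {{new-crd}}.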
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790530#comment-16790530 ]

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev], got it.

It seems the root of the issue is that we process NODE_FAILED events after CLIENT_DISCONNECT happens.

To resolve this, we should ignore all topology change events between onDisconnected() and the next onLocalJoin(), which is what your fix does.

I've found that the kernalContext.clientDisconnected flag is set to 'true' in onDisconnected() and to 'false' in onLocalJoin().

I think we can use this flag and skip all topology change events in the onDiscovery() method via a simple check: "if (ctx.clientDisconnected) return".

If any reordering between these events is possible (e.g. due to event processing from different threads), then it looks like a bug in discovery.
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790472#comment-16790472 ]

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov],

In that case we can end up in a situation where the client has already joined the topology (and will receive MVCC requests) but the current coordinator stays in the disconnected state until the discovery event is processed. The opposite situation is also possible: the client has already disconnected, but the coordinator is not switched to the disconnected state until the client disconnect discovery event is processed. Also, it seems there is no event for the local node join in the discovery event thread.

With my changes the coordinator is set correctly on local join and on disconnect, and unnecessary events that happen (or are processed) after the client leaves the topology are not processed. When the client joins the topology again, the correct coordinator is set, and events are processed only after the client reconnect event is received.
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790415#comment-16790415 ]

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev],

Thanks for the clarification.

I've found the place where Ignite handles the {{onDisconnected}} and {{onLocalJoin}} events in a separate thread. You are right: serializability is lost here, and either the method semantics are broken or the methods are not documented well.

What if we remove these handlers from MvccProcessor and move all their logic into the onDiscovery method?
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789730#comment-16789730 ]

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov],

Thanks for taking a look.

The discovery event guarantees work fine. The problem is that client disconnection is not processed on the event; it is processed in the {{onDisconnected}} method (disco-notifier-thread).

I attached a reproducer in the PR. Steps to reproduce:

1. Do something that generates an event that changes the coordinator, for example, start a node. Add a custom listener to hold up this event. The node's {{notifier thread}} will process the messages and put the {{node_joined}} event into the events queue. The custom listener will hold up the event processing.
2. Restart the cluster. The client's notifier thread will process this and change the coordinator to the {{disconnected-coordinator}}. After this, it will put the {{client-disconnected-event}} into the events queue.
3. The client will find a server, process the local join in the notifier thread, and set the new coordinator.
4. The {{node_joined}} event from step 1 will be processed by the disco-event-thread, and the coordinator can be changed to the wrong one. After this the client gets the {{client disconnect event}} and the {{client reconnect}} event.

Coordinator changes are out of sync because they are processed not only in the event thread, which has ordering guarantees. The coordinator can also be changed by the disco-notifier-thread through internal methods ({{onDisconnected}}, {{onLocalJoin}}). There are no guarantees between these two threads.
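The four reproduction steps above can be modelled with two threads and a latch. This is a hedged sketch, not the reproducer attached to the PR: {{ReconnectRaceRepro}}, the string-valued coordinators, and the latch that plays the role of the custom listener holding up the event are all assumptions made for illustration. The main thread stands in for the disco-notifier-thread and the spawned thread for the disco-event-thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical two-thread model of the reconnect race; not the reproducer from the PR. */
class ReconnectRaceRepro {
    /** Runs the four steps and returns the coordinator that is left in place. */
    static String run() {
        AtomicReference<String> crd = new AtomicReference<>("crd-old");
        BlockingQueue<String> evts = new LinkedBlockingQueue<>();
        CountDownLatch holdUp = new CountDownLatch(1); // Plays the role of the custom listener.

        // Stand-in for the disco-event-thread: takes one event and applies its coordinator.
        Thread evtThread = new Thread(() -> {
            try {
                String newCrd = evts.take();

                holdUp.await(); // The custom listener holds up event processing.

                crd.set(newCrd); // Step 4: the late event overrides the coordinator.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "disco-event-worker");

        evtThread.start();

        try {
            // Step 1: the notifier thread queues a node_joined event from the old cluster.
            evts.put("crd-from-old-cluster");

            // Step 2: cluster restart; the notifier thread sets the disconnected coordinator.
            crd.set("crd-disconnected");

            // Step 3: local join; the notifier thread sets the new coordinator.
            crd.set("crd-new");

            // Step 4: the held-up event is released and processed by the event thread.
            holdUp.countDown();
            evtThread.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        return crd.get();
    }

    public static void main(String[] args) {
        System.out.println("Coordinator after reconnect: " + run());
    }
}
```

Because the event thread writes only after the latch is released, the run always ends on the coordinator from the old cluster, which mirrors the wrong state described in step 4.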
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789680#comment-16789680 ]

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev],

It looks like discovery guarantees that no disco events occur between the client node getting ClientNodeDisconnected and the local join (after which ClientReconnected is generated).

It is not clear how this race can be reproduced, as all disco events should be serialized and the coordinator can't be changed by late exchange processing in onExchangeDone.
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784220#comment-16784220 ]

Amelchev Nikita commented on IGNITE-11460:
--

I have prepared a PR to fix the issue. I have added a disconnected-client flag, so the listener no longer processes coordinator changes on discovery events after the client has disconnected until it is connected again (that is, until it gets a reconnect event).

TC tests look good.

[~gvvinblade], could you take a look, please?
[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.
[ https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784217#comment-16784217 ]

Ignite TC Bot commented on IGNITE-11460:

{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3210728buildTypeId=IgniteTests24Java8_RunAll]