[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-28 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803774#comment-16803774
 ] 

Amelchev Nikita commented on IGNITE-11460:
--

Hi, [~amashenkov]

I have investigated issue one more time and suggest next:

1. Fix current incorrect behavior for the case when the current coordinator was 
set onto disconnect and events will continue processing in the listener. So, we 
need to check the {{ctx.clientDisconnected()}} flag and skip overriding 
disconnected coordinator. I added additional synchronization for the case when 
we can override the coordinator in a moment between check this flag and set to 
a new coordinator. Because this is done in different threads.

2. Fix discovery logic for the case when previous cluster events can be 
processed after {{onLocalJoin}} method called. I have filed the issue for this 
case.

I have fixed PR and tested it.
Could you take a look, please?

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Attachments: stacktraces.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-28 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803773#comment-16803773
 ] 

Ignite TC Bot commented on IGNITE-11460:


{panel:title=-- Run :: All: No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3433381buildTypeId=IgniteTests24Java8_RunAll]

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Attachments: stacktraces.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-15 Thread Andrew Mashenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793557#comment-16793557
 ] 

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev],

I'd think there is a bug in Discovery and onLocalJoin semantic is broken on 
client.
Discovery events should be ordered and we should get events from old topology 
after 're-connect' event, but no events (node_left\failed) shouldn't be ignored.

So, correct fix is to wait somehow for all discovery events from event storage 
being processed before handling onLocalJoin.
Other possible way is to rework 'event' processing to be run in single thread 
with preserving event order.

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Attachments: stacktraces.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-12 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790551#comment-16790551
 ] 

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov], At first I did it with this flag. But it doesn't solve the 
problem correctly. In such case we can get the next situation:
client's event queue has some events that can change coordinator. Client 
disconnect from cluster and reconnect again.
Notifier thread sets kernalContext.clientDisconnected=false. The client 
continues processing events from the queue with previous cluster events and set 
the wrong coordinator. 
We should guarantee that no one event will be processed after disconnect +until 
processed reconnect event+. 

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-12 Thread Andrew Mashenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790530#comment-16790530
 ] 

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev], got it.

Seems, the root of issue is that we process NODE_FAILED events after 
CLIENT_DISCONNECT happens.
To resolve this, we should ignore all topology change events between 
onDisconected() and next onLocalJoin(), that is what your fix do.

I've found kernalContext.clientDisconnected flag is set to 'true' in 
onDisconnected() and is set to 'false' in onLocalJoin() methods.
I'd think we can use this flag and skip all topology change events in 
onDicovery() method via simple check "if (ctx.clientDisconnected) return".



If any reordering between all those events are possible (e.g. due to event 
processing from different threads) than it look like bug in discovery.

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-12 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790472#comment-16790472
 ] 

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov], In such case we can get a situation when the client already 
joined to topology (will receive mvcc requests) but the current coordinator is 
in a disconnected state until processed discovery event.
And possible another situation when the client already disconnected but the 
coordinator wasn't disconnected until processed client disconnect discovery 
event. 
Also, it seems there is no event for local node join in the discovery event 
thread.

With my changes coordinator will be set correctly on local join/on disconnected 
and will not process unnecessary events that happen(will be processed) after 
the client leave topology. When the client join topology again it will be set a 
correct coordinator and events will be processed only after get a client 
reconnect event.

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-12 Thread Andrew Mashenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790415#comment-16790415
 ] 

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev], Thanks for clarification.

I've found a place where Ignite handle {{onDisconnected}}, {{onLocalJoin 
}}events in separate thread.
You are right, serializability is lost here and method semantic looks broken or 
methods are not documented well.

{{}}What if we remove these handlers from MvccProcessor and move all their 
logic into onDiscovery method?

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-11 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789730#comment-16789730
 ] 

Amelchev Nikita commented on IGNITE-11460:
--

[~amashenkov], Thanks for taking a look. Guarantees of discovery events work 
fine. 
Problem is that client disconnection processing happens not on the event. It's 
processed in the {{onDisconnected}} method (disco-notifier-thread). 
I attached reproducer in PR. Steps to reproduce:
1. Do something that generates an event to change coordinator. For example, 
start a node. Add a custom listener to hold up this event. Node's {{notifier 
thread}} will process messages and put the {{node_joined}} event into the 
events queue. The custom listener will hold up the event processing.
2. Restart cluster. Client notifier thread will process this and change 
coordinator onto {{disconneted-coordinator}}.  After this, it will put 
{{client-disconnected-event}} into events queue. 
3. The client will find server and process local join in notifier thread and 
set new coordinator. 
4. The event of {{node_joined}} from p.1 will be processed by 
disco-event-thread and coordinator can be changed onto wrong. 
After this client gets {{client disconnect event}} and {{client reconnect}} 
event.
Coordinator changing is out of sync by it's processed not only in event-thread 
with guarantees. It's can be changed by disco-notifier-thread by internal 
methods ({{onDisconnected}}, {{onLocalJoin}}). There are no guarantees between 
these two threads.

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-11 Thread Andrew Mashenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789680#comment-16789680
 ] 

Andrew Mashenkov commented on IGNITE-11460:
---

[~NSAmelchev],

Looks like discovery guarantees that no disco events will be occurs between 
client node got ClientNodeDisconnected and Local join (after which 
ClientReconnected will be generated).

It is not clear how this race can be reproduced as all disco events should be 
serialized and coordinator can't be changed with late exchange processing in 
onExchangeDone.

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-05 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784220#comment-16784220
 ] 

Amelchev Nikita commented on IGNITE-11460:
--

I have prepared PR to fix the issue.
I have added disconnected client flag and now listener will not process 
coordinator changing on discovery events after client disconnected until 
connected again (get a reconnect event). 
TC tests look good.
[~gvvinblade], could you take a look, please?


> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11460) MVCC: Possible race on coordinator changing on client reconnection.

2019-03-05 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784217#comment-16784217
 ] 

Ignite TC Bot commented on IGNITE-11460:


{panel:title=-- Run :: All: No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3210728buildTypeId=IgniteTests24Java8_RunAll]

> MVCC: Possible race on coordinator changing on client reconnection.
> ---
>
> Key: IGNITE-11460
> URL: https://issues.apache.org/jira/browse/IGNITE-11460
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found that the wrong coordinator can be set in case of client reconnect:
> {noformat}
> assert newCrd.topologyVersion().compareTo(curCrd.topologyVersion()) > 0;
> java.lang.AssertionError
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onCoordinatorChanged(MvccProcessorImpl.java:541)
> at 
> org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.onLocalJoin(MvccProcessorImpl.java:416)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:851)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2681)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2719)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I have attached reproducer in PR.
> The main reason is that coordinator can be changed from discovery event 
> thread when the client already disconnect (disconnection processed in 
> notifier thread and change coordinator on onDisconnected method).
> Coordinator can be changed in cases:
> 1. notifier disco thread: onDisconnected method
> 2. event disco thread: onDiscovery listener.
> and events can be processed with some delay and override coordinator that set 
> in notifier thread. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)