[jira] [Commented] (IGNITE-8098) Getting affinity for topology version earlier than affinity is calculated because of data race

2019-09-17 Thread Ilya Kasnacheev (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931277#comment-16931277 ]

Ilya Kasnacheev commented on IGNITE-8098:
-

Possible duplicate of IGNITE-11465

> Getting affinity for topology version earlier than affinity is calculated 
> because of data race
> --
>
> Key: IGNITE-8098
> URL: https://issues.apache.org/jira/browse/IGNITE-8098
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.3
> Reporter: Andrey Aleksandrov
> Priority: Minor
> Fix For: 2.8
>
>
> From time to time, an Ignite cluster with services throws the following 
> exception while some nodes are restarting:
> java.lang.IllegalStateException: Getting affinity for topology version 
> earlier than affinity is calculated [locNode=TcpDiscoveryNode 
> [id=c770dbcf-2908-442d-8aa0-bf26a2aecfef, addrs=[10.44.162.169, 127.0.0.1], 
> sockAddrs=[clrv041279.ic.ing.net/10.44.162.169:56500, /127.0.0.1:56500], 
> discPort=56500, order=11, intOrder=8, lastExchangeTime=1520931375337, 
> loc=true, ver=2.3.3#20180213-sha1:f446df34, isClient=false], 
> grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=13, 
> minorTopVer=0], head=AffinityTopologyVersion [topVer=15, minorTopVer=0], 
> history=[AffinityTopologyVersion [topVer=11, minorTopVer=0], 
> AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion 
> [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=15, 
> minorTopVer=0]]]
> The cause appears to be a data race in the GridServiceProcessor class.
> How to reproduce:
> 1) To simulate the data race, modify the following place in the source code:
> Class: GridServiceProcessor
> Method: @Override public void onEvent(final DiscoveryEvent evt, final 
> DiscoCache discoCache) {
> Place:
> 
> try {
>     svcName.set(dep.configuration().getName());
>
>     ctx.cache().internalCache(UTILITY_CACHE_NAME).context().affinity().
>         affinityReadyFuture(topVer).get();
>
>     // HERE (between get() and reassign()), add a delay, for example Thread.sleep(100):
>     // try {
>     //     Thread.sleep(100);
>     // }
>     // catch (InterruptedException e1) {
>     //     e1.printStackTrace();
>     // }
>
>     reassign(dep, topVer);
> }
> catch (IgniteCheckedException ex) {
>     if (!(e instanceof ClusterTopologyCheckedException))
>         LT.error(log, ex, "Failed to do service reassignment (will retry): " +
>             dep.configuration().getName());
>
>     retries.add(dep);
> }
> ...
> 2) After that, simulate repeated node start/shutdown iterations. To reproduce 
> the issue I used GridServiceProcessorBatchDeploySelfTest (the timeout on 
> future.get should be increased to avoid a timeout error).
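
The check-then-act pattern described in the report (wait until affinity is ready for a topology version, then use that version later) can be modelled in isolation. The sketch below is not Ignite code: `History`, its size bound of 3, and the eviction rule are hypothetical stand-ins, and the ticket does not state how the requested version actually becomes unavailable. It only illustrates that a gap between the readiness check and the later lookup lets the topology move on, so the lookup for the older version can fail.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of a check-then-act race; none of these types are Ignite classes. */
class AffinityRaceSketch {
    /** Bounded history of assignments keyed by topology version (illustrative only). */
    static final class History {
        private final ConcurrentSkipListMap<Integer, String> byVer = new ConcurrentSkipListMap<>();
        private final int maxSize;

        History(int maxSize) {
            this.maxSize = maxSize;
        }

        /** Records a new version and evicts the oldest entries beyond maxSize. */
        synchronized void add(int ver) {
            byVer.put(ver, "assignment-" + ver);

            while (byVer.size() > maxSize)
                byVer.pollFirstEntry();
        }

        boolean ready(int ver) {
            return byVer.containsKey(ver);
        }

        String get(int ver) {
            String res = byVer.get(ver);

            if (res == null)
                throw new IllegalStateException("No affinity for topVer=" + ver + ", history=" + byVer.keySet());

            return res;
        }
    }

    /** Checks readiness of version 13, optionally lets the topology advance, then reads version 13. */
    static String run(boolean topologyAdvancesInGap) {
        History hist = new History(3);

        for (int v = 11; v <= 13; v++)
            hist.add(v);

        CountDownLatch checked = new CountDownLatch(1);
        CountDownLatch advanced = new CountDownLatch(1);

        // Plays the role of concurrent topology changes happening on another thread.
        Thread exchange = new Thread(() -> {
            try {
                checked.await();

                if (topologyAdvancesInGap)
                    for (int v = 14; v <= 16; v++)
                        hist.add(v);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finally {
                advanced.countDown();
            }
        });

        exchange.start();

        try {
            boolean ready = hist.ready(13); // Step 1: the readiness check succeeds.

            checked.countDown();
            advanced.await();               // The gap where the report inserts Thread.sleep(100).

            return ready ? hist.get(13) : "not ready"; // Step 2: use the version checked earlier.
        }
        catch (IllegalStateException e) {
            return "FAILED: " + e.getMessage();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Latches are used instead of a sleep so the interleaving is deterministic: `run(false)` returns the assignment for version 13, while `run(true)` fails because version 13 is no longer in the modelled history by the time it is read.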



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (IGNITE-8098) Getting affinity for topology version earlier than affinity is calculated because of data race

2018-09-27 Thread Vladimir Ozerov (JIRA)


[ https://issues.apache.org/jira/browse/IGNITE-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631383#comment-16631383 ]

Vladimir Ozerov commented on IGNITE-8098:
-

Moved to AI 2.8 due to inactivity. Please feel free to move the ticket back to 
2.7 if it is ready by 30 Sep 2018 (the AI 2.7 code freeze date).

> Getting affinity for topology version earlier than affinity is calculated 
> because of data race
> --
>
> Key: IGNITE-8098
> URL: https://issues.apache.org/jira/browse/IGNITE-8098
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.3
> Reporter: Andrey Aleksandrov
> Priority: Minor
> Fix For: 2.7
>
>


