[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-20 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911183#comment-16911183
 ] 

Ignite TC Bot commented on IGNITE-9562:
---

{panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers 
(6)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure 
on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4516785]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric 
|https://ci.ignite.apache.org/viewLog.html?buildId=4516787]]

{color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on 
metric |https://ci.ignite.apache.org/viewLog.html?buildId=4516789]]

{color:#d04437}PDS 1{color} [[tests 
2|https://ci.ignite.apache.org/viewLog.html?buildId=4516793]]
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - 
Test has low fail rate in base branch 0,0% and is not flaky
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly 
- Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 
BuildFailureOnMessage 
|https://ci.ignite.apache.org/viewLog.html?buildId=4516797]]

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4514248buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8, 2.7.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-19 Thread Dmitriy Pavlov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910608#comment-16910608
 ] 

Dmitriy Pavlov commented on IGNITE-9562:


[~ivan.glukos] [~mstepachev] thank you for review.

I could merge it to 2.7.6 once TC Bot visa is more or less like N blockers 
found, where N< 5.

Now there is 9: 
https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll=pull%2F6781%2Fhead=Latest=ignite-2.7.6

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8, 2.7.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>   at 
> 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-18 Thread Dmitriy Pavlov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909937#comment-16909937
 ] 

Dmitriy Pavlov commented on IGNITE-9562:


according to Eduard comments
 IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode   
- needs to be researched
PDS 1 [ tests 5 ]   
  IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly   
   - can be Ignored/failed because of 
https://issues.apache.org/jira/browse/IGNITE-8717

Cache 7 [ tests 2 ]  
  IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable 
- this is an issue, need to be fixed.

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8, 2.7.6
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>   at 
> 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-18 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909935#comment-16909935
 ] 

Ignite TC Bot commented on IGNITE-9562:
---

{panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers 
(8)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure 
on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4512345]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric 
|https://ci.ignite.apache.org/viewLog.html?buildId=4512347]]

{color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on 
metric |https://ci.ignite.apache.org/viewLog.html?buildId=4512349]]

{color:#d04437}Cache (Restarts) 2{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4512351]]
* IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode - 
Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}PDS 1{color} [[tests 
2|https://ci.ignite.apache.org/viewLog.html?buildId=4512353]]
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - 
Test has low fail rate in base branch 0,0% and is not flaky
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly 
- Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 7{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4512355]]
* IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable - 
Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 
BuildFailureOnMessage 
|https://ci.ignite.apache.org/viewLog.html?buildId=4512359]]

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4507062buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8, 2.7.6
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-16 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908852#comment-16908852
 ] 

Ignite TC Bot commented on IGNITE-9562:
---

{panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers 
(41)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure 
on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503638]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric 
|https://ci.ignite.apache.org/viewLog.html?buildId=4503679]]

{color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on 
metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503632]]

{color:#d04437}MVCC Cache{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4503639]]
* IgniteCacheMvccTestSuite: 
CacheMvccPartitionedCoordinatorFailoverTest.testGetReadInsideTxInProgressCoordinatorFails_ReadDelay
 - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache (Restarts) 2{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4503654]]
* IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode - 
Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}PDS 1{color} [[tests 
2|https://ci.ignite.apache.org/viewLog.html?buildId=4503673]]
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - 
Test has low fail rate in base branch 0,0% and is not flaky
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly 
- Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 7{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4503662]]
* IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable - 
Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 6{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4503661]]
* IgniteCacheTestSuite6: TxRollbackOnTimeoutNearCacheTest.testSimple - Test has 
low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 
BuildFailureOnMessage 
|https://ci.ignite.apache.org/viewLog.html?buildId=4503633]]

{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 
30|https://ci.ignite.apache.org/viewLog.html?buildId=4503630]]
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation2 
- Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation3 
- Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testConcurrentStartStop2_EventsThrottle - Test has 
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testCustomEventsSimple1_5_Nodes - Test has low fail 
rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testMultipleClusters - Test has low fail rate in base 
branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testConnectionRestore_Coordinator1_1 - Test has low 
fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testWithPersistence1 - Test has low fail rate in base 
branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: 
ZookeeperDiscoverySpiTest.testWithPersistence2 - Test has low fail rate in base 
branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop1 - 
Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop2 - 
Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testDeployService2 
- Test has low fail rate in base branch 0,0% and is not flaky
... and 19 tests blockers

{color:#d04437}Cache (Failover) 1{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4503648]]
* IgniteCacheFailoverTestSuite: 
IgniteAtomicLongChangingTopologySelfTest.testClientQueueCreateCloseFailover - 
Test has low fail rate in base branch 0,0% and is not flaky

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4503707buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

2019-08-14 Thread Dmitriy Pavlov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907266#comment-16907266
 ] 

Dmitriy Pavlov commented on IGNITE-9562:


Could you please cherry-pick commit to Ignite 2.7.6 branch once all tests are 
fixed in master 
?https://github.com/apache/ignite/commit/27e9f705c1f65baae20b7dc3c03e988217dbe3f6

> Destroyed cache that resurrected on an old offline node breaks PME
> --
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249)
>   at 
> 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on a old offline node breaks PME

2019-08-06 Thread Eduard Shangareev (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901162#comment-16901162
 ] 

Eduard Shangareev commented on IGNITE-9562:
---

The proper solution is moving cache configuration to metastore. There was 
implemented a workaround to not get the cluster to hang.


We don’t allow nodes with caches which don’t exist in running cluster and the 
nodes have data on disk.

To allow the nodes join, you need to remove directories with absent caches or 
start the nodes first.

> Destroyed cache that resurrected on a old offline node breaks PME
> -
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>   at 
> 

[jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on a old offline node breaks PME

2019-08-06 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901157#comment-16901157
 ] 

Ignite TC Bot commented on IGNITE-9562:
---

{panel:title=Branch: [pull/6748/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4467734buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on a old offline node breaks PME
> -
>
> Key: IGNITE-9562
> URL: https://issues.apache.org/jira/browse/IGNITE-9562
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Eduard Shangareev
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen 
> before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash 
> recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on another 
> part of a cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>   at 
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 
> 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
>  Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee2511, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>   at 
>