[jira] [Created] (IGNITE-12422) Clean up GG-XXX internal ticket references from code base.
Alexei Scherbakov created IGNITE-12422: -- Summary: Clean up GG-XXX internal ticket references from code base. Key: IGNITE-12422 URL: https://issues.apache.org/jira/browse/IGNITE-12422 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.9 Replace with Apache Ignite equivalent if possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12332) Fix flaky test GridCacheAtomicClientInvalidPartitionHandlingSelfTest#testPrimaryFullAsync
Alexei Scherbakov created IGNITE-12332: -- Summary: Fix flaky test GridCacheAtomicClientInvalidPartitionHandlingSelfTest#testPrimaryFullAsync Key: IGNITE-12332 URL: https://issues.apache.org/jira/browse/IGNITE-12332 Project: Ignite Issue Type: Bug Affects Versions: 2.7.6 Reporter: Alexei Scherbakov Fix For: 2.8 Can be reproduced locally with range = 10_000 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12329) Invalid handling of remote entries causes partition desync and transaction hanging in COMMITTING state.
Alexei Scherbakov created IGNITE-12329: -- Summary: Invalid handling of remote entries causes partition desync and transaction hanging in COMMITTING state. Key: IGNITE-12329 URL: https://issues.apache.org/jira/browse/IGNITE-12329 Project: Ignite Issue Type: Bug Affects Versions: 2.7.6 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 This can happen if a transaction is mapped to a partition that is about to be evicted on a backup. Due to bugs, an entry belonging to another cache may be excluded from the commit, or an entry holding a lock may be removed without releasing the lock, causing dependent transactions to hang. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12328) IgniteException "Failed to resolve nodes topology" during cache.removeAll() and constantly changing topology
Alexei Scherbakov created IGNITE-12328: -- Summary: IgniteException "Failed to resolve nodes topology" during cache.removeAll() and constantly changing topology Key: IGNITE-12328 URL: https://issues.apache.org/jira/browse/IGNITE-12328 Project: Ignite Issue Type: Bug Affects Versions: 2.7.6 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 {noformat} [2019-09-25 13:13:58,339][ERROR][TxThread-threadNum-3] Failed to complete transaction. org.apache.ignite.IgniteException: Failed to resolve nodes topology [cacheGrp=cache_group_36, topVer=AffinityTopologyVersion [topVer=16, minorTopVer=0], history=[AffinityTopologyVersion [topVer=13, minorTopVer=0], AffinityTopologyVersion [topVer=14, minorTopVer=0], AffinityTopologyVersion [topVer=15, minorTopVer=0]], snap=Snapshot [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]], locNode=TcpDiscoveryNode [id=6cbf7666-9a8c-4b61-8b3f-6351ef44bd4a, consistentId=poc-tester-client-172.25.1.21-id-0, addrs=ArrayList [172.25.1.21], sockAddrs=HashSet [lab21.gridgain.local/172.25.1.21:0], discPort=0, order=13, intOrder=0, lastExchangeTime=1569406379934, loc=true, ver=2.5.10#20190922-sha1:02133315, isClient=true]] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:2125) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:2007) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheUtils.affinityNodes(GridCacheUtils.java:465) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map0(GridDhtColocatedLockFuture.java:939) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:911) ~[ignite-core-2.5.10.jar:2.5.10] at 
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.map(GridDhtColocatedLockFuture.java:811) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.lockAllAsync(GridDhtColocatedCache.java:656) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheAdapter.txLockAsync(GridDistributedCacheAdapter.java:109) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync0(GridNearTxLocal.java:1648) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.removeAllAsync(GridNearTxLocal.java:521) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheAdapter$33.inOp(GridCacheAdapter.java:2619) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:4701) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:3780) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll0(GridCacheAdapter.java:2617) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll(GridCacheAdapter.java:2606) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.removeAll(IgniteCacheProxyImpl.java:1553) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.removeAll(GatewayProtectedCacheProxy.java:1026) ~[ignite-core-2.5.10.jar:2.5.10] at org.apache.ignite.scenario.TxBalanceTask$TxBody.doTxRemoveAll(TxBalanceTask.java:291) ~[poc-tester-0.1.0-SNAPSHOT.jar:?] 
at org.apache.ignite.scenario.TxBalanceTask$TxBody.call(TxBalanceTask.java:93) ~[poc-tester-0.1.0-SNAPSHOT.jar:?] at org.apache.ignite.scenario.TxBalanceTask$TxBody.call(TxBalanceTask.java:70) ~[poc-tester-0.1.0-SNAPSHOT.jar:?] at org.apache.ignite.scenario.internal.AbstractTxTask.doInTransaction(AbstractTxTask.java:290) ~[poc-tester-0.1.0-SNAPSHOT.jar:?] at org.apache.ignite.scenario.internal.AbstractTxTask.access$400(AbstractTxTask.java:56) ~[poc-tester-0.1.0-SNAPSHOT.jar:?] at org.apache.ignite.scenario.internal.AbstractTxTask$TxRunner.call(AbstractTxTask.java:470) [poc-tester-0.1.0-SNAPSHOT.jar:?] at
[jira] [Created] (IGNITE-12327) Cross-cache tx is mapped on wrong primary when enlisted caches have incompatible assignments.
Alexei Scherbakov created IGNITE-12327: -- Summary: Cross-cache tx is mapped on wrong primary when enlisted caches have incompatible assignments. Key: IGNITE-12327 URL: https://issues.apache.org/jira/browse/IGNITE-12327 Project: Ignite Issue Type: Bug Affects Versions: 2.7.6 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 This happens when the supplier node leaves while rebalancing is only partially completed on the demander. Suppose we have 2 cache groups and rebalancing is in progress: for the first group it is done, and for the second group it is partially done (some partitions are still MOVING). At this moment the supplier node dies, and the corresponding topology version is (N,0). The new assignment is computed from the current state of partitions. For the first group it will be ideal and the same as for the next topology version (N,1), which is triggered by CacheAffinityChangeMessage after all rebalancing is completed. For the second group the affinity will not be ideal. If a transaction is started while PME (N,0)->(N,1) is in progress, the first lock request will pass the remap check if it enlists the rebalanced group. All subsequent lock requests will use the invalid topology from the previous assignment. Possible fix: return the actual locked topology version from the first lock request and use it for all subsequent requests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
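The possible fix above can be illustrated with a minimal sketch. This is a hypothetical model and not Ignite code: `TopVer`, `Tx` and `lock` are invented names, and the real lock protocol is far more involved. It only shows the idea that the version reported by the first lock request is remembered and reused by every later request of the same transaction.

```java
import java.util.Objects;

/** Hypothetical model (not Ignite code) of pinning a transaction to one topology version. */
public class TopologyPinningSketch {
    /** Simplified stand-in for a (major, minor) topology version. */
    public static final class TopVer {
        public final int major;
        public final int minor;

        public TopVer(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override public boolean equals(Object o) {
            return o instanceof TopVer && ((TopVer)o).major == major && ((TopVer)o).minor == minor;
        }

        @Override public int hashCode() {
            return Objects.hash(major, minor);
        }

        @Override public String toString() {
            return "(" + major + "," + minor + ")";
        }
    }

    /** Hypothetical transaction that remembers the version its first lock was acquired on. */
    public static final class Tx {
        /** Null until the first lock request completes. */
        private TopVer lockedTopVer;

        /**
         * @param curTopVer Version the requesting code currently observes.
         * @return Version this lock request is actually mapped on.
         */
        public TopVer lock(TopVer curTopVer) {
            if (lockedTopVer == null)
                lockedTopVer = curTopVer; // First request: remember the version that was locked.

            return lockedTopVer; // Subsequent requests reuse the pinned version.
        }
    }

    public static void main(String[] args) {
        Tx tx = new Tx();

        TopVer first = tx.lock(new TopVer(5, 1));  // First lock is mapped on (N,1).
        TopVer second = tx.lock(new TopVer(5, 0)); // A stale (N,0) view is ignored.

        System.out.println(first + " " + second);
    }
}
```

In this model both requests end up on (5,1), so a later request can no longer be mapped using the assignment of the previous version.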
[jira] [Created] (IGNITE-12038) Fix several failing tests after IGNITE-10078
Alexei Scherbakov created IGNITE-12038: -- Summary: Fix several failing tests after IGNITE-10078 Key: IGNITE-12038 URL: https://issues.apache.org/jira/browse/IGNITE-12038 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11939) IgnitePdsTxHistoricalRebalancingTest.testTopologyChangesWithConstantLoad test failure
Alexei Scherbakov created IGNITE-11939: -- Summary: IgnitePdsTxHistoricalRebalancingTest.testTopologyChangesWithConstantLoad test failure Key: IGNITE-11939 URL: https://issues.apache.org/jira/browse/IGNITE-11939 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Caused by exception on releasing reserved segments: {noformat} [12:51:23]W: [org.apache.ignite:ignite-indexing] [2019-06-21 12:51:23,967][ERROR][exchange-worker-#33825%persistence.IgnitePdsTxHistoricalRebalancingTest1%][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (rebalancing will be stopped) : GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=7, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=CacheAffinityChangeMessage [id=08de0ff7b61-276ac575-e4dc-4525-b24b-d0a5d1d7633d, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0], exc hId=null, partsMsg=null, exchangeNeeded=true], affTopVer=AffinityTopologyVersion [topVer=7, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=97e46568-6aa0-4a4b-864c-f05415c0, consistentId=persistence.IgnitePdsTxHistoricalRebalancingTest0, addrs=Arra yList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1561110643882, loc=false, ver=2.8.0#20190621-sha1:, isClient=false], topVer=7, nodeId8=0ff3354e, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=15611106839 58]], nodeId=97e46568, evt=DISCOVERY_CUSTOM_EVT] [12:51:23]W: [org.apache.ignite:ignite-indexing] java.lang.AssertionError: cur=null, absIdx=0 [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentReservationStorage.release(SegmentReservationStorage.java:55) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.release(SegmentAware.java:207) [12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.release(FileWriteAheadLogManager.java:983) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.releaseHistoryForPreloading(GridCacheDatabaseSharedManager.java:1844) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1431) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:862) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3079) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2928) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [12:51:23]W: [org.apache.ignite:ignite-indexing]at java.lang.Thread.run(Thread.java:748) [12:51:23]W: [org.apache.ignite:ignite-indexing] [12:51:23] (err) Failed to notify listener: o.a.i.i.processors.timeout.GridTimeoutProcessor$2...@79ba1907java.lang.AssertionError: cur=null, absIdx=0 [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentReservationStorage.release(SegmentReservationStorage.java:55) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.wal.aware.SegmentAware.release(SegmentAware.java:207) [12:51:23]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.release(FileWriteAheadLogManager.java:983) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.releaseHistoryForPreloading(GridCacheDatabaseSharedManager.java:1844) [12:51:23]W: [org.apache.ignite:ignite-indexing]at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1431) [12:51:23]W: [org.apache.ignite:ignite-indexing]at
[jira] [Created] (IGNITE-11937) Fix MVCC PDS flaky suites timeout
Alexei Scherbakov created IGNITE-11937: -- Summary: Fix MVCC PDS flaky suites timeout Key: IGNITE-11937 URL: https://issues.apache.org/jira/browse/IGNITE-11937 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Currently we have a non-zero failure rate for some MVCC PDS suites in master. This seems to be due to failure [1] in the testRebalancingDuringLoad* test group, which leads to dumping WAL and lock states. The dump takes time proportional to the current WAL length, so test duration increases by an unpredictable amount. Worse, the test remains green despite throwing a critical exception. [1] Stacktrace {noformat} [2019-06-19 15:56:53,386][ERROR][sys-stripe-6-#134%persistence.IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes3%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=81227264, val2=844420635164676]], msg=Runtime failure on search row: TxKey [major=1560948946388, minor=17286 class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=81227264, val2=844420635164676]], msg=Runtime failure on search row: TxKey [major=1560948946388, minor=17286]] at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:5909) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1859) at org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog.put(TxLog.java:293) at org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl.updateState(MvccProcessorImpl.java:699) at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.setMvccState(IgniteTxManager.java:2570) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1228) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1070) at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.prepareRemoteTx(GridDistributedTxRemoteAdapter.java:421) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.startRemoteTx(IgniteTxHandler.java:1837) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxPrepareRequest(IgniteTxHandler.java:1198) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$400(IgniteTxHandler.java:118) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:224) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$5.apply(IgniteTxHandler.java:222) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1558) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1186) at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1083) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalStateException: Unexpected new transaction state. [currState=2, newState=1, cntr=17286] at org.apache.ignite.internal.processors.cache.mvcc.txlog.TxLog$TxLogUpdateClosure.invalid(TxLog.java:629) at
[jira] [Created] (IGNITE-11887) Add more test scenarios for OWNING -> RENTING -> MOVING scenario
Alexei Scherbakov created IGNITE-11887: -- Summary: Add more test scenarios for OWNING -> RENTING -> MOVING scenario Key: IGNITE-11887 URL: https://issues.apache.org/jira/browse/IGNITE-11887 Project: Ignite Issue Type: Test Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Relevant test GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11867) Fix flaky test GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions
Alexei Scherbakov created IGNITE-11867: -- Summary: Fix flaky test GridCacheRebalancingWithAsyncClearingTest#testCorrectRebalancingCurrentlyRentingPartitions Key: IGNITE-11867 URL: https://issues.apache.org/jira/browse/IGNITE-11867 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11862) Cache stopping on supplier during rebalance causes NPE and supplying failure.
Alexei Scherbakov created IGNITE-11862: -- Summary: Cache stopping on supplier during rebalance causes NPE and supplying failure. Key: IGNITE-11862 URL: https://issues.apache.org/jira/browse/IGNITE-11862 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov {noformat} [21:12:14]W: [org.apache.ignite:ignite-core] [2019-05-20 21:12:14,376][ERROR][sys-#60310%distributed.CacheParallelStartTest0%][GridDhtPartitionSupplier] Failed to continue supplying [grp=static-cache-group45, demander=ed1c0109-8721-4cd8-80d9-d36e8251, top Ver=AffinityTopologyVersion [topVer=2, minorTopVer=0], topic=0] [21:12:14]W: [org.apache.ignite:ignite-core] java.lang.NullPointerException [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.CacheGroupContext.addRebalanceSupplyEvent(CacheGroupContext.java:525) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:422) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:397) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:455) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:440) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1141) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [21:12:14]W: [org.apache.ignite:ignite-core] at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1706) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1566) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:129) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2795) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1523) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:129) [21:12:14]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1492) [21:12:14]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [21:12:14]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [21:12:14]W: [org.apache.ignite:ignite-core] at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11857) Investigate performance drop after IGNITE-10078
Alexei Scherbakov created IGNITE-11857: -- Summary: Investigate performance drop after IGNITE-10078 Key: IGNITE-11857 URL: https://issues.apache.org/jira/browse/IGNITE-11857 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov After IGNITE-10078, yardstick tests show a performance drop of up to 8% in some scenarios: * tx-optim-repRead-put-get * tx-optimistic-put * tx-putAll This is partly due to the new update counter implementation, but not entirely. Investigation is required. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11820) Add partition consistency tests for multiple caches in group.
Alexei Scherbakov created IGNITE-11820: -- Summary: Add partition consistency tests for multiple caches in group. Key: IGNITE-11820 URL: https://issues.apache.org/jira/browse/IGNITE-11820 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11804) Assertion error
Alexei Scherbakov created IGNITE-11804: -- Summary: Assertion error Key: IGNITE-11804 URL: https://issues.apache.org/jira/browse/IGNITE-11804 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Reproducer (needs some cleanup) {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.transactions; import java.util.ArrayList; import java.util.List; import java.util.concurrent.atomic.AtomicReference; import org.apache.ignite.Ignite; import org.apache.ignite.IgniteCache; import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.DataRegionConfiguration; import org.apache.ignite.configuration.DataStorageConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.failure.StopNodeFailureHandler; import org.apache.ignite.internal.IgniteEx; import org.apache.ignite.internal.processors.cache.GridCacheContext; import org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl; import org.apache.ignite.internal.processors.cache.persistence.db.wal.IgniteWalRebalanceTest; import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi; import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import org.apache.ignite.transactions.Transaction; import org.junit.Test; import static java.util.concurrent.TimeUnit.DAYS; import static java.util.concurrent.TimeUnit.MILLISECONDS; import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL; import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC; import static org.apache.ignite.configuration.WALMode.LOG_ONLY; import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC; import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ; /** * Test framework for ordering transaction's prepares and commits by intercepting messages and releasing then * in user defined order. */ public class TxPartitionCounterStateAbstractTest extends GridCommonAbstractTest { /** IP finder. 
*/ private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true); /** */ private static final int MB = 1024 * 1024; /** */ protected int backups; /** */ public static final int TEST_TIMEOUT = 30_000; public static final String DEFAULT_CACHE_NAME_2 = DEFAULT_CACHE_NAME + "2"; /** */ private AtomicReference testFailed = new AtomicReference<>(); /** Number of keys to preload before txs to enable historical rebalance. */ protected static final int PRELOAD_KEYS_CNT = 1; /** */ protected static final String CLIENT_GRID_NAME = "client"; /** */ protected static final int PARTS_CNT = 32; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); cfg.setConsistentId("node" + igniteInstanceName); cfg.setFailureHandler(new StopNodeFailureHandler()); cfg.setRebalanceThreadPoolSize(4); // Necessary to reproduce some issues. ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER); // TODO set this only for historical rebalance tests. cfg.setCommunicationSpi(new IgniteWalRebalanceTest.WalRebalanceCheckingCommunicationSpi()); boolean client = igniteInstanceName.startsWith(CLIENT_GRID_NAME); cfg.setClientMode(client); cfg.setDataStorageConfiguration(new DataStorageConfiguration(). setWalHistorySize(1000). setWalSegmentSize(8 * MB).setWalMode(LOG_ONLY).setPageSize(1024). setCheckpointFrequency(MILLISECONDS.convert(365, DAYS)). setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true). setInitialSize(100 * MB).setMaxSize(100 * MB))); if
[jira] [Created] (IGNITE-11801) Clearing of moving partition may lead to partition desync.
Alexei Scherbakov created IGNITE-11801: -- Summary: Clearing of moving partition may lead to partition desync. Key: IGNITE-11801 URL: https://issues.apache.org/jira/browse/IGNITE-11801 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov {{o.a.i.i.processors.cache.distributed.dht.topology.GridDhtLocalPartition#tryClear}} calls {{org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition#clearAll}}. Inside clearAll, {{clearVer = ctx.versions().next();}} is assigned at call time, but the call may happen after the exchange future has finished and some updates have already been applied to the MOVING partition, resulting in removal of up-to-date data from the partition. Fix: assign the clear version before the exchange future is finished. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
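The timing problem can be sketched with a small self-contained model. It is an illustration only and not the Ignite implementation: `ClearVersionSketch`, its single counter, and the assumption that clearing removes every entry whose version is lower than the clear version are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model (not Ignite code) of clearing a partition by version. */
public class ClearVersionSketch {
    /** Last generated version. */
    private long lastVer;

    /** Key -> version of the last update (assumed model of partition contents). */
    private final Map<String, Long> entries = new HashMap<>();

    /** Generates the next version, loosely analogous to ctx.versions().next(). */
    public long nextVersion() {
        return ++lastVer;
    }

    /** Writes a key with a freshly generated version. */
    public void put(String key) {
        entries.put(key, nextVersion());
    }

    /** Assumption: clearing removes every entry written before clearVer. */
    public void clearAll(long clearVer) {
        entries.values().removeIf(ver -> ver < clearVer);
    }

    public boolean contains(String key) {
        return entries.containsKey(key);
    }

    public static void main(String[] args) {
        // Proposed order: the clear version is reserved before new updates arrive.
        ClearVersionSketch early = new ClearVersionSketch();
        early.put("stale");                  // version 1
        long clearVer = early.nextVersion(); // version 2, reserved up front
        early.put("fresh");                  // version 3, applied after the exchange
        early.clearAll(clearVer);
        System.out.println(early.contains("stale") + " " + early.contains("fresh"));

        // Problematic order: the clear version is generated at call time.
        ClearVersionSketch late = new ClearVersionSketch();
        late.put("stale");                   // version 1
        late.put("fresh");                   // version 2, applied after the exchange
        late.clearAll(late.nextVersion());   // version 3, so the fresh update is removed too
        System.out.println(late.contains("fresh"));
    }
}
```

With the version reserved early, only the stale entry is dropped; with the version taken at call time, the update that arrived after the exchange is lost as well.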
[jira] [Created] (IGNITE-11800) Update counters in o.a.i.i.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl#update could be applied from stale messages
Alexei Scherbakov created IGNITE-11800: -- Summary: Update counters in o.a.i.i.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl#update could be applied from stale messages Key: IGNITE-11800 URL: https://issues.apache.org/jira/browse/IGNITE-11800 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov The stale check is performed after the incoming counters are applied, which seems wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
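A minimal sketch of why the ordering matters, under stated assumptions: the class and method names are hypothetical, a message is reduced to a topology version plus one counter, and applying a counter is modeled as plain assignment. The real `GridDhtPartitionTopologyImpl#update` logic is not reproduced here.

```java
/** Hypothetical model (not Ignite code) comparing two orders of stale check and counter update. */
public class StaleCheckSketch {
    /** Topology version of the last accepted message. */
    private long lastTopVer;

    /** Currently applied update counter. */
    private long updateCntr;

    /** Order described in the ticket: the counter is applied first, the stale check comes second. */
    public boolean applyThenCheck(long msgTopVer, long msgCntr) {
        updateCntr = msgCntr; // Applied even if the message turns out to be stale.

        if (msgTopVer < lastTopVer)
            return false;

        lastTopVer = msgTopVer;

        return true;
    }

    /** Alternative order: a stale message is rejected before it can touch the counter. */
    public boolean checkThenApply(long msgTopVer, long msgCntr) {
        if (msgTopVer < lastTopVer)
            return false;

        lastTopVer = msgTopVer;
        updateCntr = msgCntr;

        return true;
    }

    public long updateCounter() {
        return updateCntr;
    }

    public static void main(String[] args) {
        StaleCheckSketch buggy = new StaleCheckSketch();
        buggy.applyThenCheck(3, 100); // Fresh message.
        buggy.applyThenCheck(2, 40);  // Stale message still overwrites the counter.

        StaleCheckSketch fixed = new StaleCheckSketch();
        fixed.checkThenApply(3, 100); // Fresh message.
        fixed.checkThenApply(2, 40);  // Stale message is dropped.

        System.out.println(buggy.updateCounter() + " " + fixed.updateCounter());
    }
}
```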
[jira] [Created] (IGNITE-11799) Do not always clear partition in MOVING state before exchange
Alexei Scherbakov created IGNITE-11799: -- Summary: Do not always clear partition in MOVING state before exchange Key: IGNITE-11799 URL: https://issues.apache.org/jira/browse/IGNITE-11799 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov After IGNITE-10078, if a partition was in MOVING state before exchange and was chosen for full rebalance (for example, this happens if any minor PME cancels the previous rebalance), we always clear it to avoid desync issues in case some removals were not delivered to the demander. This is not always necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11797) Repair historical rebalancing for atomic and mixed tx-atomic cache groups.
Alexei Scherbakov created IGNITE-11797: -- Summary: Repair historical rebalancing for atomic and mixed tx-atomic cache groups. Key: IGNITE-11797 URL: https://issues.apache.org/jira/browse/IGNITE-11797 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov IGNITE-10078 only solves consistency problems for tx mode. For atomic caches the rebalance consistency issues remain and should be fixed together with improving the consistency of the atomic cache protocol. Mixed tx-atomic mode for a cache group should not be allowed at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11794) Remove initial counter from update counter contract.
Alexei Scherbakov created IGNITE-11794: -- Summary: Remove initial counter from update counter contract. Key: IGNITE-11794 URL: https://issues.apache.org/jira/browse/IGNITE-11794 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov We have the org.apache.ignite.internal.processors.cache.PartitionUpdateCounter#initial and org.apache.ignite.internal.processors.cache.PartitionUpdateCounter#updateInitial methods in the partition update counter contract, but they are not needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11793) Failover for isolated updater mode.
Alexei Scherbakov created IGNITE-11793: -- Summary: Failover for isolated updater mode. Key: IGNITE-11793 URL: https://issues.apache.org/jira/browse/IGNITE-11793 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Currently, with the isolated updater (data streamer + allowOverwrite=false), counters are generated independently on all owners, even in transactional mode. If some nodes fail, there is a high risk of partition desync. Also, this mode cannot be used together with concurrent transactions after IGNITE-10078. I suggest introducing a special loading mode for a cache in which concurrent updates are prohibited until the initial data loading (using the isolated updater) is completed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11790) Optimize rebalance history calculation.
Alexei Scherbakov created IGNITE-11790: -- Summary: Optimize rebalance history calculation. Key: IGNITE-11790 URL: https://issues.apache.org/jira/browse/IGNITE-11790 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Currently we pass initial update counters to the coordinator during PME, but this is not needed for calculating rebalance history. It can be calculated as: maxCntr - updateCounter (the last counter of the sequential history). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
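The formula from IGNITE-11790 can be sketched as a small helper. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of Ignite, `maxCntr` is taken to be the highest update counter known for the partition among its owners, and `updateCntr` is the last counter of the gap-free history on the node being rebalanced.

```java
// Hypothetical helper for illustration only; not an Apache Ignite class.
final class RebalanceHistory {
    private RebalanceHistory() {
        // Static utility, no instances.
    }

    /**
     * @param maxCntr Highest update counter known for the partition (assumption).
     * @param updateCntr Last counter of the sequential history on the lagging node (assumption).
     * @return Number of updates that would have to be replayed from history.
     */
    static long historySize(long maxCntr, long updateCntr) {
        if (updateCntr > maxCntr)
            throw new IllegalArgumentException("updateCntr must not exceed maxCntr");

        return maxCntr - updateCntr;
    }
}
```

With this shape the coordinator needs only the two counters per partition, which is the point of the ticket.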
[jira] [Created] (IGNITE-11663) Dispose of copypaste code in org.apache.ignite.internal.processors.cache.persistence.wal.record.RecordTypes
Alexei Scherbakov created IGNITE-11663: -- Summary: Dispose of copypaste code in org.apache.ignite.internal.processors.cache.persistence.wal.record.RecordTypes Key: IGNITE-11663 URL: https://issues.apache.org/jira/browse/IGNITE-11663 Project: Ignite Issue Type: Improvement Environment: org.apache.ignite.internal.pagemem.wal.record.WALRecord.RecordPurpose Reporter: Alexei Scherbakov Fix For: 2.8 We already have org.apache.ignite.internal.pagemem.wal.record.WALRecord.RecordPurpose for defining record relation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11611) If partition cannot be recovered during rebalance it should be moved to LOST state.
Alexei Scherbakov created IGNITE-11611: -- Summary: If partition cannot be recovered during rebalance it should be moved to LOST state. Key: IGNITE-11611 URL: https://issues.apache.org/jira/browse/IGNITE-11611 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11607) Historical rebalance is not possible from partition which was recently rebalanced itself
Alexei Scherbakov created IGNITE-11607: -- Summary: Historical rebalance is not possible from partition which was recently rebalanced itself Key: IGNITE-11607 URL: https://issues.apache.org/jira/browse/IGNITE-11607 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11594) IgnitePdsContinuousRestartTestWithExpiryPolicy test reports partition sizes validation error.
Alexei Scherbakov created IGNITE-11594: -- Summary: IgnitePdsContinuousRestartTestWithExpiryPolicy test reports partition sizes validation error. Key: IGNITE-11594 URL: https://issues.apache.org/jira/browse/IGNITE-11594 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Most probably this is due to concurrent expiration during PME. It looks like validation of sizes for expiring cache partitions has no meaning. Also, the base test IgnitePdsContinuousRestartTest doesn't test any invariant. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11256) Implement read-only mode for grid
Alexei Scherbakov created IGNITE-11256: -- Summary: Implement read-only mode for grid Key: IGNITE-11256 URL: https://issues.apache.org/jira/browse/IGNITE-11256 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Should be triggered from control.sh utility. Useful for maintenance work, for example checking partition consistency (idle_verify) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11171) Assertion on tx preparing
Alexei Scherbakov created IGNITE-11171: -- Summary: Assertion on tx preparing Key: IGNITE-11171 URL: https://issues.apache.org/jira/browse/IGNITE-11171 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.8
{noformat}
2019-01-22 14:00:01.203[ERROR][sys-stripe-15-#16%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.AssertionError: Got entry removed exception while holding transactional lock on entry [e=o.a.i.i.processors.cache.GridCacheEntryRemovedException, cached=GridDhtCacheEntry [rdrs=ReaderId[] [], part=7042, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=7042, val=SCHEDULED_CHECK_STOP_PAYMENTS_TASK_DPL_defaultSection, hasValBytes=true], val=null, startVer=1548154332959, ver=GridCacheVersion [topVer=159054171, order=1548061479047, nodeOrder=20], hash=1755381247, extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion [topVer=2147483647, order=0, nodeOrder=0]], flags=2]]
java.lang.AssertionError: Got entry removed exception while holding transactional lock on entry [e=org.apache.ignite.internal.processors.cache.GridCacheEntryRemovedException, cached=GridDhtCacheEntry [rdrs=ReaderId[] [], part=7042, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=7042, val=SCHEDULED_CHECK_STOP_PAYMENTS_TASK_DPL_defaultSection, hasValBytes=true], val=null, startVer=1548154332959, ver=GridCacheVersion [topVer=159054171, order=1548061479047, nodeOrder=20], hash=1755381247, extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion [topVer=2147483647, order=0, nodeOrder=0]], flags=2
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:512)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1231)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:520)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:161)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:139)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:101)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:181)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:179)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1058)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:583)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:382)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:297)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
    at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:496)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11131) Invalid use of static system properties in AffinityAssignment
Alexei Scherbakov created IGNITE-11131: -- Summary: Invalid use of static system properties in AffinityAssignment Key: IGNITE-11131 URL: https://issues.apache.org/jira/browse/IGNITE-11131 Project: Ignite Issue Type: Task Reporter: Alexei Scherbakov Fix For: 2.8 The recently added properties {{org.apache.ignite.internal.processors.affinity.AffinityAssignment#IGNITE_AFFINITY_BACKUPS_THRESHOLD}} and {{org.apache.ignite.internal.processors.affinity.AffinityAssignment#IGNITE_DISABLE_AFFINITY_MEMORY_OPTIMIZATION}} have a flaw: they are defined as static, making it impossible to change them between node restarts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
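The flaw described in IGNITE-11131 can be shown with plain JDK code. The sketch below is not the Ignite source: the class, the property name and the default value 5 are made up for illustration. A static field is evaluated once per JVM at class initialization, so a node restarted in the same JVM (as in tests) keeps the old value, while an instance field is re-read on every construction.

```java
// Hypothetical illustration; the property name and default are assumptions, not Ignite's.
final class AffinitySettings {
    static final String BACKUPS_THRESHOLD_PROP = "EXAMPLE_AFFINITY_BACKUPS_THRESHOLD";

    /** Problematic: read once at class initialization and then fixed for the JVM lifetime. */
    static final int STATIC_THRESHOLD = Integer.getInteger(BACKUPS_THRESHOLD_PROP, 5);

    /** Read each time an instance is created, e.g. on every node start. */
    private final int threshold = Integer.getInteger(BACKUPS_THRESHOLD_PROP, 5);

    int threshold() {
        return threshold;
    }
}
```

Changing the system property after the class has been loaded affects only newly created instances, never `STATIC_THRESHOLD`.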
[jira] [Created] (IGNITE-11099) Implement test framework for measuring heap utilization
Alexei Scherbakov created IGNITE-11099: -- Summary: Implement test framework for measuring heap utilization Key: IGNITE-11099 URL: https://issues.apache.org/jira/browse/IGNITE-11099 Project: Ignite Issue Type: Task Reporter: Alexei Scherbakov Fix For: 2.8 It's necessary to create a special test framework capable of comparing heap usage in "before optimization" vs "after optimization" modes. Most probably it should be implemented as a special test suite running with instrumentation support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11059) Print information about pending locks queue in case of dht local tx timeout.
Alexei Scherbakov created IGNITE-11059: -- Summary: Print information about pending locks queue in case of dht local tx timeout. Key: IGNITE-11059 URL: https://issues.apache.org/jira/browse/IGNITE-11059 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Currently, in case of a dht local tx timeout it's hard to understand which keys were not locked. Additional information about pending keys should be printed to the log on timeout: the key and info about the tx holding the lock (xid, label if present). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11058) Possible OOM due to large discard queue in TcpDiscoverySpi
Alexei Scherbakov created IGNITE-11058: -- Summary: Possible OOM due to large discard queue in TcpDiscoverySpi Key: IGNITE-11058 URL: https://issues.apache.org/jira/browse/IGNITE-11058 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Currently, to implement guaranteed delivery it's necessary to store every ensured message (marked with the TcpDiscoveryEnsureDelivery annotation) in the pending message queue until it's discarded by the coordinator; otherwise, if subsequent nodes fail while forwarding the message, the guarantee cannot be fulfilled. On large topologies with active changes the queue may contain many very large messages, causing heap usage bursts and possible OOM. Possible solution: # off-load pending message payloads to off-heap memory or even to disk. # store messages in serialized form to avoid JVM object overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
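The second option (keeping pending messages in serialized form) could look roughly like the sketch below. It is an assumption-laden illustration, not TcpDiscoverySpi code: the class name is hypothetical and plain JDK serialization stands in for whatever marshaller the discovery SPI really uses. Each queued message is held as one `byte[]` instead of an object graph and is deserialized only when it has to be re-sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; JDK serialization is used only as a stand-in marshaller.
final class SerializedPendingQueue {
    /** Pending messages, each stored as a single byte array. */
    private final Deque<byte[]> pending = new ArrayDeque<>();

    /** Serializes the message and appends it to the tail of the queue. */
    void add(Serializable msg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        pending.addLast(bytes.toByteArray());
    }

    /** Removes the oldest message and deserializes it, or returns null if the queue is empty. */
    Object pollFirst() {
        byte[] data = pending.pollFirst();

        if (data == null)
            return null;

        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    int size() {
        return pending.size();
    }
}
```

The byte arrays could later be moved off-heap or to disk without changing callers, which would cover the first option as well.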
[jira] [Created] (IGNITE-10957) Reduce EnsuredMessageHistory heap occupation
Alexei Scherbakov created IGNITE-10957: -- Summary: Reduce EnsuredMessageHistory heap occupation Key: IGNITE-10957 URL: https://issues.apache.org/jira/browse/IGNITE-10957 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 EnsuredMessageHistory can hold up to 512 discovery messages to ensure message delivery on client reconnect and is cleared lazily. With a large topology and a large number of caches/partitions this can take up to several GB of heap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10920) Optimize HistoryAffinityAssignment heap usage.
Alexei Scherbakov created IGNITE-10920: -- Summary: Optimize HistoryAffinityAssignment heap usage. Key: IGNITE-10920 URL: https://issues.apache.org/jira/browse/IGNITE-10920 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov With a large topology and a large number of caches/partitions, many server discovery events may quickly produce a large affinity history, eating gigabytes of heap. Solution: implement some kind of compression for the affinity cache map. For example, affinity history could be stored as a delta to some previous version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
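One way to read the "delta to some previous version" idea is sketched below. Everything here is hypothetical: the class name is invented, and an assignment is simplified to a list (indexed by partition) of node id lists, which is an assumption and not Ignite's internal representation. Only partitions whose owner list changed are stored for a history entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of delta-encoded affinity history; not Apache Ignite code.
final class AffinityDelta {
    private AffinityDelta() {
        // Static utility, no instances.
    }

    /** Returns only the partitions whose node list differs from the base assignment. */
    static Map<Integer, List<String>> diff(List<List<String>> base, List<List<String>> cur) {
        if (base.size() != cur.size())
            throw new IllegalArgumentException("Partition count must match");

        Map<Integer, List<String>> delta = new HashMap<>();

        for (int p = 0; p < cur.size(); p++) {
            if (!base.get(p).equals(cur.get(p)))
                delta.put(p, cur.get(p));
        }

        return delta;
    }

    /** Rebuilds the full assignment from the base assignment and a delta. */
    static List<List<String>> apply(List<List<String>> base, Map<Integer, List<String>> delta) {
        List<List<String>> res = new ArrayList<>(base);

        for (Map.Entry<Integer, List<String>> e : delta.entrySet())
            res.set(e.getKey(), e.getValue());

        return res;
    }
}
```

When a single node joins or leaves, most partitions keep their owners, so the delta is usually much smaller than a full copy of the assignment.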
[jira] [Created] (IGNITE-10913) Reduce heap occupation by o.a.i.i.processors.cache.persistence.file.FilePageStore instances.
Alexei Scherbakov created IGNITE-10913: -- Summary: Reduce heap occupation by o.a.i.i.processors.cache.persistence.file.FilePageStore instances. Key: IGNITE-10913 URL: https://issues.apache.org/jira/browse/IGNITE-10913 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 With a large topology, a large number of caches/partitions and persistence enabled, there could be millions of FilePageStore objects in heap (one for each partition). Each instance has a reference to a File (field cfgFile) that stores the absolute path to a partition as a String. The internal File implementation (for example UnixFile) also allocates space for the file path. I observed about 2 GB of heap space occupied by these objects in one environment. Solution: dereference (set to null) cfgFile after object creation and create the File object lazily, on demand. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
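A minimal sketch of the lazy-creation idea from IGNITE-10913 follows. The class is hypothetical and is not the real FilePageStore: it assumes that a store can be described by a directory string shared between all partitions of a cache plus a partition id, and the `part-N.bin` naming is used here purely for illustration. The File object is built only when a caller asks for it.

```java
import java.io.File;

// Hypothetical sketch; not the actual FilePageStore implementation.
final class LazyPathStore {
    /** Directory path; the same String instance can be shared by all partitions of a cache. */
    private final String dir;

    /** Partition id, enough to derive the file name on demand. */
    private final int partId;

    LazyPathStore(String dir, int partId) {
        this.dir = dir;
        this.partId = partId;
    }

    /** Creates the File on demand instead of keeping it (and its path string) in every instance. */
    File file() {
        return new File(dir, "part-" + partId + ".bin");
    }
}
```

The trade-off is a short-lived allocation on each call to `file()` in exchange for not retaining a per-partition path string.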
[jira] [Created] (IGNITE-10912) Huge node join request discovery message slows down node joining and corresponding PME
Alexei Scherbakov created IGNITE-10912: -- Summary: Huge node join request discovery message slows down node joining and corresponding PME Key: IGNITE-10912 URL: https://issues.apache.org/jira/browse/IGNITE-10912 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 With a large topology and a large number of caches/groups, a node join message can reach a size > 30M due to the large amount of transferred discovery data. It adds overhead on ring traversal and slows down the "node join" PME. Possible solution: # introduce a pre-join message with discovery data which doesn't increment the topology version. After all nodes have the corresponding discovery data, start the actual join. Discovery data probably should be stored off-heap (or even on disk) to avoid heap usage bursts when multiple nodes join. # Add compression to discovery data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
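The second suggestion (compressing discovery data) can be illustrated with the JDK's `java.util.zip` classes. This is a sketch under assumptions: the class name is hypothetical, the discovery data is assumed to be already marshalled into a byte array, and DEFLATE with `BEST_SPEED` is just one possible choice, not what Ignite uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; not Apache Ignite code.
final class DiscoveryDataCodec {
    private DiscoveryDataCodec() {
        // Static utility, no instances.
    }

    /** Compresses already marshalled discovery data with DEFLATE. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);

        try {
            deflater.setInput(data);
            deflater.finish();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];

            while (!deflater.finished())
                out.write(buf, 0, deflater.deflate(buf));

            return out.toByteArray();
        }
        finally {
            deflater.end(); // Release native resources.
        }
    }

    /** Restores the original bytes; fails on truncated or corrupted input. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();

        try {
            inflater.setInput(compressed);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];

            while (!inflater.finished()) {
                int n = inflater.inflate(buf);

                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Truncated or invalid compressed data");

                out.write(buf, 0, n);
            }

            return out.toByteArray();
        }
        catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        }
        finally {
            inflater.end();
        }
    }
}
```

How much this helps depends on how repetitive the discovery data is; cache configurations that differ only in name would likely compress well, but that is a guess to be verified by measurement.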
[jira] [Created] (IGNITE-10894) Reduce heap utilization for grids with big topologies and caches numbers.
Alexei Scherbakov created IGNITE-10894: -- Summary: Reduce heap utilization for grids with big topologies and caches numbers. Key: IGNITE-10894 URL: https://issues.apache.org/jira/browse/IGNITE-10894 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov This is an umbrella ticket for all optimizations related to reducing heap utilization in large Ignite deployments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10443) Fix flaky GridCommandHandlerTest.testKillHangingLocalTransactions
Alexei Scherbakov created IGNITE-10443: -- Summary: Fix flaky GridCommandHandlerTest.testKillHangingLocalTransactions Key: IGNITE-10443 URL: https://issues.apache.org/jira/browse/IGNITE-10443 Project: Ignite Issue Type: Test Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10418) Implement lightweight profiling for message processing
Alexei Scherbakov created IGNITE-10418: -- Summary: Implement lightweight profiling for message processing Key: IGNITE-10418 URL: https://issues.apache.org/jira/browse/IGNITE-10418 Project: Ignite Issue Type: New Feature Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10255) Avoid history reservation on affinity change.
Alexei Scherbakov created IGNITE-10255: -- Summary: Avoid history reservation on affinity change. Key: IGNITE-10255 URL: https://issues.apache.org/jira/browse/IGNITE-10255 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Currently WAL history is reserved even if exchange is triggered by affinity change message, which means rebalance completed and assignment is ideal. Reservation is not needed in such case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10251) Get rid of the code left from times when lateAffinity=false was supported
Alexei Scherbakov created IGNITE-10251: -- Summary: Get rid of the code left from times when lateAffinity=false was supported Key: IGNITE-10251 URL: https://issues.apache.org/jira/browse/IGNITE-10251 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov This code can hide errors and lead to inefficient processing in some scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10117) Node is mistakenly excluded from history suppliers preventing historical rebalance.
Alexei Scherbakov created IGNITE-10117: -- Summary: Node is mistakenly excluded from history suppliers preventing historical rebalance. Key: IGNITE-10117 URL: https://issues.apache.org/jira/browse/IGNITE-10117 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.8 This is because org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager#reserveHistoryForExchange is called before org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager#beforeExchange, which restores correct partition state.
{noformat}
public void testHistory() throws Exception {
    IgniteEx crd = startGrid(0);
    startGrid(1);

    crd.cluster().active(true);

    awaitPartitionMapExchange();

    int part = 0;

    List keys = loadDataToPartition(part, DEFAULT_CACHE_NAME, 100, 0, 1);

    forceCheckpoint(); // Prevent IGNITE-10088

    stopAllGrids();

    awaitPartitionMapExchange();

    List keys1 = loadDataToPartition(part, DEFAULT_CACHE_NAME, 100, 100, 1);

    startGrid(0);
    startGrid(1);

    awaitPartitionMapExchange();

    // grid0 will not provide history.
}
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10112) Prioritize processing of tx finish=false message due to timeout
Alexei Scherbakov created IGNITE-10112: -- Summary: Prioritize processing of tx finish=false message due to timeout Key: IGNITE-10112 URL: https://issues.apache.org/jira/browse/IGNITE-10112 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Currently tx rollback messages are processed in the same way as any other messages. For a forced rollback, for example one triggered by tx timeouts on PME (see org.apache.ignite.configuration.TransactionConfiguration#getTxTimeoutOnPartitionMapExchange), they should be prioritized to avoid violating the timeout under load. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10088) Partition can be restored in moving state instead of owning if node crashed before first checkpoint.
Alexei Scherbakov created IGNITE-10088: -- Summary: Partition can be restored in moving state instead of owning if node crashed before first checkpoint. Key: IGNITE-10088 URL: https://issues.apache.org/jira/browse/IGNITE-10088 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Scenario: 1. Start a grid with a large enough checkpoint frequency, wait for rebalance, put some data. 2. Observe all partitions in OWNING state. 3. Kill the node or trigger the failure handler (FH) before a checkpoint is started. 4. Return the node to the grid, observe all partitions created in MOVING state and unnecessarily rebalanced. The problem is in org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#restorePartitionStates, which doesn't apply the owning partition state.
{noformat}
public void testMoving() throws Exception {
    IgniteEx crd = startGrid(0);
    startGrid(1);

    crd.cluster().active(true);

    awaitPartitionMapExchange();

    stopGrid(1);

    awaitPartitionMapExchange();

    startGrid(1);

    awaitPartitionMapExchange();
}
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10078) Node failure during concurrent partition updates may cause partition desync between primary and backup.
Alexei Scherbakov created IGNITE-10078: -- Summary: Node failure during concurrent partition updates may cause partition desync between primary and backup. Key: IGNITE-10078 URL: https://issues.apache.org/jira/browse/IGNITE-10078 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 This is possible if some updates with a lower partition counter are not written to WAL before node failure. Scenario: 1. Start a grid with 3 nodes, 2 backups. 2. Preload some data to partition P. 3. Start two concurrent transactions, each writing a single key to the same partition; the keys are different.
{noformat}
try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
    client.cache(DEFAULT_CACHE_NAME).put(k, v);

    tx.commit();
}
{noformat}
4. Order the updates on the backup so that the update with the max partition counter is written to WAL, while the update with the lesser partition counter fails because FH (failure handler) is triggered before it is added to WAL. 5. Return the failed node to the grid, observe no rebalancing because the partition counters are the same. Possible solution: detect gaps in update counters on recovery and, if any are detected, force rebalance from a node without gaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
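The gap detection proposed as a possible solution in IGNITE-10078 could be sketched like this. The code is hypothetical and simplified: it assumes the recovering node can enumerate the update counters it actually applied from WAL and that counters start at 1, neither of which is taken from the Ignite code base.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not Apache Ignite code.
final class CounterGaps {
    private CounterGaps() {
        // Static utility, no instances.
    }

    /**
     * @param applied Update counters recovered from WAL, in any order (counters assumed to start at 1).
     * @return Missing ranges as {from, to} pairs, both bounds inclusive.
     */
    static List<long[]> findGaps(Collection<Long> applied) {
        SortedSet<Long> sorted = new TreeSet<>(applied);
        List<long[]> gaps = new ArrayList<>();

        long expected = 1;

        for (long cntr : sorted) {
            if (cntr > expected)
                gaps.add(new long[] {expected, cntr - 1});

            expected = cntr + 1;
        }

        return gaps;
    }
}
```

In the scenario above, a backup that logged counter N + 1 but lost counter N would report the gap {N, N}, and a non-empty result would be the signal to force rebalance from an owner without gaps.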
[jira] [Created] (IGNITE-10040) Auto rebalance throttling
Alexei Scherbakov created IGNITE-10040: -- Summary: Auto rebalance throttling Key: IGNITE-10040 URL: https://issues.apache.org/jira/browse/IGNITE-10040 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Currently we provide a few options to control rebalance overhead, most importantly org.apache.ignite.configuration.CacheConfiguration#setRebalanceThrottle and org.apache.ignite.configuration.IgniteConfiguration#setRebalanceThreadPoolSize. In general, proper option values can only be derived from load testing, which is very inconvenient. Moreover, changing the settings requires a grid restart. It's desirable to implement automatic rebalance throttling defined by a user configuration option, in terms of the ratio between dirty pages produced by rebalance and dirty pages produced by user activity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10029) Node attributes are not restored from metastore after node restart.
Alexei Scherbakov created IGNITE-10029: -- Summary: Node attributes are not restored from metastore after node restart. Key: IGNITE-10029 URL: https://issues.apache.org/jira/browse/IGNITE-10029 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.8 Scenario: 1. Start node with enabled persistence, configure some user attributes. 2. Restart node without directly setting node attributes again. 3. Read any user attribute and observe NULL value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10027) Optimistic transaction doesn't throw TransactionTimeoutException on lock acquisition timeout.
Alexei Scherbakov created IGNITE-10027: -- Summary: Optimistic transaction doesn't throw TransactionTimeoutException on lock acquisition timeout. Key: IGNITE-10027 URL: https://issues.apache.org/jira/browse/IGNITE-10027 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.transactions; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import org.apache.ignite.Ignite; import org.apache.ignite.IgniteCheckedException; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.internal.IgniteEx; import org.apache.ignite.internal.IgniteInternalFuture; import org.apache.ignite.internal.IgniteInterruptedCheckedException; import org.apache.ignite.internal.TestRecordingCommunicationSpi; import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareResponse; import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2; import org.apache.ignite.internal.util.typedef.X; import org.apache.ignite.internal.util.typedef.internal.U; import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi; import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import org.apache.ignite.transactions.Transaction; import org.apache.ignite.transactions.TransactionConcurrency; import org.apache.ignite.transactions.TransactionTimeoutException; import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL; import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC; import static org.apache.ignite.testframework.GridTestUtils.runAsync; import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC; import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC; import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ; /** * Tests rollback on timeout scenarios for one-phase commit protocol. */ public class TxRollbackOnTimeoutOnePhaseCommitTest extends GridCommonAbstractTest { /** IP finder. 
*/ private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true); /** */ private static final int GRID_CNT = 2; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER); cfg.setCommunicationSpi(new TestRecordingCommunicationSpi()); boolean client = igniteInstanceName.startsWith("client"); cfg.setClientMode(client); if (!client) { CacheConfiguration ccfg = new CacheConfiguration(DEFAULT_CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); ccfg.setBackups(1); ccfg.setWriteSynchronizationMode(FULL_SYNC); ccfg.setOnheapCacheEnabled(false); cfg.setCacheConfiguration(ccfg); } return cfg; } /** {@inheritDoc} */ @Override protected void beforeTest() throws Exception { super.beforeTest(); startGridsMultiThreaded(GRID_CNT); startGrid("client"); } /** */ public void testUnlockOptimistic() throws IgniteCheckedException { IgniteEx client = grid("client"); assertNotNull(client.cache(DEFAULT_CACHE_NAME)); int key = 0; CountDownLatch lock = new CountDownLatch(1); CountDownLatch finish = new CountDownLatch(1); IgniteInternalFuture fut = runAsync(() -> { try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) { client.cache(DEFAULT_CACHE_NAME).put(key, key + 1); lock.countDown(); try { assertTrue(U.await(finish, 30, TimeUnit.SECONDS));
[jira] [Created] (IGNITE-10019) Documentation: partition preloading
Alexei Scherbakov created IGNITE-10019: -- Summary: Documentation: partition preloading Key: IGNITE-10019 URL: https://issues.apache.org/jira/browse/IGNITE-10019 Project: Ignite Issue Type: Task Reporter: Alexei Scherbakov Assignee: Artem Budnikov Fix For: 2.8 We have to add documentation for partition preloading feature: IgniteCache.preloadPartition -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9998) .NET: Implement partition preload API
Alexei Scherbakov created IGNITE-9998: - Summary: .NET: Implement partition preload API Key: IGNITE-9998 URL: https://issues.apache.org/jira/browse/IGNITE-9998 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9926) Improve metada distribution speed in a scenario with concurrent updates for the same schema
Alexei Scherbakov created IGNITE-9926: - Summary: Improve metada distribution speed in a scenario with concurrent updates for the same schema Key: IGNITE-9926 URL: https://issues.apache.org/jira/browse/IGNITE-9926 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 If multiple threads simultaneously start putting the same object with a non-existent schema into the cache, every update triggers a full propose-accept round trip in the current implementation. The propose message should be sent only for the first update; the others should wait for its completion instead of sending messages for the same schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
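The behaviour requested in IGNITE-9926 (one propose per schema, other updaters wait) resembles a "single flight" pattern. Below is a sketch of that pattern with JDK classes only; the class, the `schemaId` key and the `sendPropose` callback are hypothetical and do not reflect Ignite's binary metadata code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; not Apache Ignite code.
final class SchemaProposals {
    /** One future per schema id; it is completed when the schema is accepted. */
    private final ConcurrentMap<Integer, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();

    /**
     * Sends the propose message only for the first caller of a given schema.
     * Later callers get the same future and simply wait on it.
     */
    CompletableFuture<Void> propose(int schemaId, Runnable sendPropose) {
        CompletableFuture<Void> fresh = new CompletableFuture<>();
        CompletableFuture<Void> existing = inFlight.putIfAbsent(schemaId, fresh);

        if (existing != null)
            return existing; // Someone else already proposed this schema.

        try {
            sendPropose.run();
        }
        catch (RuntimeException e) {
            // Let a later update retry the propose and fail the current waiters.
            inFlight.remove(schemaId, fresh);
            fresh.completeExceptionally(e);
        }

        return fresh;
    }

    /** Called when the accept message for the schema arrives. */
    void onAccepted(int schemaId) {
        CompletableFuture<Void> fut = inFlight.get(schemaId);

        if (fut != null)
            fut.complete(null);
    }
}
```

The completed future is intentionally left in the map, so updates arriving after acceptance return immediately without another round trip.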
[jira] [Created] (IGNITE-9896) TxRollbackOnTimeoutNoDeadlockDetectionTest fails in master for many tests
Alexei Scherbakov created IGNITE-9896: - Summary: TxRollbackOnTimeoutNoDeadlockDetectionTest fails in master for many tests Key: IGNITE-9896 URL: https://issues.apache.org/jira/browse/IGNITE-9896 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.8 Example of 100% failing test: org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutNoDeadlockDetectionTest#testRollbackOnTimeoutTxServerRemapPessimisticReadCommitted -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9830) o.a.i.i.b.BinaryReaderExImpl#getOrCreateSchema sometimes misses latest metadata version resulting in failed tx commit because of missed schema.
Alexei Scherbakov created IGNITE-9830: - Summary: o.a.i.i.b.BinaryReaderExImpl#getOrCreateSchema sometimes misses latest metadata version resulting in failed tx commit because of missed schema. Key: IGNITE-9830 URL: https://issues.apache.org/jira/browse/IGNITE-9830 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9806) Legacy tx invalidation code breaks data consistency between owners.
Alexei Scherbakov created IGNITE-9806: - Summary: Legacy tx invalidation code breaks data consistency between owners. Key: IGNITE-9806 URL: https://issues.apache.org/jira/browse/IGNITE-9806 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 Reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.transactions; import java.util.UUID; import java.util.function.Supplier; import org.apache.ignite.Ignite; import org.apache.ignite.IgniteCheckedException; import org.apache.ignite.IgniteTransactions; import org.apache.ignite.cache.CacheAtomicityMode; import org.apache.ignite.cache.CacheMode; import org.apache.ignite.cache.CacheWriteSynchronizationMode; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.internal.IgniteEx; import org.apache.ignite.internal.managers.communication.GridIoPolicy; import org.apache.ignite.internal.processors.cache.GridCacheSharedContext; import org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal; import org.apache.ignite.internal.util.typedef.G; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import org.apache.ignite.testsuites.IgniteIgnore; import org.apache.ignite.transactions.Transaction; import org.apache.ignite.transactions.TransactionConcurrency; import org.apache.ignite.transactions.TransactionIsolation; import org.jetbrains.annotations.Nullable; import org.mockito.Mockito; import org.mockito.invocation.InvocationOnMock; import org.mockito.stubbing.Answer; /** * Tests data consistency if transaction is failed due to heuristic exception on originating node. */ public class TxDataConsistencyOnCommitFailureTest extends GridCommonAbstractTest { /** */ public static final int KEY = 0; /** */ public static final String CLIENT = "client"; /** */ private int nodesCnt; /** */ private int backups; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); cfg.setClientMode(igniteInstanceName.startsWith(CLIENT)); cfg.setCacheConfiguration(new CacheConfiguration(DEFAULT_CACHE_NAME). 
setCacheMode(CacheMode.PARTITIONED). setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL). setBackups(backups). setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC)); return cfg; } /** {@inheritDoc} */ @Override protected void afterTest() throws Exception { super.afterTest(); stopAllGrids(); } /** */ @IgniteIgnore(value = "https://issues.apache.org/jira/browse/IGNITE-590", forceFailure = false) public void testCommitErrorOnColocatedNode2PC() throws Exception { nodesCnt = 3; backups = 2; doTestCommitError(() -> primaryNode(KEY, DEFAULT_CACHE_NAME)); } /** * @param factory Factory. */ private void doTestCommitError(Supplier<Ignite> factory) throws Exception { Ignite crd = startGridsMultiThreaded(nodesCnt); crd.cache(DEFAULT_CACHE_NAME).put(KEY, KEY); Ignite ignite = factory.get(); if (ignite == null) ignite = startGrid("client"); assertNotNull(ignite.cache(DEFAULT_CACHE_NAME)); injectMockedTxManager(ignite); checkKey(); IgniteTransactions transactions = ignite.transactions(); try (Transaction tx = transactions.txStart(TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ, 0, 1)) { assertNotNull(transactions.tx()); ignite.cache(DEFAULT_CACHE_NAME).put(KEY, KEY + 1); tx.commit(); fail(); } catch (Exception t) { // No-op. } checkKey(); checkFutures(); } /** * @param ignite Ignite. */ private void
[jira] [Created] (IGNITE-9672) Move o.a.i.i.processors.cache.persistence.tree.io.PageMetaIO to metastore.
Alexei Scherbakov created IGNITE-9672: - Summary: Move o.a.i.i.processors.cache.persistence.tree.io.PageMetaIO to metastore. Key: IGNITE-9672 URL: https://issues.apache.org/jira/browse/IGNITE-9672 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.8 The current implementation has a special meta page related to snapshot functionality. The meta page is stored in the index partition. If index.bin is removed (to trigger an index rebuild), all of this information is lost and the incremental snapshot logic is broken. Solution: move snapshot metadata to the metastore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9612) Improve checkpoint mark phase speed.
Alexei Scherbakov created IGNITE-9612: - Summary: Improve checkpoint mark phase speed. Key: IGNITE-9612 URL: https://issues.apache.org/jira/browse/IGNITE-9612 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.7 I'm observing regular slow checkpoints due to long mark duration, which is not related to the number of dirty pages: {noformat} 2018-09-01 14:55:20.408 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=01e0c7bf-842f-4ed6-8589-b4904063434f, startPtr=FileWALPointer [idx=19814, fileOff=948996096, len=5233457], checkpointLockWait=0ms, checkpointLockHoldTime=951ms, walCpRecordFsyncDuration=39ms, pages=78477, reason='timeout'] 2018-09-01 14:55:21.307 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint finished [cpId=01e0c7bf-842f-4ed6-8589-b4904063434f, pages=78477, markPos=FileWALPointer [idx=19814, fileOff=948996096, len=5233457], walSegmentsCleared=0, walSegmentsCovered=[], *markDuration=1002ms*, pagesWrite=478ms, fsync=421ms, total=1901ms] {noformat} {noformat} 2018-09-01 14:58:20.355 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, startPtr=FileWALPointer [idx=19814, fileOff=124208, len=5233457], checkpointLockWait=0ms, checkpointLockHoldTime=926ms, walCpRecordFsyncDuration=14ms, pages=10837, reason='timeout'] 2018-09-01 14:58:20.480 [INFO ][db-checkpoint-thread-#241%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint finished [cpId=09d1f4bc-d3f3-4a16-b291-89d7fa745ea5, pages=10837, markPos=FileWALPointer [idx=19814, fileOff=124208, len=5233457], walSegmentsCleared=0, walSegmentsCovered=[], *markDuration=943ms*, pagesWrite=64ms, fsync=61ms, total=1068ms] {noformat}
Debugging has revealed that this is due to the large amount of work required to save metadata for metapages and free/reuse lists. Because this is done under the checkpoint write lock, all other activities are blocked, resulting in increased tx and atomic ops latency. Simple solution: parallelize metadata processing during the mark phase. The best way to solve the problem is described in IGNITE-9520. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
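The "simple solution" mentioned in IGNITE-9612 could be sketched roughly as follows. This is not Ignite code: the class name, `saveAllMetadata`, the `ReentrantReadWriteLock` standing in for the checkpoint lock, and the pool size are all assumptions made for illustration. The only idea taken from the ticket is that independent metadata-saving tasks are fanned out to a thread pool while the checkpoint write lock is held, so the lock hold time is bounded by the slowest task rather than by the sum of all tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: run metadata-saving tasks in parallel during the mark phase. */
class ParallelMarkPhaseSketch {
    /** Stand-in for the checkpoint read-write lock (an assumption, not the Ignite class). */
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Runs all tasks on a pool while holding the write lock; returns the number of completed tasks. */
    int saveAllMetadata(List<Runnable> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();

        checkpointLock.writeLock().lock();

        try {
            List<Callable<Void>> calls = new ArrayList<>();

            for (Runnable task : tasks) {
                calls.add(() -> {
                    task.run();
                    done.incrementAndGet();

                    return null;
                });
            }

            // invokeAll blocks until every task has finished (or failed).
            for (Future<Void> fut : pool.invokeAll(calls))
                fut.get(); // Rethrows a task failure, if any.
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Metadata save failed", e);
        }
        finally {
            checkpointLock.writeLock().unlock();
            pool.shutdown();
        }

        return done.get();
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();

        for (int i = 0; i < 4; i++)
            tasks.add(() -> { /* Save metadata of one structure here. */ });

        System.out.println("Completed tasks: " + new ParallelMarkPhaseSketch().saveAllMetadata(tasks, 2));
    }
}
```

The sketch only shortens the time spent under the lock; it does not remove the work from the locked section, which is presumably what the approach referenced in IGNITE-9520 addresses.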
[jira] [Created] (IGNITE-9548) Transaction with short timeout is not rolled back on primary node resulting in blocked PME
Alexei Scherbakov created IGNITE-9548: - Summary: Transaction with short timeout is not rolled back on primary node resulting in blocked PME Key: IGNITE-9548 URL: https://issues.apache.org/jira/browse/IGNITE-9548 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov {noformat} 2018-09-10 12:38:24.237 [WARN ][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.apache.ignite.internal.diagnostic] Pending transactions: 2018-09-10 12:38:24.242 [WARN ][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.apache.ignite.internal.diagnostic] >>> [txVer=AffinityTopologyVersion [topVer=343, minorTopVer=0], exchWait=true, tx=GridDhtTxLocal [nearNodeId=eb94406c-a132-4998-bf22-b7d74960b866, nearFut Id=b7cff46b561-0b500010-3ed6-4b79-8cc8-65b3b3b16738, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=147809766, order=1536687716227, nodeOrder=182], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-1934881220], recovery=false, txMap=[IgniteTxEntry [key=KeyCacheObjectImpl [part=12715, val=ucp_ids_counter_name_DP L_ucp_ids_section_name, hasValBytes=true], cacheId=-1934881220, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=12715, val=ucp_ids_counter_name_DPL_ucp_ids_section_name, hasValBytes=true], cacheId=-1934881220], val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], oldVal =[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], part=12715, super=GridDistributedCacheEntry [super=GridCach eMapEntry [key=KeyCacheObjectImpl [part=12715, val=ucp_ids_counter_name_DPL_ucp_ids_section_name, hasValBytes=true], val=CacheObjectImpl [val=null, hasValBytes=true], 
startVer=1536665387604, ver=GridCacheVersion [topVer=147809766, order=1536737564543, nodeOrder=33], hash =-864500235, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=[GridCacheMvccCandidate [nodeId=a4823893-be8f-4b24-abca-0a28efde604a, ver=GridCacheVersion [topVer=147809766, order=1536737580715, nodeOrder=33], threadId=887, id=43118359, topVer=AffinityTopologyVers ion [topVer=343, minorTopVer=0], reentry=null, otherNodeId=eb94406c-a132-4998-bf22-b7d74960b866, otherVer=GridCacheVersion [topVer=147809766, order=1536687716227, nodeOrder=182], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=GridCacheVersion [topVer=147809766, orde r=1536737580560, nodeOrder=33], serOrder=null, key=KeyCacheObjectImpl [part=12715, val=ucp_ids_counter_name_DPL_ucp_ids_section_name, hasValBytes=true], masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0, prevV er=null, nextVer=null]], rmts=null]], flags=2]]], prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=147809766, order=1536737580715, nodeOrder=33 , super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=147809766, order=1536737580715, nodeOrder=33], writeVer=null, implicit=false, loc=true, threadId=887, startTime=1536416065902, nodeId=a4823893-be8f-4b24-abca-0a28efde604a, startVer=GridCacheVersion [topVer=14780976 6, order=1536737580715, nodeOrder=33], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=200, sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, topVer=AffinityTopologyV ersion [topVer=343, minorTopVer=0], duration=156238330ms, onePhaseCommit=false], size=1 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9512) testRollbackOnTopologyLockPessimistic still fails on master.
Alexei Scherbakov created IGNITE-9512: - Summary: testRollbackOnTopologyLockPessimistic still fails on master. Key: IGNITE-9512 URL: https://issues.apache.org/jira/browse/IGNITE-9512 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Looks like the fix in [1] was incomplete. [1] https://issues.apache.org/jira/browse/IGNITE-9401 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9445) Use valid tag for page write unlock while reading cold page from disk.
Alexei Scherbakov created IGNITE-9445: - Summary: Use valid tag for page write unlock while reading cold page from disk. Key: IGNITE-9445 URL: https://issues.apache.org/jira/browse/IGNITE-9445 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov The problem arises when a pageId with a stale page rotation tag is passed to org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl#acquirePage(int, long, boolean). It is not possible to know the actual value in advance without reading the stored page. Such a scenario may leave the page locked forever if the passed and persisted tags differ. Solution: unlock the page using the actual (persisted) tag value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
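The failure mode in IGNITE-9445 can be illustrated with a toy model. Nothing below is taken from PageMemoryImpl: `TaggedPageSketch`, its methods and the `useActualTag` switch are hypothetical, and the real lock is certainly more involved. The sketch only assumes that a write unlock succeeds solely when the supplied tag matches the tag currently stored in the page, which is what makes an unlock with a stale caller-supplied tag leave the page locked.

```java
/** Hypothetical toy model of a page whose write unlock is validated against a rotation tag. */
class TaggedPageSketch {
    /** Tag currently stored in the page. */
    private int tag;

    /** Write lock flag. */
    private boolean writeLocked;

    void writeLock() {
        writeLocked = true;
    }

    /** Simulates reading the page from disk: the persisted tag replaces the in-memory one. */
    void loadFromDisk(int persistedTag) {
        tag = persistedTag;
    }

    /** Unlocks only if the supplied tag matches the page tag; otherwise the page stays locked. */
    boolean writeUnlock(int expectedTag) {
        if (expectedTag != tag)
            return false;

        writeLocked = false;

        return true;
    }

    boolean isWriteLocked() {
        return writeLocked;
    }

    int tag() {
        return tag;
    }

    /**
     * Reads a cold page and returns {@code true} if the page ends up unlocked.
     *
     * @param callerTag Tag passed by the caller, possibly stale.
     * @param persistedTag Tag stored on disk.
     * @param useActualTag If {@code true}, unlock with the tag read from the page (the proposed fix).
     */
    static boolean readColdPage(TaggedPageSketch page, int callerTag, int persistedTag, boolean useActualTag) {
        page.writeLock();
        page.loadFromDisk(persistedTag);

        int unlockTag = useActualTag ? page.tag() : callerTag;

        page.writeUnlock(unlockTag);

        return !page.isWriteLocked();
    }

    public static void main(String[] args) {
        System.out.println("Unlocked with caller tag: " + readColdPage(new TaggedPageSketch(), 1, 2, false));
        System.out.println("Unlocked with persisted tag: " + readColdPage(new TaggedPageSketch(), 1, 2, true));
    }
}
```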
[jira] [Created] (IGNITE-9401) Newly added testRollbackOnTopologyLockPessimistic has a race which leads to suite hang.
Alexei Scherbakov created IGNITE-9401: - Summary: Newly added testRollbackOnTopologyLockPessimistic has a race which leads to suite hang. Key: IGNITE-9401 URL: https://issues.apache.org/jira/browse/IGNITE-9401 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9386) control.sh --tx can produce confusing results when limit is set to small value
Alexei Scherbakov created IGNITE-9386: - Summary: control.sh --tx can produce confusing results when limit is set to small value Key: IGNITE-9386 URL: https://issues.apache.org/jira/browse/IGNITE-9386 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov This happens because the limit is currently applied to primary and backup transactions, which breaks output post-filtering (removal of primary and backup transactions from the output if the near transaction is present). Possible solution: apply the limit only to valid near transactions. If some txs have no near part (broken tx topology), they should always be visible in the output, probably with a special "broken" marking. The best way to achieve this is to implement tx paging on the client side (using continuous mapping). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
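The filtering rule proposed in IGNITE-9386 might look like the sketch below. It is an illustration only: `TxInfo`, `Role`, `render` and the " (broken)" suffix are hypothetical names and formats, not the control.sh implementation. The assumptions are that each collected entry knows its role and whether a near part exists for the same transaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the limit counts near transactions only, broken txs are always shown. */
class TxLimitSketch {
    enum Role { NEAR, PRIMARY, BACKUP }

    /** Minimal stand-in for one collected transaction entry. */
    static final class TxInfo {
        final String xid;
        final Role role;
        final boolean nearPresent;

        TxInfo(String xid, Role role, boolean nearPresent) {
            this.xid = xid;
            this.role = role;
            this.nearPresent = nearPresent;
        }
    }

    /** Builds the output lines for the given entries. */
    static List<String> render(List<TxInfo> txs, int limit) {
        List<String> out = new ArrayList<>();
        int nearShown = 0;

        for (TxInfo tx : txs) {
            if (tx.role == Role.NEAR) {
                // The limit is applied to near transactions only.
                if (nearShown < limit) {
                    out.add(tx.xid);
                    nearShown++;
                }
            }
            else if (!tx.nearPresent)
                out.add(tx.xid + " (broken)"); // No near part: always visible, not counted.

            // Primary and backup entries whose near part exists are dropped (post-filtering).
        }

        return out;
    }

    public static void main(String[] args) {
        List<TxInfo> txs = new ArrayList<>();

        txs.add(new TxInfo("tx1", Role.NEAR, true));
        txs.add(new TxInfo("tx1", Role.PRIMARY, true));
        txs.add(new TxInfo("tx2", Role.BACKUP, false));

        System.out.println(render(txs, 1));
    }
}
```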
[jira] [Created] (IGNITE-9380) Assertion in TxRollbackOnTimeoutTest
Alexei Scherbakov created IGNITE-9380: - Summary: Assertion in TxRollbackOnTimeoutTest Key: IGNITE-9380 URL: https://issues.apache.org/jira/browse/IGNITE-9380 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov {noformat} java.lang.AssertionError at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart0(IgniteTransactionsImpl.java:182) at org.apache.ignite.internal.processors.cache.transactions.IgniteTransactionsImpl.txStart(IgniteTransactionsImpl.java:94) at org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:454) at org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254) at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86) {noformat} Looks like this is possible because a tx can be rolled back by a very short timeout before onCreated is called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9364) SetTxTimeoutOnPartitionMapExchangeTest.java hangs on TC
Alexei Scherbakov created IGNITE-9364: - Summary: SetTxTimeoutOnPartitionMapExchangeTest.java hangs on TC Key: IGNITE-9364 URL: https://issues.apache.org/jira/browse/IGNITE-9364 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Ivan Daschinskiy Fix For: 2.7 Attachments: Ignite_Tests_2.4_Java_8_Basic_1_3255.log.zip Failed run: https://ci.ignite.apache.org/viewLog.html?buildId=1707476=IgniteTests24Java8_Basic1=buildLog -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9319) CacheAsyncOperationsFailoverTxTest.testPutAllAsyncFailover is flaky in master.
Alexei Scherbakov created IGNITE-9319: - Summary: CacheAsyncOperationsFailoverTxTest.testPutAllAsyncFailover is flaky in master. Key: IGNITE-9319 URL: https://issues.apache.org/jira/browse/IGNITE-9319 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.7 https://ci.ignite.apache.org/viewLog.html?buildId=1688647=queuedBuildOverviewTab https://ci.ignite.apache.org/viewLog.html?buildId=1688542=queuedBuildOverviewTab -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9246) Optimistic transactions can wait for topology future on remap for a long time even if timeout is set.
Alexei Scherbakov created IGNITE-9246: - Summary: Optimistic transactions can wait for topology future on remap for a long time even if timeout is set. Key: IGNITE-9246 URL: https://issues.apache.org/jira/browse/IGNITE-9246 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov This is possible if a long PME occurs during the tx remap phase. Fix: wait for the new topology on remap with a timeout, if one is set. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
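The fix described in IGNITE-9246 amounts to a bounded wait on the topology future. A minimal sketch, assuming a plain `CompletableFuture` in place of Ignite's internal future type and treating a timeout of 0 as "no timeout"; `awaitTopology` and its parameters are hypothetical names:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: wait for a new topology on remap no longer than the tx timeout allows. */
class RemapWaitSketch {
    /**
     * @param topFut Future completed when the new topology is ready (stand-in type).
     * @param txTimeoutMs Transaction timeout in milliseconds, 0 means no timeout.
     * @param elapsedMs Time the transaction has already been running.
     * @return {@code true} if the topology became ready in time, {@code false} if the tx timed out.
     */
    static boolean awaitTopology(CompletableFuture<Long> topFut, long txTimeoutMs, long elapsedMs) {
        try {
            if (txTimeoutMs == 0) {
                topFut.get(); // No timeout configured: unbounded wait, as before the fix.

                return true;
            }

            long remaining = txTimeoutMs - elapsedMs;

            if (remaining <= 0)
                return false; // Already timed out, do not wait at all.

            topFut.get(remaining, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            return false; // The caller is expected to roll the transaction back.
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Failed to wait for topology", e);
        }
    }

    public static void main(String[] args) {
        // A future that never completes models a long PME.
        System.out.println("Ready in time: " + awaitTopology(new CompletableFuture<>(), 100, 0));
    }
}
```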
[jira] [Created] (IGNITE-9208) Allow proper handling of transactions if node is stopped using stop(false)
Alexei Scherbakov created IGNITE-9208: - Summary: Allow proper handling of transactions if node is stopped using stop(false) Key: IGNITE-9208 URL: https://issues.apache.org/jira/browse/IGNITE-9208 Project: Ignite Issue Type: Improvement Affects Versions: 2.7 Reporter: Alexei Scherbakov Currently, if a node is stopped, for example for maintenance, using the standard Ignition.stop(false), all active transactions will most likely be rolled back or even stopped during commit, leading to partition desync, which is not desirable. If cancel=false, the node must wait for graceful termination of all active transactions while blocking new requests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
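The behavior requested in IGNITE-9208 for cancel=false can be sketched as a simple gate. This is not how Ignite implements node stop: `GracefulStopSketch`, `tryBeginTx`, `endTx` and `stopGracefully` are hypothetical names, and the sketch assumes that every transaction registers itself on start and deregisters on completion.

```java
/** Hypothetical sketch: on graceful stop, reject new transactions and wait for active ones. */
class GracefulStopSketch {
    /** Monitor guarding the fields below. */
    private final Object mux = new Object();

    /** Number of transactions currently in progress. */
    private int activeTxs;

    /** Set once a graceful stop has been requested. */
    private boolean stopping;

    /** @return {@code false} if the node is stopping and the new transaction is rejected. */
    boolean tryBeginTx() {
        synchronized (mux) {
            if (stopping)
                return false;

            activeTxs++;

            return true;
        }
    }

    /** Must be called when a transaction is committed or rolled back. */
    void endTx() {
        synchronized (mux) {
            activeTxs--;

            mux.notifyAll();
        }
    }

    /** Blocks new transactions, then waits until all active ones have finished. */
    void stopGracefully() {
        synchronized (mux) {
            stopping = true;

            try {
                while (activeTxs > 0)
                    mux.wait();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                throw new IllegalStateException("Interrupted while waiting for transactions", e);
            }
        }
    }

    int activeTxs() {
        synchronized (mux) {
            return activeTxs;
        }
    }

    public static void main(String[] args) {
        GracefulStopSketch node = new GracefulStopSketch();

        node.tryBeginTx();
        new Thread(node::endTx).start(); // The transaction finishes on another thread.
        node.stopGracefully();

        System.out.println("New tx accepted after stop: " + node.tryBeginTx());
    }
}
```

A production version would presumably also need an upper bound on the wait, so that a hung transaction cannot block the stop forever; the ticket does not say how that case should be handled.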
[jira] [Created] (IGNITE-9188) Unexpected eviction leading to data lost in a scenario with stopping/restarting nodes during rebalancing
Alexei Scherbakov created IGNITE-9188: - Summary: Unexpected eviction leading to data lost in a scenario with stopping/restarting nodes during rebalancing Key: IGNITE-9188 URL: https://issues.apache.org/jira/browse/IGNITE-9188 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.7 Scenario: 1. Split grid nodes into two groups with distinct partition mapping. One group holds even partitions, the other holds odd ones. Rebalancing of even partitions is only triggered when the number of nodes in the grid exceeds the n/2 threshold. 2. Start n/2 nodes, activate, put data into even partitions. 3. Start the other n/2 nodes, change BLT, delay rebalancing of even partitions. 4. Stop the newly started nodes before rebalancing is finished. Expected behavior: partitions in the "even" group keep the OWNING state. Actual behavior: even partitions are evicted, leading to data loss. Unit test reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.distributed; import java.util.ArrayList; import java.util.Collection; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.UUID; import org.apache.ignite.Ignite; import org.apache.ignite.cache.CacheAtomicityMode; import org.apache.ignite.cache.CacheMode; import org.apache.ignite.cache.affinity.AffinityFunctionContext; import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction; import org.apache.ignite.cluster.ClusterNode; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.DataRegionConfiguration; import org.apache.ignite.configuration.DataStorageConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.configuration.WALMode; import org.apache.ignite.internal.TestRecordingCommunicationSpi; import org.apache.ignite.internal.processors.cache.GridCacheUtils; import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition; import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage; import org.apache.ignite.internal.util.typedef.G; import org.apache.ignite.internal.util.typedef.internal.CU; import org.apache.ignite.internal.util.typedef.internal.U; import org.apache.ignite.lang.IgniteBiPredicate; import org.apache.ignite.plugin.extensions.communication.Message; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import org.jetbrains.annotations.Nullable; import static org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionState.OWNING; /** * */ public class CacheLostPartitionsRestoreStateTest extends GridCommonAbstractTest { /** */ public static final long MB = 1024 * 1024L; /** */ public static final String GRP_ATTR = "grp"; /** */ public static final int GRIDS_CNT = 6; /** */ public static final String CACHE_1 = "filled"; /** */ public static final String 
CACHE_2 = "empty"; /** */ public static final String EVEN_GRP = "event"; /** */ public static final String ODD_GRP = "odd"; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); cfg.setCommunicationSpi(new TestRecordingCommunicationSpi()); CacheConfiguration ccfg = new CacheConfiguration("default"); ccfg.setAffinity(new RendezvousAffinityFunction(false, CacheConfiguration.MAX_PARTITIONS_COUNT)); cfg.setCacheConfiguration(ccfg); cfg.setPeerClassLoadingEnabled(true); Map attrs = new HashMap<>(); attrs.put(GRP_ATTR, grp(getTestIgniteInstanceIndex(igniteInstanceName))); cfg.setUserAttributes(attrs); DataStorageConfiguration memCfg = new DataStorageConfiguration() .setDefaultDataRegionConfiguration( new DataRegionConfiguration().setPersistenceEnabled(true).setInitialSize(50
[jira] [Created] (IGNITE-9094) Request for commit check is sent to backup nodes twice on primary node left.
Alexei Scherbakov created IGNITE-9094: - Summary: Request for commit check is sent to backup nodes twice on primary node left. Key: IGNITE-9094 URL: https://issues.apache.org/jira/browse/IGNITE-9094 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Fix For: 2.7 This causes twice as many messages as needed during recovery. First place: {noformat} at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxFinishRequest.<init>(GridDhtTxFinishRequest.java:161) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.checkCommittedRequest(GridNearTxFinishFuture.java:911) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.access$400(GridNearTxFinishFuture.java:71) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture$FinishMiniFuture.onNodeLeft(GridNearTxFinishFuture.java:1005) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:820) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:741) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:479) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3354) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) at org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onDone(GridNearPessimisticTxPrepareFuture.java:409) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onDone(GridNearPessimisticTxPrepareFuture.java:58) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) at org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285) at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144) at org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture$MiniFuture.onError(GridNearPessimisticTxPrepareFuture.java:515) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture$MiniFuture.onNodeLeft(GridNearPessimisticTxPrepareFuture.java:496) at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearPessimisticTxPrepareFuture.onNodeLeft(GridNearPessimisticTxPrepareFuture.java:87) at org.apache.ignite.internal.processors.cache.GridCacheMvccManager$4.onEvent(GridCacheMvccManager.java:266) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$LocalListenerWrapper.onEvent(GridEventStorageManager.java:1384) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873) at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:858) at
[jira] [Created] (IGNITE-8966) IgnitePdsContinuousRestartTest is often timed out in master
Alexei Scherbakov created IGNITE-8966: - Summary: IgnitePdsContinuousRestartTest is often timed out in master Key: IGNITE-8966 URL: https://issues.apache.org/jira/browse/IGNITE-8966 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Easily reproducible even locally, for example with testRebalancingDuringLoad_1000_2_1_1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8949) Unexpected exception after node restart during rebalance.
Alexei Scherbakov created IGNITE-8949: - Summary: Unexpected exception after node restart during rebalance. Key: IGNITE-8949 URL: https://issues.apache.org/jira/browse/IGNITE-8949 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov I've got: {noformat} Caused by: org.apache.ignite.IgniteCheckedException: Failed to process invalid partitions response (remote node reported invalid partitions but remote topology version does not differ from local) {noformat} during implicit get tx. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
Alexei Scherbakov created IGNITE-8942: - Summary: In some cases grid cannot be deactivated because of hanging CQ internal cleanup. Key: IGNITE-8942 URL: https://issues.apache.org/jira/browse/IGNITE-8942 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Attachments: thread_dump_eip-server_2018-07-05-18-02.log See the attachment for thread dump. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8921) Add control.sh --cache affinity command to output current and ideal assignment and optionally show diff between them
Alexei Scherbakov created IGNITE-8921: - Summary: Add control.sh --cache affinity command to output current and ideal assignment and optionally show diff between them Key: IGNITE-8921 URL: https://issues.apache.org/jira/browse/IGNITE-8921 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Will help debugging. Ex: control.sh --cache affinity current; control.sh --cache affinity ideal; control.sh --cache affinity diff -- This message was sent by Atlassian JIRA (v7.6.3#76005)
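The "diff" mode proposed in IGNITE-8921 is essentially a per-partition comparison of two assignments. The ticket does not specify any data structures or output format, so everything in this sketch is an assumption: assignments are modeled as a list indexed by partition, each element being the ordered list of node IDs (primary first), and `diff` returns only the partitions whose current assignment differs from the ideal one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: compare current and ideal affinity assignments partition by partition. */
class AffinityDiffSketch {
    /**
     * @param current Current assignment: index is the partition, value is the ordered node list.
     * @param ideal Ideal assignment in the same layout.
     * @return Map from partition to a "current -> ideal" description, differing partitions only.
     */
    static Map<Integer, String> diff(List<List<String>> current, List<List<String>> ideal) {
        if (current.size() != ideal.size())
            throw new IllegalArgumentException("Assignments must cover the same number of partitions");

        Map<Integer, String> res = new TreeMap<>();

        for (int p = 0; p < current.size(); p++) {
            // Node order matters: the first node in the list is the primary.
            if (!current.get(p).equals(ideal.get(p)))
                res.put(p, current.get(p) + " -> " + ideal.get(p));
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> current = Arrays.asList(Arrays.asList("A", "B"), Arrays.asList("B", "C"));
        List<List<String>> ideal = Arrays.asList(Arrays.asList("A", "B"), Arrays.asList("C", "B"));

        diff(current, ideal).forEach((part, change) -> System.out.println("part=" + part + ": " + change));
    }
}
```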
[jira] [Created] (IGNITE-8902) GridDhtTxRemote sometimes not rolled back in one phase commit scenario.
Alexei Scherbakov created IGNITE-8902: - Summary: GridDhtTxRemote sometimes not rolled back in one phase commit scenario. Key: IGNITE-8902 URL: https://issues.apache.org/jira/browse/IGNITE-8902 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.6 Near node log: {noformat} 2018-06-28 18:37:14,541][WARN ][sys-#77] The transaction was forcibly rolled back because a timeout is reached: GridNearTxLocal[xid=c8c6b184461--0871-da69--0010, xidVersion=GridCacheVersion [topVer=141679209, order=1530218114188, nodeOrder=16], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, nodeId=36f1c741-dc02-417a-a27d-fcbc90dd8cf1, timeout=100, duration=101, label=null] {noformat} {noformat} [2018-06-28 18:37:14,560][ERROR][pool-356018-thread-1] Timeout (0 sec) is exceeded. org.apache.ignite.transactions.TransactionTimeoutException: Failed to acquire lock within provided timeout for transaction [timeout=100, tx=GridDhtTxLocal [nearNodeId=36f1c741-dc02-417a-a27d-fcbc90dd8cf1, nearFutId=a8563574461-ec96bd57-6a94-4303-8ff5-56eaac137f30, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=141679209, order=1530218114188, nodeOrder=16], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[06630e42-1c4d-4011-a388-4ec1dd1824fd], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[117538306,117541069], recovery=false, txMap=[IgniteTxEntry [key=KeyCacheObjectImpl [part=779, val=5899, hasValBytes=true], cacheId=117541069, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=779, val=5899, hasValBytes=true], cacheId=117541069], val=[op=UPDATE, val=org.apache.ignite.scenario.internal.model.SampleObject [idHash=1226505441, hash=-1035741988, balance=100051, salary=1, fields=HashMap {field19=iiwxvrhxlpwqyixvpiregkuqpxuhtuir, 
field17=dyyxoefmichqvstteqjkbdpgmevifvmt, field18=iakcqzxcswsxncvztsotrjrlreuvpnsv, field22=wvewstllgkwvcxxujbkqkoihudgkkyve, field23=blgtxqcnwmexardyujbibiconowvyxvh, field20=mhvicfpnmptjreacgatiyobrmvvloxic, field21=bxajcavvwuhjvpugfoqohgulihzdbymr, field26=xceztfgnlpfoyciwnvhkorrgfllveocl, field27=sxzqvvckcgxgjctmygsibtouuzkfievo, field24=lsidfhurdjgjlmkrxyqbrdjzmbcicxie, field25=vfnmohbvezajifkqiwqbdqpulnynumfz, field28=zcewigkcryznakzsyzqzfdrbhklycjer, field29=vkctdybyrmtbitxuuqdlsrilxayorjjd, field11=lbwqnwwpwgewyjvlobyqwnvifuiggzio, field12=rmxclhojshtijttdjirppbkyudpvunht, field1=gvfrrpwkhmiziaortptiytwhviwjcpcr, field31=yktxbcjiyqfpaytacoajsiybtqocmezz, field0=vcorrbnevfunwssjzckdjlbvkynbogce, field10=sawaysrchykcvutlwfvglbvrlxvwlghh, field15=udrsigcjfetptnmlcnwjgccdqfmhdabv, field16=xjyjehlldwwnpbgjjtzwozqthwoefrin, field13=hwooamfugkijverkyqyzfccxvqrqjexx, field14=doxxkivwxqdhoozzsvwkkimgswrwoegj, field7=sxomkgtpjqyqpkrbxqnuknkmpzzpxuou, field6=urnknauwekxtgfbaqmesjwllzokdyktt, field9=yqhnowhjfrfueoryqlcvdnaddueliwyr, field8=nolotdhjdfyotpcvxnrxshaheofsisnd, field3=wijyypzycilbqvjirjkorjfrazfmptrj, field2=nvznimfolbszmwiosdpyimlvnbrbmxqx, field30=xnvglxqnyseduswirxbmxnwhyxlvptch, field5=vxzgcyngwzjpopxascdyltgvxcnckzvv, field4=gnweoorjfqsbtbsbeiwronzucyzpjwje}, key=5899]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], part=779, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=779, val=5899, hasValBytes=true], val=org.apache.ignite.scenario.internal.model.SampleObject [idHash=1532725782, hash=-640361617, balance=10, salary=1, fields=HashMap {field19=iiwxvrhxlpwqyixvpiregkuqpxuhtuir, field17=dyyxoefmichqvstteqjkbdpgmevifvmt, field18=iakcqzxcswsxncvztsotrjrlreuvpnsv, field22=wvewstllgkwvcxxujbkqkoihudgkkyve, 
field23=blgtxqcnwmexardyujbibiconowvyxvh, field20=mhvicfpnmptjreacgatiyobrmvvloxic, field21=bxajcavvwuhjvpugfoqohgulihzdbymr, field26=xceztfgnlpfoyciwnvhkorrgfllveocl, field27=sxzqvvckcgxgjctmygsibtouuzkfievo, field24=lsidfhurdjgjlmkrxyqbrdjzmbcicxie, field25=vfnmohbvezajifkqiwqbdqpulnynumfz, field28=zcewigkcryznakzsyzqzfdrbhklycjer, field29=vkctdybyrmtbitxuuqdlsrilxayorjjd, field11=lbwqnwwpwgewyjvlobyqwnvifuiggzio, field12=rmxclhojshtijttdjirppbkyudpvunht, field1=gvfrrpwkhmiziaortptiytwhviwjcpcr, field31=yktxbcjiyqfpaytacoajsiybtqocmezz, field0=vcorrbnevfunwssjzckdjlbvkynbogce, field10=sawaysrchykcvutlwfvglbvrlxvwlghh, field15=udrsigcjfetptnmlcnwjgccdqfmhdabv, field16=xjyjehlldwwnpbgjjtzwozqthwoefrin, field13=hwooamfugkijverkyqyzfccxvqrqjexx,
[jira] [Created] (IGNITE-8876) Deactivate before checkpoint may lead to assertion and node failture.
Alexei Scherbakov created IGNITE-8876: - Summary: Deactivate before checkpoint may lead to assertion and node failture. Key: IGNITE-8876 URL: https://issues.apache.org/jira/browse/IGNITE-8876 Project: Ignite Issue Type: Bug Environment: {noformat} 2018-06-10 17:42:34.453 [INFO ][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=04d24209-ceaf-4c05-bcaa-bfebc8c83148, startPtr=FileWALPointer [idx=690, fileOff=656836779, len=41], checkpointLockWait=0ms, checkpointLockHoldTime=0ms, walCpRecordFsyncDuration=0ms, pages=80236, reason='partition destroy'] 2018-06-10 17:42:34.470 [ERROR][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Cache group is not initialized [grpId=-1903385190]]] java.lang.AssertionError: Cache group is not initialized [grpId=-1903385190] at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.destroyEvictedPartitions(GridCacheDatabaseSharedManager.java:3350) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3262) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3053) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:745) 2018-06-10 17:42:34.470 [ERROR][db-checkpoint-thread-#164%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Cache group is not initialized 
[grpId=-1903385190]]] {noformat} Reporter: Alexei Scherbakov Fix For: 2.6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8873) Optimize cache scans with enabled persistence.
Alexei Scherbakov created IGNITE-8873: - Summary: Optimize cache scans with enabled persistence. Key: IGNITE-8873 URL: https://issues.apache.org/jira/browse/IGNITE-8873 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Fix For: 2.6 Currently, cache scans with persistence enabled involve link resolution, which can lead to random disk access and poor performance on SAS disks. One possibility is to preload cache data pages to avoid slow random disk access. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8863) Race on rollback and prepare on near tx can cause remote tx hang
Alexei Scherbakov created IGNITE-8863: - Summary: Race on rollback and prepare on near tx can cause remote tx hang Key: IGNITE-8863 URL: https://issues.apache.org/jira/browse/IGNITE-8863 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov {noformat} [16:33:56]W: [org.apache.ignite:ignite-core] [2018-06-08 13:33:56,931][WARN ][sys-#66696%client%][GridNearTxLocal] The transaction was forcibly rolled back because a timeout is reached: GridNearTxLocal[xid=e198a9fd361--0857-6387--0004, xidVersion=GridCacheVersion [topVer=139944839, order=1528464836894, nodeOrder=4], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, nodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, timeout=1, duration=11] [16:35:55]W: [org.apache.ignite:ignite-core] [2018-06-08 13:35:55,056][WARN ][grid-timeout-worker-#66394%transactions.TxRollbackOnTimeoutTest0%][diagnostic] Found long running transaction [startTime=13:33:56.931, curTime=13:35:55.054, tx=GridDhtTxRemote [nearNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, rmtFutId=af940d0e361-79c59341-3292-46e4-92ce-5c4ef4eddef8, nearXidVer=GridCacheVersion [topVer=139944839, order=1528464836894, nodeOrder=4], storeWriteThrough=false, super=GridDistributedTxRemoteAdapter [explicitVers=null, started=true, commitAllowed=0, txState=IgniteTxRemoteSingleStateImpl [entry=IgniteTxEntry [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], cacheId=3556498, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], cacheId=3556498], val=[op=CREATE, val=CacheObjectImpl [val=null, hasValBytes=true]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[], part=1, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=1, val=1, 
hasValBytes=true], val=CacheObjectImpl [val=null, hasValBytes=true], startVer=1528464836879, ver=GridCacheVersion [topVer=139944839, order=1528464836863, nodeOrder=2], hash=1, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=null, rmts=[GridCacheMvccCandidate [nodeId=97ee44cd-73c9-4e79-95df-e1a03481, ver=GridCacheVersion [topVer=139944839, order=1528464836897, nodeOrder=2], threadId=75880, id=2310313, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], reentry=null, otherNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], masks=local=0|owner=0|ready=0|reentry=0|used=0|tx=1|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null], GridCacheMvccCandidate [nodeId=97ee44cd-73c9-4e79-95df-e1a03481, ver=GridCacheVersion [topVer=139944839, order=1528464836900, nodeOrder=2], threadId=75875, id=2310317, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], reentry=null, otherNodeId=3c8d85b2-4eb9-46b2-8bd1-6f18f542fc7a, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], masks=local=0|owner=1|ready=0|reentry=0|used=1|tx=1|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null, flags=2]]], prepared=1, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], skipCompletedVers=false, super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=139944839, order=1528464836897, nodeOrder=2], writeVer=GridCacheVersion [topVer=139944839, order=1528464836898, nodeOrder=2], implicit=false, loc=false, threadId=75880, startTime=1528464836931, nodeId=97ee44cd-73c9-4e79-95df-e1a03481, startVer=GridCacheVersion [topVer=139944839, order=1528464836864, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=1, sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=PREPARED, timedOut=false, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], duration=118123ms, onePhaseCommit=false {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8846) Optimize entry transform operations.
Alexei Scherbakov created IGNITE-8846: - Summary: Optimize entry transform operations. Key: IGNITE-8846 URL: https://issues.apache.org/jira/browse/IGNITE-8846 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov 1. For pessimistic transactions the entryProcessor is invoked twice: in [1], if the tx entry already exists, and in [2], after lock acquisition. It is enough to invoke it only once, in postLockWrite. 2. The cache entry value is not needed on the near node if the EntryProcessor declares a Void return type. We should try to detect this at runtime or provide some kind of annotation to mark an EntryProcessor that does not care about the return value. This will bring a huge performance benefit for transactions updating large values using transformations. [1] org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal#enlistWriteEntry [2] org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter#postLockWrite -- This message was sent by Atlassian JIRA (v7.6.3#76005)
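The runtime detection mentioned in item 2 of IGNITE-8846 could look roughly like the sketch below. It is only an illustration using plain reflection: `Processor` is a hypothetical stand-in for an entry processor interface with type parameters <K, V, T>, and `VoidResultDetector`, `Touch` and `Length` are made-up names, not Ignite classes.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical sketch: detect a Void result type via generic type information. */
final class VoidResultDetector {
    /** Stand-in for an entry processor; T is the result type (assumption, not the real API). */
    interface Processor<K, V, T> {
        T process(K key, V val);
    }

    /** Sample processor that declares Void as its result type. */
    static final class Touch implements Processor<Integer, String, Void> {
        @Override public Void process(Integer key, String val) {
            return null;
        }
    }

    /** Sample processor that returns a real value. */
    static final class Length implements Processor<Integer, String, Integer> {
        @Override public Integer process(Integer key, String val) {
            return val.length();
        }
    }

    /**
     * Returns true only if cls directly implements Processor with T = Void.
     * Lambdas and types that inherit the interface through a superclass are
     * not recognized here and conservatively yield false.
     */
    static boolean declaresVoidResult(Class<?> cls) {
        for (Type t : cls.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType)t;

                if (pt.getRawType() == Processor.class)
                    return pt.getActualTypeArguments()[2] == Void.class;
            }
        }

        return false;
    }
}
```

Because lambdas lose their generic type arguments, a check like this cannot cover every case, which is one argument for the annotation alternative named in the ticket.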
[jira] [Created] (IGNITE-8809) Add ability to control.sh to force rebalance for specific partitions on given nodes.
Alexei Scherbakov created IGNITE-8809: - Summary: Add ability to control.sh to force rebalance for specific partitions on given nodes. Key: IGNITE-8809 URL: https://issues.apache.org/jira/browse/IGNITE-8809 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Sometimes it's desirable to force rebalance for specific partitions on given nodes, for example for testing or for fixing synchronization issues without node downtime. control.sh should get a new command, rebalance, which will execute an exchange request carried by a new message type containing the partitions to rebalance and the mode: full (evict + move) or delta (historical, using counters). Example: control.sh --rebalance [full|delta] nodeId:p1,p2,p3 node2:p4,p5 ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
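The argument syntax proposed in IGNITE-8809 can be illustrated with a small parser. This is a sketch of the format only: `RebalanceArgs` and `parse` are hypothetical names that do not exist in control.sh, and partition ids are assumed to be plain integers (the ticket writes them as p1, p2, ...).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for the proposed "--rebalance" arguments; not part of control.sh. */
final class RebalanceArgs {
    /** Rebalance modes as named in the ticket. */
    enum Mode { FULL, DELTA }

    /** Requested mode. */
    final Mode mode;

    /** Node id -> partition ids to rebalance, in command-line order. */
    final Map<String, List<Integer>> partsByNode;

    private RebalanceArgs(Mode mode, Map<String, List<Integer>> partsByNode) {
        this.mode = mode;
        this.partsByNode = partsByNode;
    }

    /** Parses the arguments that follow "--rebalance", e.g. "delta", "nodeA:1,2,3", "nodeB:4,5". */
    static RebalanceArgs parse(String... args) {
        if (args.length < 2)
            throw new IllegalArgumentException("Expected: [full|delta] nodeId:p1,p2,... ...");

        // Throws IllegalArgumentException for anything other than full/delta.
        Mode mode = Mode.valueOf(args[0].toUpperCase(Locale.ROOT));

        Map<String, List<Integer>> parts = new LinkedHashMap<>();

        for (int i = 1; i < args.length; i++) {
            int sep = args[i].lastIndexOf(':');

            if (sep <= 0 || sep == args[i].length() - 1)
                throw new IllegalArgumentException("Bad node spec: " + args[i]);

            List<Integer> ids = parts.computeIfAbsent(args[i].substring(0, sep), k -> new ArrayList<>());

            // NumberFormatException (an IllegalArgumentException) signals a malformed partition id.
            for (String p : args[i].substring(sep + 1).split(","))
                ids.add(Integer.parseInt(p.trim()));
        }

        return new RebalanceArgs(mode, parts);
    }
}
```

For example, `RebalanceArgs.parse("delta", "nodeA:1,2,3", "nodeB:4,5")` yields mode DELTA with partitions 1, 2, 3 for nodeA and 4, 5 for nodeB.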
[jira] [Created] (IGNITE-8808) Improve control.sh --tx command to show local and remote transactions.
Alexei Scherbakov created IGNITE-8808: - Summary: Improve control.sh --tx command to show local and remote transactions. Key: IGNITE-8808 URL: https://issues.apache.org/jira/browse/IGNITE-8808 Project: Ignite Issue Type: Improvement Affects Versions: 2.5 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.6 Currently the --tx option of control.sh shows only transactions found on near (initiating) nodes. Due to various issues it's possible to have corresponding dht local and remote transactions without a near part. Such transactions must be visible to the utility. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8743) TcpCommunicationSpi hangs in rare circumstances on outgoing descriptor reservation.
Alexei Scherbakov created IGNITE-8743: - Summary: TcpCommunicationSpi hangs in rare circumstances on outgoing descriptor reservation. Key: IGNITE-8743 URL: https://issues.apache.org/jira/browse/IGNITE-8743 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Relevant stack trace: {noformat} java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.reserve(GridNioRecoveryDescriptor.java:275) - locked <0x7fca4b14f560> (a org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3140) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2863) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2750) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2611) at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2575) at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642) at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166) at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:311) at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208) at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.loadAsync(GridDhtColocatedCache.java:389) at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.loadMissing(GridNearTxLocal.java:2506) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.checkMissed(GridNearTxLocal.java:3888) at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1927) at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$4.op(GridDhtColocatedCache.java:197) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8684) Partition state exchange during rebalance continues to keep sending state messages (single,full) in loop even if no changes in partitions states
Alexei Scherbakov created IGNITE-8684: - Summary: Partition state exchange during rebalance continues to keep sending state messages (single,full) in loop even if no changes in partitions states Key: IGNITE-8684 URL: https://issues.apache.org/jira/browse/IGNITE-8684 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8651) VisorTxTask fails when printing transactions having implicit single type.
Alexei Scherbakov created IGNITE-8651: - Summary: VisorTxTask fails when printing transactions having implicit single type. Key: IGNITE-8651 URL: https://issues.apache.org/jira/browse/IGNITE-8651 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.6 org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal#mappings returns null for IgniteTxMappingsSingleImpl -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8481) VisorValidateIndexesJob works very slowly in case of many partitions/keys for each partition.
Alexei Scherbakov created IGNITE-8481: - Summary: VisorValidateIndexesJob works very slowly in case of many partitions/keys for each partition. Key: IGNITE-8481 URL: https://issues.apache.org/jira/browse/IGNITE-8481 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Alexei Scherbakov Fix For: 2.6 Attachments: ignite.zip, thrdump-server.log I tried to validate indexes using the newly introduced VisorValidateIndexesTask from control.sh and found that on a large data set it works very slowly. The process had not finished 12 hours after start. Looking through a thread dump I've noticed the following problems: 1. ValidateIndexesClosure does not work in an optimal way: it does a btree lookup for each index for each entry of each partition. It should be faster to validate by scanning the index tree. 2. The thread dump shows contention on acquiring the segment read lock by worker pool-XXX threads, but no obvious reason for holding the write lock (no load on grid). 3. org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.Segment#partGeneration generates garbage on each page access. Check the attachments for the log and thread dump. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8412) Bug with cache name in org.apache.ignite.util.GridCommandHandlerTest#testCacheContention breaks tests in security module.
Alexei Scherbakov created IGNITE-8412: - Summary: Bug with cache name in org.apache.ignite.util.GridCommandHandlerTest#testCacheContention breaks tests in security module. Key: IGNITE-8412 URL: https://issues.apache.org/jira/browse/IGNITE-8412 Project: Ignite Issue Type: Bug Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8375) NPE due to race on cache stop and timeout handler execution.
Alexei Scherbakov created IGNITE-8375: - Summary: NPE due to race on cache stop and timeout handler execution. Key: IGNITE-8375 URL: https://issues.apache.org/jira/browse/IGNITE-8375 Project: Ignite Issue Type: Bug Affects Versions: 2.4 Reporter: Alexei Scherbakov Fix For: 2.6 NPE caused by execution of method [1] during timeout handler execution [2]: cacheCfg.isLoadPreviousValue() throws NPE because cacheCfg can be nulled by [3] on stop. [1] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture#loadMissingFromStore [2] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.LockTimeoutObject#onTimeout [3] org.apache.ignite.internal.processors.cache.GridCacheContext#cleanup -- This message was sent by Atlassian JIRA (v7.6.3#76005)
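The race described in IGNITE-8375 follows a common pattern: a field that another thread may null out is dereferenced directly. A minimal sketch of one defensive approach is shown below, under the assumption that reading the field once into a local variable and null-checking it is acceptable. The classes are simplified, hypothetical stand-ins and this is not the actual Ignite fix.

```java
/** Hypothetical sketch of guarding against a configuration nulled on cache stop. */
final class LockTimeoutSketch {
    /** Stand-in for the cache configuration. */
    static final class CacheConfig {
        final boolean loadPreviousValue;

        CacheConfig(boolean loadPreviousValue) {
            this.loadPreviousValue = loadPreviousValue;
        }
    }

    /** Stand-in for the cache context whose configuration is cleared on stop. */
    static final class CacheContext {
        private volatile CacheConfig cfg;

        CacheContext(boolean loadPreviousValue) {
            cfg = new CacheConfig(loadPreviousValue);
        }

        /** Called on cache stop, possibly concurrently with a timeout handler. */
        void cleanup() {
            cfg = null;
        }

        CacheConfig config() {
            return cfg;
        }
    }

    /** Returns false instead of throwing an NPE when the cache has already been stopped. */
    static boolean shouldLoadPrevious(CacheContext ctx) {
        CacheConfig cfg = ctx.config(); // Single read: the value cannot change between check and use.

        return cfg != null && cfg.loadPreviousValue;
    }
}
```

Whether a stopped cache should be treated as "do not load" or should fail the lock future explicitly is a design decision that the ticket leaves open.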
[jira] [Created] (IGNITE-8360) Page recovery from WAL can be very slow.
Alexei Scherbakov created IGNITE-8360: - Summary: Page recovery from WAL can be very slow. Key: IGNITE-8360 URL: https://issues.apache.org/jira/browse/IGNITE-8360 Project: Ignite Issue Type: Improvement Components: persistence Affects Versions: 2.4 Reporter: Alexei Scherbakov Fix For: 2.6 The current implementation tries to recover a corrupted page from the WAL, potentially scanning all archived segments [1]. If the archive is very large, for example due to a long history or enabled point-in-time recovery, this might take significant time, preventing cache start with consequences like a hanging PME. [1] org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl#tryToRestorePage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8358) Deadlock in IgnitePdsAtomicCacheRebalancingTest
Alexei Scherbakov created IGNITE-8358: - Summary: Deadlock in IgnitePdsAtomicCacheRebalancingTest Key: IGNITE-8358 URL: https://issues.apache.org/jira/browse/IGNITE-8358 Project: Ignite Issue Type: Bug Affects Versions: 2.4 Reporter: Alexei Scherbakov Fix For: 2.6 Deadlocked threads are: {noformat} [14:21:46] : [Step 3/4] # DEADLOCKED Thread [name="sys-#22788%persistence.IgnitePdsAtomicCacheRebalancingTest2%", id=25953, state=WAITING, blockCnt=0, waitCnt=2] [14:21:46] : [Step 3/4] Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@adcfad9, ownerName=exchange-worker-#22778%persistence.IgnitePdsAtomicCacheRebalancingTest2%, ownerId=25941] [14:21:46] : [Step 3/4] at sun.misc.Unsafe.park(Native Method) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.localPartitionMap(GridDhtPartitionTopologyImpl.java:1000) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.createPartitionsSingleMessage(GridCachePartitionExchangeManager.java:1250) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:1205) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:1036) [14:21:46] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ResendTimeoutObject$1.run(GridCachePartitionExchangeManager.java:2663) [14:21:46] : [Step 3/4] at o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6751) [14:21:46] : [Step 3/4] at o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827) [14:21:46] : [Step 3/4] at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110) [14:21:46] : [Step 3/4] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [14:21:46] : [Step 3/4] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [14:21:46] : [Step 3/4] at java.lang.Thread.run(Thread.java:745) [14:21:46] : [Step 3/4] [14:21:46] : [Step 3/4] Locked synchronizers: [14:21:46] : [Step 3/4] java.util.concurrent.ThreadPoolExecutor$Worker@469d36ed [14:21:46] : [Step 3/4] # DEADLOCKED Thread [name="sys-#22787%persistence.IgnitePdsAtomicCacheRebalancingTest2%", id=25952, state=WAITING, blockCnt=0, waitCnt=3] [14:21:46] : [Step 3/4] Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@3a2e9f5b, ownerName=exchange-worker-#22778%persistence.IgnitePdsAtomicCacheRebalancingTest2%, ownerId=25941] [14:21:46] : [Step 3/4] at sun.misc.Unsafe.park(Native Method) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) [14:21:46] : [Step 3/4] at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) [14:21:46] : [Step 3/4] at 
o.a.i.i.util.StripedCompositeReadWriteLock$WriteLock.lock0(StripedCompositeReadWriteLock.java:154) [14:21:46] : [Step 3/4] at o.a.i.i.util.StripedCompositeReadWriteLock$WriteLock.lock(StripedCompositeReadWriteLock.java:123) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.onEvicted(GridDhtPartitionTopologyImpl.java:2444) [14:21:46] : [Step 3/4] at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.onPartitionEvicted(GridDhtPreloader.java:433)
[jira] [Created] (IGNITE-8075) Add support for two new public methods in .NET API
Alexei Scherbakov created IGNITE-8075: - Summary: Add support for two new public methods in .NET API Key: IGNITE-8075 URL: https://issues.apache.org/jira/browse/IGNITE-8075 Project: Ignite Issue Type: Improvement Affects Versions: 2.4 Reporter: Alexei Scherbakov Assignee: Pavel Tupitsyn Fix For: 2.5 Need to add the following two methods to the .NET API: withLabel and localActiveTransactions. The Java implementation is currently available in branch [1]. [1] https://github.com/gridgain/apache-ignite/tree/ignite-6827-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8074) Allow changing of tx rollback timeout on exchange in runtime.
Alexei Scherbakov created IGNITE-8074: - Summary: Allow changing of tx rollback timeout on exchange in runtime. Key: IGNITE-8074 URL: https://issues.apache.org/jira/browse/IGNITE-8074 Project: Ignite Issue Type: Improvement Affects Versions: 2.4 Reporter: Alexei Scherbakov Fix For: 2.5 It's desirable to be able to change at runtime the tx rollback timeout introduced in IGNITE-6827. Simplest implementation: use a JMX method call. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8000) Implicit transactions may not finish properly on unstable topology.
Alexei Scherbakov created IGNITE-8000: - Summary: Implicit transactions may not finish properly on unstable topology. Key: IGNITE-8000 URL: https://issues.apache.org/jira/browse/IGNITE-8000 Project: Ignite Issue Type: Bug Affects Versions: 2.4 Reporter: Alexei Scherbakov Fix For: 2.5 To reproduce, add a default tx timeout [1] to the IgniteCacheMultiTxLockSelfTest test configuration. [1] c.getTransactionConfiguration().setDefaultTxTimeout(10); It looks like in some cases a remote tx is added to the rolled back versions (because the partition is gone) and a subsequent near request for the same tx to this node fails. This does not happen if timeouts are disabled, because the corresponding check is skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7915) Add transaction debugging support in JMX
Alexei Scherbakov created IGNITE-7915: - Summary: Add transaction debugging support in JMX Key: IGNITE-7915 URL: https://issues.apache.org/jira/browse/IGNITE-7915 Project: Ignite Issue Type: Improvement Reporter: Alexei Scherbakov Detailed description in IGNITE-7910, paragraph 4. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7914) Add transaction debugging support in control.sh
Alexei Scherbakov created IGNITE-7914: - Summary: Add transaction debugging support in control.sh Key: IGNITE-7914 URL: https://issues.apache.org/jira/browse/IGNITE-7914 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.5 Detailed description in IGNITE-7910, paragraph 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7913) Current implementation of Internal Diagnostics may cause OOM on server nodes.
Alexei Scherbakov created IGNITE-7913: - Summary: Current implementation of Internal Diagnostics may cause OOM on server nodes. Key: IGNITE-7913 URL: https://issues.apache.org/jira/browse/IGNITE-7913 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.5 If many transactions are active in grid, Internal Diagnostics can cause OOM on server nodes serving IgniteDiagnosticMessage because of heap buffering. See the stack trace demonstrating the issue: {noformat} at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1012) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:762) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:710) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.toString(GridDhtCacheEntry.java:818) at java.lang.String.valueOf(String.java:2994) at org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) at org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxEntry.toString(IgniteTxEntry.java:1267) at java.lang.String.valueOf(String.java:2994) at java.lang.StringBuilder.append(StringBuilder.java:131) at java.util.AbstractMap.toString(AbstractMap.java:559) at java.lang.String.valueOf(String.java:2994) at org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) at 
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:864) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxRemoteStateImpl.toString(IgniteTxRemoteStateImpl.java:180) at java.lang.String.valueOf(String.java:2994) at org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) at org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783) at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.toString(GridDistributedTxRemoteAdapter.java:926) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxRemote.toString(GridDhtTxRemote.java:373) at java.lang.String.valueOf(String.java:2994) at org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) at org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005) at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826) at 
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter$TxFinishFuture.toString(IgniteTxAdapter.java:2405) at java.lang.String.valueOf(String.java:2994) at java.lang.StringBuilder.append(StringBuilder.java:131) at java.util.AbstractCollection.toString(AbstractCollection.java:462) at java.lang.String.valueOf(String.java:2994) at java.lang.StringBuilder.append(StringBuilder.java:131) at
[jira] [Created] (IGNITE-7910) Improve transaction debugging support
Alexei Scherbakov created IGNITE-7910: - Summary: Improve transaction debugging support Key: IGNITE-7910 URL: https://issues.apache.org/jira/browse/IGNITE-7910 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.5 Currently there is no good means to debug problematic transactions without parsing cryptic logs on the whole grid. I suggest adding several improvements to mitigate the issue: 1. Add chaining method Transaction.withMeta(String) to attach a transaction description. 2. Add method localActiveTransaction to the IgniteTransactions interface, which will return all active near transactions for the local node. 3. Extend control.sh to support retrieving active transaction information from grid nodes. By default it shows N (specified by user) transactions ordered by longest duration. For each transaction the following is shown: Near node id (IP, hostname) / xid / state / duration / dht topology / meta from 1 if present. It should support filtering by near node / state / duration and printing info for a single tx if a single xid is specified as argument. In addition to that, each transaction from the list may be forcibly rolled back by xid. 4. Add an mbean with the same functionality as in 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
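The listing described in item 3 of IGNITE-7910 (top N transactions by longest duration, with optional filters) can be sketched as follows. `TxReport`, `TxInfo` and their fields are hypothetical names chosen for illustration; the real control.sh data model may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the transaction listing from item 3; not Ignite code. */
final class TxReport {
    /** Assumed per-transaction snapshot; field names are illustrative. */
    static final class TxInfo {
        final String nearNodeId;
        final String xid;
        final String state;
        final long durationMs;

        TxInfo(String nearNodeId, String xid, String state, long durationMs) {
            this.nearNodeId = nearNodeId;
            this.xid = xid;
            this.state = state;
            this.durationMs = durationMs;
        }
    }

    /**
     * Returns at most limit transactions, longest duration first.
     * A null nearNodeId or state means "do not filter by this field".
     */
    static List<TxInfo> longest(List<TxInfo> txs, int limit, String nearNodeId, String state, long minDurationMs) {
        return txs.stream()
            .filter(t -> nearNodeId == null || nearNodeId.equals(t.nearNodeId))
            .filter(t -> state == null || state.equals(t.state))
            .filter(t -> t.durationMs >= minDurationMs)
            .sorted(Comparator.comparingLong((TxInfo t) -> t.durationMs).reversed())
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Filtering before sorting keeps the sort limited to matching transactions, which matters when the grid holds many active transactions.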
[jira] [Created] (IGNITE-7787) Better error reporting when issuing PDS corruptions.
Alexei Scherbakov created IGNITE-7787: - Summary: Better error reporting when issuing PDS corruptions. Key: IGNITE-7787 URL: https://issues.apache.org/jira/browse/IGNITE-7787 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.5 If PDS is corrupted in any way and an update hits a bad page, the error message shown is not very helpful, usually something like "Failed to get page IO instance (page content is corrupted)". For corruptions related to CacheDataRowStore, the error should contain information about how to fix the issue: clear data for the cache/group and restart the node for partition reloading. For corruptions related to H2Tree (SQL indexes), the error should suggest removing index.bin for the broken partition and restarting the node to allow the index to rebuild. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7648) Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
Alexei Scherbakov created IGNITE-7648: - Summary: Revert IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. Key: IGNITE-7648 URL: https://issues.apache.org/jira/browse/IGNITE-7648 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.5 The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in IGNITE-5718 as a way to prevent unnecessary node drops in case of short network problems. I think fixing it this way was the wrong decision. We faced issues in production due to the lack of automatic kicking of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we realised the necessity of changing the default behavior via the property. The right solution is to kick nodes only if a failure threshold is reached. Such behavior should always be enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
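The "kick only if a failure threshold is reached" idea from IGNITE-7648 can be sketched as a per-node counter. The ticket does not define the threshold, so this sketch assumes it is a number of consecutive failed communication attempts; `FailureThresholdTracker` and its methods are hypothetical names, not Ignite API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decide to kick a node only after repeated consecutive failures. */
final class FailureThresholdTracker {
    /** Number of consecutive failures after which a node should be kicked (assumed semantics). */
    private final int threshold;

    /** Node id -> current count of consecutive failures. */
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    FailureThresholdTracker(int threshold) {
        if (threshold <= 0)
            throw new IllegalArgumentException("threshold must be positive");

        this.threshold = threshold;
    }

    /** Records a failed attempt; returns true if the node has reached the threshold. */
    synchronized boolean onFailure(String nodeId) {
        int cnt = consecutiveFailures.merge(nodeId, 1, Integer::sum);

        return cnt >= threshold;
    }

    /** Records a successful attempt, which resets the node's failure count. */
    synchronized void onSuccess(String nodeId) {
        consecutiveFailures.remove(nodeId);
    }
}
```

With a threshold of 3, a short network glitch that causes one or two failures followed by a success never triggers a kick, while a node stuck in a long GC pause eventually does.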
[jira] [Created] (IGNITE-7585) GridDhtLockFuture related memory leak
Alexei Scherbakov created IGNITE-7585: - Summary: GridDhtLockFuture related memory leak Key: IGNITE-7585 URL: https://issues.apache.org/jira/browse/IGNITE-7585 Project: Ignite Issue Type: Bug Affects Versions: 2.3 Reporter: Alexei Scherbakov Assignee: Alexei Scherbakov Fix For: 2.5 Attachments: memleak.jpg The LockTimeoutObject related to GridDhtLockFuture is not removed on commit, so the tx stays referenced until the timeout handler is triggered. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7204) Unexpected behavior if passing null to binaryObject.field method
Alexei Scherbakov created IGNITE-7204: - Summary: Unexpected behavior if passing null to binaryObject.field method Key: IGNITE-7204 URL: https://issues.apache.org/jira/browse/IGNITE-7204 Project: Ignite Issue Type: Improvement Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.4 If assertions are disabled, then the first field will be returned. If not, an AssertionError will be thrown. Reproducer: {noformat} public void testNullField() throws Exception { try { final IgniteEx ex = startGrid(0); final IgniteCache<Integer, BinaryObject> test = ex.cache("test").withKeepBinary(); final BinaryObjectBuilder bldr = ex.binary().builder("bldr"); bldr.setField("x", 1); test.put(0, bldr.build()); test.query(new ScanQuery<>(new IgniteBiPredicate<Integer, BinaryObject>() { @Override public boolean apply(Integer o, BinaryObject o2) { final Object q = o2.field(null); return false; } })).getAll(); } finally { stopAllGrids(); } } {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7166) SQL join with partition and replicated caches fails if number of partitions is too low.
Alexei Scherbakov created IGNITE-7166: - Summary: SQL join with partition and replicated caches fails if number of partitions is too low. Key: IGNITE-7166 URL: https://issues.apache.org/jira/browse/IGNITE-7166 Project: Ignite Issue Type: Bug Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.4 Reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.distributed.replicated; import java.util.List; import org.apache.ignite.Ignite; import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction; import org.apache.ignite.cache.query.FieldsQueryCursor; import org.apache.ignite.cache.query.SqlFieldsQuery; import org.apache.ignite.cache.query.annotations.QuerySqlField; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL; import static org.apache.ignite.cache.CacheMode.PARTITIONED; import static org.apache.ignite.cache.CacheMode.REPLICATED; import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC; /** * Tests non collocated join with replicated cache. */ public class IgniteCacheReplicatedJoinSelfTest extends GridCommonAbstractTest { /** */ public static final String REP_CACHE_NAME = "repCache"; /** */ public static final String PART_CACHE_NAME = "partCache"; /** */ public static final int REP_CNT = 3; /** */ public static final int PART_CNT = 10_000; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { final IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); cfg.setClientMode("client".equals(igniteInstanceName)); final CacheConfiguration ccfg1 = new CacheConfiguration(PART_CACHE_NAME); ccfg1.setCacheMode(PARTITIONED); ccfg1.setAtomicityMode(TRANSACTIONAL); ccfg1.setWriteSynchronizationMode(FULL_SYNC); ccfg1.setIndexedTypes(Integer.class, PartValue.class); final CacheConfiguration ccfg2 = new CacheConfiguration(REP_CACHE_NAME); ccfg2.setAffinity(new RendezvousAffinityFunction(false, REP_CNT)); ccfg2.setCacheMode(REPLICATED); ccfg2.setAtomicityMode(TRANSACTIONAL); ccfg2.setWriteSynchronizationMode(FULL_SYNC); 
ccfg2.setIndexedTypes(Integer.class, RepValue.class); cfg.setCacheConfiguration(ccfg1, ccfg2); return cfg; } /** * * @throws Exception */ public void testJoinNonCollocated() throws Exception { startGridsMultiThreaded(3); final Ignite client = startGrid("client"); for (int i = 0; i < REP_CNT; i++) client.cache(REP_CACHE_NAME).put(i, new RepValue(i, "rep" + i)); for (int i = 0; i < PART_CNT; i++) client.cache(PART_CACHE_NAME).put(i, new PartValue(i, "part" + i, ((i + 1) % REP_CNT))); final FieldsQueryCursor<List<?>> qry = client.cache(PART_CACHE_NAME). query(new SqlFieldsQuery("select PartValue._VAL, r._VAL from PartValue, \"repCache\".RepValue as r where PartValue.repId=r.id"));
final List<List<?>> all = qry.getAll(); assertEquals(10_000, all.size()); for (List<?> objects : all) { final PartValue pv = (PartValue)objects.get(0); final RepValue rv = (RepValue)objects.get(1); assertNotNull(rv); assertEquals(rv.getId(), pv.getRepId()); } } /** */ public static class PartValue { /** Id. */ @QuerySqlField private int id; /** Name. */ @QuerySqlField private String name; /** Rep id. */ @QuerySqlField private int repId; /** * @param id Id. * @param name Name. * @param repId Rep id. */ public PartValue(int id, String name, int repId) {
[jira] [Created] (IGNITE-7049) Optimistic transaction is not properly rolled back if timed out before sending prepare response.
Alexei Scherbakov created IGNITE-7049: - Summary: Optimistic transaction is not properly rolled back if timed out before sending prepare response. Key: IGNITE-7049 URL: https://issues.apache.org/jira/browse/IGNITE-7049 Project: Ignite Issue Type: Bug Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.4 Reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.transactions; import org.apache.ignite.Ignite; import org.apache.ignite.cluster.ClusterNode; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.internal.TestRecordingCommunicationSpi; import org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareResponse; import org.apache.ignite.internal.util.typedef.G; import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi; import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; import org.apache.ignite.transactions.Transaction; import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL; import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC; import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC; import static org.apache.ignite.transactions.TransactionIsolation.SERIALIZABLE; /** * Tests an ability to eagerly rollback timed out optimistic transactions. */ public class TxRollbackOnTimeoutOptimisticTest extends GridCommonAbstractTest { /** */ private static final String CACHE_NAME = "test"; /** IP finder. 
*/ private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true); /** */ private static final int GRID_CNT = 3; /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER); TestRecordingCommunicationSpi commSpi = new TestRecordingCommunicationSpi(); cfg.setCommunicationSpi(commSpi); boolean client = "client".equals(igniteInstanceName); cfg.setClientMode(client); if (!client) { CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME); ccfg.setAtomicityMode(TRANSACTIONAL); ccfg.setBackups(2); ccfg.setWriteSynchronizationMode(FULL_SYNC); cfg.setCacheConfiguration(ccfg); } return cfg; } /** * @return Near cache flag. */ protected boolean nearCacheEnabled() { return false; } /** {@inheritDoc} */ @Override protected void beforeTest() throws Exception { super.beforeTest(); startGridsMultiThreaded(GRID_CNT); } /** {@inheritDoc} */ @Override protected void afterTest() throws Exception { super.afterTest(); stopAllGrids(); } /** */ public void testOptimisticTimeout() throws Exception { final Ignite client = startGrid("client"); assertNotNull(client.cache(CACHE_NAME)); final ClusterNode n0 = client.affinity(CACHE_NAME).mapKeyToNode(0); final Ignite prim = G.ignite(n0.id()); for (Ignite ignite : G.allGrids()) { if (ignite == prim) continue; final TestRecordingCommunicationSpi spi = (TestRecordingCommunicationSpi)ignite.configuration().getCommunicationSpi(); spi.blockMessages(GridDhtTxPrepareResponse.class, prim.name()); } final int val = 0; try { multithreaded(new Runnable() { @Override public void run() { try (Transaction txOpt = client.transactions().txStart(OPTIMISTIC, SERIALIZABLE, 300, 1)) { client.cache(CACHE_NAME).put(val, val); txOpt.commit();
[jira] [Created] (IGNITE-6998) Activation on bigger topology with enabled persistence doesn't work as expected.
Alexei Scherbakov created IGNITE-6998: - Summary: Activation on bigger topology with enabled persistence doesn't work as expected. Key: IGNITE-6998 URL: https://issues.apache.org/jira/browse/IGNITE-6998 Project: Ignite Issue Type: Bug Components: cache, persistence Affects Versions: 2.3 Reporter: Alexei Scherbakov Fix For: 2.4 Reproducer: {noformat} /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. 
*/ package org.apache.ignite.internal.processors.cache.persistence; import java.util.HashMap; import java.util.List; import java.util.Map; import org.apache.ignite.Ignite; import org.apache.ignite.IgniteCheckedException; import org.apache.ignite.cache.CacheWriteSynchronizationMode; import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction; import org.apache.ignite.cache.query.ScanQuery; import org.apache.ignite.cluster.ClusterNode; import org.apache.ignite.configuration.CacheConfiguration; import org.apache.ignite.configuration.IgniteConfiguration; import org.apache.ignite.configuration.MemoryConfiguration; import org.apache.ignite.configuration.MemoryPolicyConfiguration; import org.apache.ignite.configuration.PersistentStoreConfiguration; import org.apache.ignite.configuration.WALMode; import org.apache.ignite.internal.IgniteEx; import org.apache.ignite.internal.IgniteKernal; import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion; import org.apache.ignite.internal.processors.cache.IgniteInternalCache; import org.apache.ignite.internal.util.typedef.G; import org.apache.ignite.internal.util.typedef.internal.U; import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi; import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder; import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder; import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest; /** * Check correctness of activation on bigger topology. */ public class IgnitePdsActivationOnBiggerTopologyTest extends GridCommonAbstractTest { /** */ private static TcpDiscoveryIpFinder ipFinder = new TcpDiscoveryVmIpFinder(true); /** {@inheritDoc} */ @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception { IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName); cfg.setMemoryConfiguration(new MemoryConfiguration().setDefaultMemoryPolicyName("d"). 
setPageSize(1024).setMemoryPolicies(new MemoryPolicyConfiguration().setName("d"). setInitialSize(50 * 1024 * 1024L).setMaxSize(50 * 1024 * 1024))); cfg.setPersistentStoreConfiguration(new PersistentStoreConfiguration().setWalMode(WALMode.LOG_ONLY)); ((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(ipFinder); CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>(DEFAULT_CACHE_NAME); ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC); ccfg.setAffinity(new RendezvousAffinityFunction(false, 64)); cfg.setCacheConfiguration(ccfg); return cfg; } /** {@inheritDoc} */ @Override protected void beforeTest() throws Exception { super.beforeTest(); deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), "db", false)); } /** {@inheritDoc} */ @Override protected void afterTest() throws Exception { stopAllGrids(); deleteRecursively(U.resolveWorkDirectory(U.defaultWorkDirectory(), "db", false)); super.afterTest(); } /** */ public void testActivationOnBiggerTopology() throws Exception { IgniteEx ignite = (IgniteEx)startGridsMultiThreaded(2); final int keysCnt = 1_000; for (int i = 0; i < keysCnt; i++) ignite.cache(DEFAULT_CACHE_NAME).put(i, i); forceCheckpoint(); assertEquals("Wrong size (before restart)", keysCnt, ignite.cache(DEFAULT_CACHE_NAME).size()); assertEquals("Wrong size for scan (before restart)", keysCnt, ignite.cache(DEFAULT_CACHE_NAME).query(new