[jira] [Created] (IGNITE-14619) Refactoring of GridDeploymentCommunication
Denis Chudov created IGNITE-14619: - Summary: Refactoring of GridDeploymentCommunication Key: IGNITE-14619 URL: https://issues.apache.org/jira/browse/IGNITE-14619 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication#sendResourceRequest uses a "while" loop with a mutex instead of a future, and creates listeners for discovery events and communication messages for each request. This complicates the code and may affect class loading performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
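The future-based approach could look roughly like this (a minimal sketch with hypothetical names, not the actual Ignite API): one future per request id, completed by a single shared response listener, instead of a per-request mutex wait loop and per-request listeners.

{code:java}
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class ResourceRequestSketch {
    /** Futures of requests in flight, keyed by request id. */
    private final Map<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

    /** Registers a future and returns it; no mutex, no per-request listener. */
    CompletableFuture<byte[]> sendRequest(long reqId) {
        CompletableFuture<byte[]> fut = new CompletableFuture<>();

        pending.put(reqId, fut);

        // Send the resource request message here.

        return fut;
    }

    /** A single shared message listener completes the matching future. */
    void onResponse(long reqId, byte[] resource) {
        CompletableFuture<byte[]> fut = pending.remove(reqId);

        if (fut != null)
            fut.complete(resource);
    }

    /** A single shared discovery listener fails the future of a request whose target node left. */
    void onNodeLeft(long reqId) {
        CompletableFuture<byte[]> fut = pending.remove(reqId);

        if (fut != null)
            fut.completeExceptionally(new IllegalStateException("Node left the cluster."));
    }
}
{code}

The caller then simply waits on (or chains to) the returned future instead of spinning on a mutex.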
[jira] [Created] (IGNITE-14530) Add offline utility for analyzing WAL
Denis Chudov created IGNITE-14530: - Summary: Add offline utility for analyzing WAL Key: IGNITE-14530 URL: https://issues.apache.org/jira/browse/IGNITE-14530 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov It would be useful, for investigating problems with data consistency, PDS and others, to be able to read WAL files offline. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14529) Add offline utility for analyzing indexes
Denis Chudov created IGNITE-14529: - Summary: Add offline utility for analyzing indexes Key: IGNITE-14529 URL: https://issues.apache.org/jira/browse/IGNITE-14529 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov It would be useful to be able to validate indexes offline. The utility must check that every configured index (available in the MetaTree [1]) is reachable from the corresponding root page, that the tree structure is valid, and that no orphan (unreachable) index pages are present in the index.bin persistent page store. [1] org.apache.ignite.internal.processors.cache.persistence.IndexStorageImpl#getIndexNames -- This message was sent by Atlassian Jira (v8.3.4#803005)
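The reachability part of such a check could be sketched as follows (a toy page model, not the actual index.bin page store format; all names are illustrative): walk pages from the index root, then report every page of the store that was never reached as an orphan.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IndexValidatorSketch {
    /**
     * @param rootId Root page id of the index tree.
     * @param pageLinks Maps a page id to ids of its child pages.
     * @param allPages All page ids present in the page store.
     * @return Pages present in the store but not reachable from the root.
     */
    static Set<Long> findOrphans(long rootId, Map<Long, List<Long>> pageLinks, Set<Long> allPages) {
        Set<Long> reachable = new HashSet<>();

        Deque<Long> stack = new ArrayDeque<>();

        stack.push(rootId);

        // Depth-first walk from the root; a cycle (invalid tree) is tolerated
        // because already-visited pages are not pushed again.
        while (!stack.isEmpty()) {
            long id = stack.pop();

            if (reachable.add(id))
                for (long child : pageLinks.getOrDefault(id, List.of()))
                    stack.push(child);
        }

        Set<Long> orphans = new HashSet<>(allPages);

        orphans.removeAll(reachable);

        return orphans;
    }
}
{code}

A real implementation would additionally verify page CRCs and B+ tree ordering invariants while walking.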
[jira] [Created] (IGNITE-14528) AssertionError in GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture:1528
Denis Chudov created IGNITE-14528: - Summary: AssertionError in GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture:1528 Key: IGNITE-14528 URL: https://issues.apache.org/jira/browse/IGNITE-14528 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Node failure during a rebalance:
{code:java}
2021-01-31 07:45:29.174[ERROR][exchange-worker-#168%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: RebalanceFuture [grp=CacheGroupContext [grp=CACHEGROUP_OBJECT_TO_EVICTION_REGISTRY], topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], rebalanceId=27299, routines=22, receivedBytes=0, receivedKeys=0, partitionsLeft=1659, startTime=-1, endTime=-1, lastCancelledTime=1612068328290, result=true]]]
java.lang.AssertionError: RebalanceFuture [grp=CacheGroupContext [grp=CACHEGROUP_OBJECT_TO_EVICTION_REGISTRY], topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], rebalanceId=27299, routines=22, receivedBytes=0, receivedKeys=0, partitionsLeft=1659, startTime=-1, endTime=-1, lastCancelledTime=1612068328290, result=true]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture(GridDhtPartitionDemander.java:1528)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.finishPreloading(GridDhtPartitionDemander.java:2064)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.finishPreloading(GridDhtPreloader.java:577)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:419)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:3133)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3280)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3195)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14474) Improve error message in case rebalance fails
Denis Chudov created IGNITE-14474: - Summary: Improve error message in case rebalance fails Key: IGNITE-14474 URL: https://issues.apache.org/jira/browse/IGNITE-14474 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Currently we can get a message like this when rebalance fails with an exception (examples are from Ignite 2.5; in newer versions the log messages were changed, but the problem is still relevant):
{code:java}
2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander] Cancelled rebalancing [grp=ignite-sys-cache, supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], time=88 ms]
2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to unmarshal object with optimized marshaller
{code}
In the case above, a marshalling exception leads to a rebalance failure which will never be resolved - i.e. the cluster enters an erroneous state. We should report issues like this as ERROR. The message should explain that the rebalance has failed, data for the cache was not fully copied to the node, the backup factor is not recovered and the cluster may not work correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14425) Hang transactions in FINISH [COMMIT] phase when communication spi is blocked
Denis Chudov created IGNITE-14425: - Summary: Hang transactions in FINISH [COMMIT] phase when communication spi is blocked Key: IGNITE-14425 URL: https://issues.apache.org/jira/browse/IGNITE-14425 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Scenario: from a client, two concurrent transactions are started on a single key. At the same time, the GridNearTxFinishRequest message from the client is blocked, emulating a partial network failure; as a result, one of the transactions never completes even though the node is no longer working. Reproducer (insert this test into *TxRollbackOnTimeoutTest*):
{code:java}
/** */
@Test
public void testRollbackOnNearNodeLeft() throws Exception {
    Ignite client = startClient();

    Integer pk = primaryKey(grid(0).cache(CACHE_NAME));

    CountDownLatch locked = new CountDownLatch(1);
    CountDownLatch blocked = new CountDownLatch(1);

    IgniteInternalFuture fut = runAsync(new Callable() {
        @Override public Void call() throws Exception {
            try (Transaction tx0 = client.transactions().txStart()) {
                client.cache(CACHE_NAME).put(pk, 0);

                locked.countDown();

                U.awaitQuiet(blocked);

                tx0.commit();
            }
            catch (Exception e) {
                // Ignored.
            }

            return null;
        }
    });

    IgniteInternalFuture fut2 = runAsync(new Runnable() {
        @Override public void run() {
            try (Transaction tx1 = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 1000, 0)) {
                U.awaitQuiet(locked);

                TestRecordingCommunicationSpi.spi(client).blockMessages(new IgniteBiPredicate() {
                    @Override public boolean apply(ClusterNode clusterNode, Message msg) {
                        return msg instanceof GridNearTxFinishRequest;
                    }
                });

                TestRecordingCommunicationSpi.spi(grid(0)).blockMessages(new IgniteBiPredicate() {
                    @Override public boolean apply(ClusterNode clusterNode, Message msg) {
                        return msg instanceof GridNearLockResponse;
                    }
                });

                client.cache(CACHE_NAME).put(pk, 1);

                fail();
            }
            catch (Exception e) {
                assertTrue(X.hasCause(e, TransactionTimeoutException.class));
            }
        }
    });

    TestRecordingCommunicationSpi.spi(client).waitForBlocked();
    TestRecordingCommunicationSpi.spi(grid(0)).waitForBlocked();

    fut2.get();

    client.close();

    TestRecordingCommunicationSpi.spi(grid(0)).stopBlock();

    blocked.countDown();

    fut.get();

    assertTrue(grid(0).context().cache().context().tm().activeTransactions().isEmpty());
}
{code}
As a result, the transaction hangs on the server node in the MARKED_ROLLBACK state forever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14423) Node failure caused by AssertionError: Transaction does not own lock for update
Denis Chudov created IGNITE-14423: - Summary: Node failure caused by AssertionError: Transaction does not own lock for update Key: IGNITE-14423 URL: https://issues.apache.org/jira/browse/IGNITE-14423 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Scenario: 1. Start 3 servers 2. Start 2 clients 3. Start two OPTIMISTIC transactions with the same key from different client nodes 4. Transfer transactions to PREPARED STATE on primary node 5. Stop one client node(whose transaction changed state to PREPARED last) Error in log: {code:java} [2021-03-03 08:52:59,807][ERROR][sys-#499%transactions.TxRecoveryWithConcurrentRollbackTest1%][GridNearTxLocal] Failed completing the transaction: [commit=true, tx=GridDhtTxLocal [nearNodeId=29551b46-74ef-4e35-af4a-d97809cc5260, nearFutId=08b25a6f771-eec14fa0-02ef-46b9-97d9-0f2b7c851ddb, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=226230774, order=1614750773047, nodeOrder=4], lb=tx, super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[1544803905], recovery=false, mvccEnabled=false, mvccCachingCacheIds=[], txMap=ArrayList [IgniteTxEntry [txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], cacheId=1544803905], val=TxEntryValueHolder [val=CacheObjectImpl [val=null, hasValBytes=true], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=ReaderId[] [], part=1, super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], val=null, ver=GridCacheVersion 
[topVer=226230774, order=1614750774035, nodeOrder=2], hash=1, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=LinkedList [GridCacheMvccCandidate [nodeId=b1f1a8e0-e1e8-4084-b9c6-bdd2a271, ver=GridCacheVersion [topVer=226230774, order=1614750774033, nodeOrder=2], threadId=364, id=5, topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], reentry=null, otherNodeId=207c7f09-21a9-4b99-b631-3304a522b002, otherVer=GridCacheVersion [topVer=226230774, order=1614750774032, nodeOrder=5], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], rmts=null]], flags=3]]], prepared=1, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]]], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=226230774, order=1614750774035, nodeOrder=2], writeVer=null, implicit=false, loc=true, threadId=686, startTime=1614750774753, nodeId=b1f1a8e0-e1e8-4084-b9c6-bdd2a271, isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=5000, sysInvalidate=true, sys=false, plc=2, commitVer=GridCacheVersion [topVer=226230774, order=1614750774035, nodeOrder=2], finalizing=RECOVERY_FINISH, invalidParts=null, state=UNKNOWN, timedOut=false, topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], mvccSnapshot=null, skipCompletedVers=false, parentTx=null, duration=5048ms, onePhaseCommit=false], size=1 class org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: Committing a transaction has produced runtime exception at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:813) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:969) at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitAsync(GridDhtTxLocal.java:542) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.finishTxOnRecovery(IgniteTxManager.java:2370) at
[jira] [Created] (IGNITE-13418) Deadlock on multiple cache delete
Denis Chudov created IGNITE-13418: - Summary: Deadlock on multiple cache delete Key: IGNITE-13418 URL: https://issues.apache.org/jira/browse/IGNITE-13418 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov
Thread 1:
- acquires checkpoint read lock in GridCacheProcessor#processCacheStopRequestOnExchangeDone
- acquires GridQueryProcessor#stateMux in GridQueryProcessor.onCacheStop
- enters H2TreeIndex.destroy
- releases checkpoint read lock in H2Tree.temporaryReleaseLock, can't take it again because of db-checkpoint-thread
Thread 2:
- acquires checkpoint read lock in GridCacheProcessor#processCacheStopRequestOnExchangeDone
- tries to acquire GridQueryProcessor#stateMux in GridQueryProcessor.onCacheStop, which is held by Thread 1
db-checkpoint-thread:
- tries to acquire the checkpoint write lock, can't do it because of Thread 2
Proposed fix: H2Tree.temporaryReleaseLock should release the lock only when tree deletion is asynchronous (H2TreeIndex.destroy is called with async=true), i.e. when it happens inside a DurableBackgroundTask. Such tasks are executed in separate threads, which don't hold any other locks.
Thread dump: {code:java} Thread [name="sys-#1220%DPL_GRID%DplGridNodeName%", id=3200, state=BLOCKED, blockCnt=1, waitCnt=0] Lock [object=java.lang.Object@6a9a92ba, ownerName=sys-#1215%DPL_GRID%DplGridNodeName%, ownerId=3195] at o.a.i.i.processors.query.GridQueryProcessor.onCacheStop0(GridQueryProcessor.java:1695) at o.a.i.i.processors.query.GridQueryProcessor.onCacheStop(GridQueryProcessor.java:902) at o.a.i.i.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1169) at o.a.i.i.processors.cache.GridCacheProcessor.prepareCacheStop(GridCacheProcessor.java:2644) at o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$629e8679$1(GridCacheProcessor.java:2803) at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$620/1418386924.apply(Unknown Source) at o.a.i.i.util.IgniteUtils.lambda$null$1(IgniteUtils.java:10879) at o.a.i.i.util.IgniteUtils$$Lambda$436/321848940.call(Unknown Source) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked synchronizers: java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2582f93c java.util.concurrent.ThreadPoolExecutor$Worker@4da1cafe Thread [name="sys-#1215%DPL_GRID%DplGridNodeName%", id=3195, state=BLOCKED, blockCnt=4, waitCnt=437520] Lock [object=o.a.i.i.processors.failure.FailureProcessor@78edb1e9, ownerName=async-durable-background-task-executor-1-#1222%DPL_GRID%DplGridNodeName%, ownerId=3202] at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:162) at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:151) at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1787) at 
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1722) at o.a.i.i.processors.query.h2.database.H2Tree.temporaryReleaseLock(H2Tree.java:690) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.temporaryReleaseLock(BPlusTree.java:2367) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2548) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroy(BPlusTree.java:2441) at o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroy(BPlusTree.java:2392) at o.a.i.i.processors.query.h2.database.H2TreeIndex.destroy0(H2TreeIndex.java:671) at o.a.i.i.processors.query.h2.database.H2TreeIndex.destroy(H2TreeIndex.java:639) at o.a.i.i.processors.query.h2.opt.GridH2Table.destroy(GridH2Table.java:567) at o.a.i.i.processors.query.h2.H2TableDescriptor.onDrop(H2TableDescriptor.java:347) at o.a.i.i.processors.query.h2.H2Schema.drop(H2Schema.java:127) at o.a.i.i.processors.query.h2.IgniteH2Indexing.unregisterCache(IgniteH2Indexing.java:2595) at o.a.i.i.processors.query.GridQueryProcessor.onCacheStop0(GridQueryProcessor.java:1727) - locked java.lang.Object@6a9a92ba at o.a.i.i.processors.query.GridQueryProcessor.onCacheStop(GridQueryProcessor.java:902) at o.a.i.i.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1169) at
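The proposed fix above can be sketched like this (hypothetical names, not the actual DurableBackgroundTask API): asynchronous destruction runs in a dedicated executor thread, so temporaryReleaseLock is never called from a thread that also holds stateMux or other exchange-time locks.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DurableBackgroundTaskSketch {
    /** Dedicated thread that holds no cache-stop locks. */
    private final ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-durable-background-task-executor");

        t.setDaemon(true);

        return t;
    });

    /** Schedules tree destruction; the calling (exchange) thread returns immediately. */
    Future<?> destroyAsync(Runnable destroyTree) {
        return exec.submit(destroyTree);
    }
}
{code}

Inside such a task it is safe to temporarily release the checkpoint read lock, because no other lock ordering can be violated from that thread.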
[jira] [Created] (IGNITE-12760) Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left
Denis Chudov created IGNITE-12760: - Summary: Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left Key: IGNITE-12760 URL: https://issues.apache.org/jira/browse/IGNITE-12760 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Following assertion error triggers failure handler and crashes the node. Can possibly crash the whole cluster. {code:java} 2020-02-18 14:34:09.775\[ERROR]\[query-#146129%DPL_GRID%DplGridNodeName%]\[o.a.i.i.p.cache.GridCacheIoManager] Failed to process message \[senderId=727757ed-4ad4-4779-bda9-081525725cce, msg=GridCacheQueryRequest \[id=178, cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, incMeta=false, all=false, keepBinary=true, subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, topVer=AffinityTopologyVersion \[topVer=97, minorTopVer=0], sendTimestamp=-1, receiveTimestamp=-1, super=GridCacheIdMessage \[cacheId=-1129073400, super=GridCacheMessage \[msgId=179, depInfo=GridDeploymentInfoBean \[clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, userVer=0, locDepOwner=false, participants=null], lastAffChangedTopVer=AffinityTopologyVersion \[topVer=8, minorTopVer=6], err=null, skipPrepare=false java.lang.AssertionError: null at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.(GridCacheDeploymentManager.java:918) at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.(GridCacheDeploymentManager.java:889) at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576) at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130) at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} There is no reliable reproducer for now, but it seems that we should prevent such situations in general, as follows: 1) check the correctness of the message before it is sent, inside GridCacheDeploymentManager#prepare. If we have the corresponding class loader on the local node, we can try to fix the message and replace the wrong class loader with the local one. 2) log suspicious deployments which we receive from GridDeploymentManager#deploy - maybe we have obsolete deployments in caches. 3) possibly we can remove this assertion: we should have this class on the sender node and use it as the class loader id, and if we don't, we will receive an exception on finishUnmarshall ("Failed to peer load class") and handle this situation with GridCacheIoManager#processFailedMessage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12612) Failing test IoStatisticsBasicIndexSelfTest.testMetricRegistryRemovedOnIndexDrop after IGNITE-12496
Denis Chudov created IGNITE-12612: - Summary: Failing test IoStatisticsBasicIndexSelfTest.testMetricRegistryRemovedOnIndexDrop after IGNITE-12496 Key: IGNITE-12612 URL: https://issues.apache.org/jira/browse/IGNITE-12612 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov In IGNITE-12496 a new method of destroying an index was added (H2TreeIndex#asyncDestroy), but the changes from IGNITE-11987 are missing from it; they are only present in the old H2TreeIndex#destroy. We should apply the changes to the new method. It would also be better to refactor both methods to prevent such errors in the future. Failing test history: [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1153260173537232526=%3Cdefault%3E=testDetails] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12516) Dump active transaction from near node does not work if transaction not follow first
Denis Chudov created IGNITE-12516: - Summary: Dump active transaction from near node does not work if transaction not follow first Key: IGNITE-12516 URL: https://issues.apache.org/jira/browse/IGNITE-12516 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov In this code:
{code:java}
for (IgniteInternalTx tx : tm.activeTransactions()) {
    if (curTime - tx.startTime() > timeout) {
        found = true;

        if (warnings.canAddMessage()) {
            warnings.add(longRunningTransactionWarning(tx, curTime));

            if (ltrDumpLimiter.allowAction(tx))
                dumpLongRunningTransaction(tx);
        }
        else
            warnings.incTotal();
    }
}
{code}
you can see that if the transaction is not ACTIVE, the dumping closure will be skipped. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12496) Index deletion blocks checkpoint for all of its duration, which can cause "Critical system error: system critical thread blocked"
Denis Chudov created IGNITE-12496: - Summary: Index deletion blocks checkpoint for all of its duration, which can cause "Critical system error: system critical thread blocked" Key: IGNITE-12496 URL: https://issues.apache.org/jira/browse/IGNITE-12496 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov GridH2Table#removeIndex(Session, Index) acquires the checkpoint read lock and releases it only after full completion of the deletion process. This happens because H2TreeIndex#destroy must run while the checkpoint lock is held. Meanwhile, the checkpoint thread stops in Checkpointer#markCheckpointBegin, trying to acquire the write lock, and stays blocked for the whole duration of the index deletion. A possible fix is to periodically release the checkpoint read lock while index deletion is in progress. To avoid persistence corruption in case of a node crash in the middle of the process, we should put the index root into some persistent structure like the index meta tree and remember it as "pending delete". Then we must delete tree pages from leaves to root, which avoids leaving links to deleted pages. When deletion is complete, the tree root can be removed from "pending delete". -- This message was sent by Atlassian Jira (v8.3.4#803005)
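The periodic release could be sketched like this (illustrative only; the BATCH constant and the page loop are assumptions, not the actual BPlusTree code): every BATCH deleted pages the read lock is dropped and re-taken, giving the checkpointer a window to acquire its write lock.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IndexDestroySketch {
    /** Pages deleted between two lock releases (illustrative value). */
    private static final int BATCH = 100;

    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /**
     * Deletes pages under the read lock, releasing it periodically.
     *
     * @return Number of temporary releases (for illustration).
     */
    int destroyIndexPages(int totalPages) {
        int releases = 0;

        checkpointLock.readLock().lock();

        try {
            for (int page = 0; page < totalPages; page++) {
                // Delete one page here, leaves first, root last.

                if ((page + 1) % BATCH == 0 && page + 1 < totalPages) {
                    // Let the checkpointer take its write lock.
                    checkpointLock.readLock().unlock();

                    releases++;

                    checkpointLock.readLock().lock();
                }
            }
        }
        finally {
            checkpointLock.readLock().unlock();
        }

        return releases;
    }
}
{code}

The "pending delete" record in the meta tree is what makes the release windows safe: if the node crashes between batches, the half-deleted tree is found and finished on restart.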
[jira] [Created] (IGNITE-12340) Extend test coverage of ability to track system/user time held in transaction
Denis Chudov created IGNITE-12340: - Summary: Extend test coverage of ability to track system/user time held in transaction Key: IGNITE-12340 URL: https://issues.apache.org/jira/browse/IGNITE-12340 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov The current tests for this feature do not cover all use cases; coverage should be improved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12167) I think we should add logic for clearing custom pageHndWrapper in afterTestsStopped() in GridCommonAbstractTest or GridAbstractTest https://ggtc.gridgain.com/viewLog.ht
Denis Chudov created IGNITE-12167: - Summary: I think we should add logic for clearing custom pageHndWrapper in afterTestsStopped() in GridCommonAbstractTest or GridAbstractTest https://ggtc.gridgain.com/viewLog.html?buildId=2364333 Key: IGNITE-12167 URL: https://issues.apache.org/jira/browse/IGNITE-12167 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov We should move the logic for clearing the custom pageHndWrapper into {{afterTestsStopped()}} in {{GridCommonAbstractTest}} or {{GridAbstractTest}} to ensure that we will not have problems in any tests which set a custom wrapper. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12165) Negative time in Transaction time dump
Denis Chudov created IGNITE-12165: - Summary: Negative time in Transaction time dump Key: IGNITE-12165 URL: https://issues.apache.org/jira/browse/IGNITE-12165 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov After implementing ticket https://ggsystems.atlassian.net/browse/GG-21272 we have transaction dumps in the logs. there are some issues with information in these dumps: {code:java} [11:53:36,154][INFO][snapshot-scheduler-restats-#69][GridNearTxLocal] Transaction time dump [startTime=11:53:36.081, totalTime=65, systemTime=3, userTime=62, cacheOperationsTime=-41943785508, rollbackTime=41943785512, tx=GridNearTxLocal [mappings=IgniteTxMappingsImpl [], nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=true, trackTimeout=false, systemTime=3953576, systemStartTime=0, prepareStartTime=0, prepareTime=0, commitOrRollbackStartTime=0, commitOrRollbackTime=41943785512318555, txDumpsThrottling=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxDumpsThrottling@760d8025, lb=null, thread=snapshot-scheduler-restats-#69, mappings=IgniteTxMappingsImpl [], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-2100569601], recovery=false, txMap=HashSet [IgniteTxEntry [key=KeyCacheObjectImpl [part=2, val=SnapshotScheduleKey [id=_SCHEDULES_], hasValBytes=true], cacheId=-2100569601, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=2, val=SnapshotScheduleKey [id=_SCHEDULES_], hasValBytes=true], cacheId=-2100569601], val=[op=READ, val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=true, entry=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=2, val=SnapshotScheduleKey [id=_SCHEDULES_], hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=886348002, extras=null, flags=0]GridDistributedCacheEntry [super=]GridDhtDetachedCacheEntry [super=], prepared=0, locked=true, nodeId=1adbae78-40fe-480d-803d-4d498919ae63, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=179672005, order=1568192005565, nodeOrder=1, super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=179672005, order=1568192005565, nodeOrder=1], writeVer=null, implicit=false, loc=true, threadId=125, startTime=1568192016081, nodeId=9e00ae45-4084-4f29-949f-c19c933f4299, startVer=GridCacheVersion [topVer=179672005, order=1568192005565, nodeOrder=1], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=true, plc=5, commitVer=null, finalizing=NONE, invalidParts=null, state=ROLLED_BACK, timedOut=false, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], duration=60ms, onePhaseCommit=false], size=1 {code} For example: # Negative time: cacheOperationsTime=-41943785508, # Huge times: rollbackTime=41943785512 -- This message was sent by Atlassian Jira (v8.3.2#803003)
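Note commitOrRollbackStartTime=0 together with commitOrRollbackTime=41943785512318555 in the dump above. A plausible cause (an assumption, not confirmed by this ticket) is that a nanoTime-based start marker which was never set stays 0, so "now - start" yields an astronomically large duration, and subtracting that from the total time makes other components negative. A guard might look like:

{code:java}
class TxTimeDumpSketch {
    /** Treats an unset (0) start marker as zero elapsed time instead of subtracting it. */
    static long safeDurationNanos(long startNanos, long nowNanos) {
        return startNanos == 0 ? 0 : nowNanos - startNanos;
    }
}
{code}

With such a guard, a phase that never started contributes 0 to the dump instead of a huge or negative value.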
[jira] [Created] (IGNITE-12158) Failing test RebuildIndexLogMessageTest on ignite-2.5-master
Denis Chudov created IGNITE-12158: - Summary: Failing test RebuildIndexLogMessageTest on ignite-2.5-master Key: IGNITE-12158 URL: https://issues.apache.org/jira/browse/IGNITE-12158 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov [https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8==testDetails=-1672212793500055752=START_DATE_DESC_IgniteTests24Java8=pull%2F4611%2Fhead=500] Test is constantly failing. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IGNITE-12112) H2TreeIndex should throw CorruptTreeException with cacheId, cacheName and indexName
Denis Chudov created IGNITE-12112: - Summary: H2TreeIndex should throw CorruptTreeException with cacheId, cacheName and indexName Key: IGNITE-12112 URL: https://issues.apache.org/jira/browse/IGNITE-12112 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov At the moment we can't identify the problem cache if we have many caches in one cache group and a CorruptedTreeException was thrown. For example: {code:java} org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on row: Row@346ba4aa[ key: 29003, val: com.sbt.acquiring.processing.entities.dictionaries.TurnoverRange_DPL_PROXY [idHash=492721050, hash=-616731792, valueStart=2001, isImmutable=false, lastChangeDate=1560917668719, name=Ñâûøå 20, valueEnd=null, changeState=null, id=29003, pkey=7c0d0d59-f33e-41d7-a5c9-4d8c7e74b976, ownerId=acquiring-processing-replication, base=true], ver: GridCacheVersion [topVer=172396060, order=1560917613306, nodeOrder=2] ][ Ñâûøå 20, null, 2001, TRUE, null, 29003, 7c0d0d59-f33e-41d7-a5c9-4d8c7e74b976 ]" [5-195] at org.h2.message.DbException.get(DbException.java:168) at org.h2.message.DbException.convert(DbException.java:295) at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:251) at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:548) at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:480) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:709) at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1863) at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:403) at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:1402) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1263) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1625) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:358) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3629) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2793) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:902) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:772) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:344) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:418) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:408) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1061) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:586) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1624) at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752) at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at
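A minimal sketch of what the proposed exception could look like. The field names, constructor, and accessors here are hypothetical (this is not the actual Ignite class), but they illustrate how carrying cache context in the exception message would make the failing cache identifiable inside a multi-cache group:

```java
// Hypothetical sketch of an exception that carries cache context, so the
// problem cache can be identified even when many caches share a group.
// Names are illustrative, not Ignite's actual API.
class CorruptedTreeException extends RuntimeException {
    private final int cacheId;
    private final String cacheName;
    private final String indexName;

    CorruptedTreeException(String msg, Throwable cause,
        int cacheId, String cacheName, String indexName) {
        super("B+Tree is corrupted [cacheId=" + cacheId
            + ", cacheName=" + cacheName
            + ", indexName=" + indexName + "]: " + msg, cause);

        this.cacheId = cacheId;
        this.cacheName = cacheName;
        this.indexName = indexName;
    }

    int cacheId() { return cacheId; }
    String cacheName() { return cacheName; }
    String indexName() { return indexName; }
}
```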
[jira] [Created] (IGNITE-12070) Document the new ability to track system/user time of transactions
Denis Chudov created IGNITE-12070: - Summary: Document the new ability to track system/user time of transactions Key: IGNITE-12070 URL: https://issues.apache.org/jira/browse/IGNITE-12070 Project: Ignite Issue Type: Task Components: documentation Reporter: Denis Chudov There is now the ability to track system/user time of transactions. System time is the time spent on system activities, i.e. acquiring locks, preparing, committing, etc. User time is the time spent on user activities, when the client node runs some code while holding the transaction. We have the ability to log info about transactions that exceed some threshold execution timeout, or about some percentage of all transactions. A log record for a long-running transaction looks like the following: {code:java} [2019-08-09 13:39:49,130][WARN ][sys-stripe-1-#101%client%][root] Long transaction time dump [startTime=13:39:47.970, totalTime=1160, systemTime=157, userTime=1003, cacheOperationsTime=141, prepareTime=15, commitTime=0, tx=GridNearTxLocal [...]] {code} In the case of sampling all transactions: {code:java} [2019-08-09 13:39:54,079][INFO ][sys-stripe-2-#102%client%][root] Transaction time dump [startTime=13:39:54.063, totalTime=15, systemTime=6, userTime=9, cacheOperationsTime=2, prepareTime=3, commitTime=0, tx=GridNearTxLocal [...]] {code} Some transaction dumps can also be skipped to avoid overflowing the log; information about this log throttling looks like this: {code:java} [2019-08-09 13:39:55,109][INFO ][sys-stripe-0-#100%client%][root] Transaction time dumps skipped because of log throttling: 2 {code} There are JMX parameters and JVM options to control this behavior: 1) JVM option: IGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD JMX parameter: TransactionsMXBean.longTransactionTimeDumpThreshold Threshold timeout in milliseconds for long transactions; if a transaction exceeds it, the transaction is dumped to the log with information about how much time it spent in system time and user time. Default value is 0. 
No info about system/user time of long transactions is dumped to the log if this parameter is not set. 2) JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesCoefficient The coefficient for samples of completed transactions that will be dumped to the log. Must be a float value between 0.0 and 1.0 inclusive. Default value is 0.0. 3) JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesPerSecondLimit The limit of samples of completed transactions that will be dumped to the log per second, if IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT is above 0.0. Must be an integer value greater than 0. Default value is 5. The existing long-running transaction warning was extended with information about the current system and user time of the transaction: {code:java} [2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] First 10 long running transactions [total=1] [2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] >>> Transaction [startTime=14:10:31.170, curTime=14:10:31.750, systemTime=32, userTime=548, tx=GridNearTxLocal [...]] {code} The following metrics were also added to monitor system and user time on a single node: diagnostic.transactions.totalNodeSystemTime - Total transactions system time on node. diagnostic.transactions.totalNodeUserTime - Total transactions user time on node. diagnostic.transactions.nodeSystemTimeHistogram - Transactions system times on node represented as histogram. diagnostic.transactions.nodeUserTimeHistogram - Transactions user times on node represented as histogram. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
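For reference, enabling the dumps described above could look like the following JVM arguments. The property names come from the description; the values are examples only:

```shell
# Illustrative JVM arguments enabling transaction time dumps
# (property names as documented above; values are examples only).
JVM_OPTS="$JVM_OPTS -DIGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD=1000"      # dump txs longer than 1 s
JVM_OPTS="$JVM_OPTS -DIGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT=0.1"  # sample 10% of all txs
JVM_OPTS="$JVM_OPTS -DIGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT=5"
export JVM_OPTS
```

The same settings can also be changed at runtime through the corresponding TransactionsMXBean attributes.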
[jira] [Created] (IGNITE-12063) Add ability to track system/user time held in transaction
Denis Chudov created IGNITE-12063: - Summary: Add ability to track system/user time held in transaction Key: IGNITE-12063 URL: https://issues.apache.org/jira/browse/IGNITE-12063 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov Fix For: 2.8 We should dump the user/system times of a transaction to the log on commit/rollback, if the duration of the transaction is more than a threshold. I want to see in the log on the tx coordinator node: # Transaction duration # System time: #* How long were we getting locks on keys? #* How long were we preparing the transaction? #* How long were we committing the transaction? # User time (transaction time - total system time) # Transaction status (commit/rollback) The threshold could be set by a system property and overridden via JMX. We shouldn't dump times if the property is not set. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-12047) senderNodeId is absent in StatusCheckMessage
Denis Chudov created IGNITE-12047: - Summary: senderNodeId is absent in StatusCheckMessage Key: IGNITE-12047 URL: https://issues.apache.org/jira/browse/IGNITE-12047 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Fix For: 2.8 TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized {code:java} [2019-07-29 02:42:11,609][ERROR][tcp-disco-sock-reader-[]-#21611%tcp.TcpDiscoveryCoordinatorFailureTest4%][TcpDiscoverySpi] Failed to initialize connection (this can happen due to short time network problems and can be ignored if does not affect node discovery) [sock=Socket[addr=/127.0.0.1,port=44033,localport=47504]] java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:6464) at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:60) Exception in thread "disco-pool-#62924%tcp.TcpDiscoveryCoordinatorFailureTest4%" Exception in thread "disco-pool-#62925%tcp.TcpDiscoveryCoordinatorFailureTest4%" java.lang.AssertionError at org.apache.ignite.spi.discovery.tcp.ServerImpl.pingNode(ServerImpl.java:723) at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$4000(ServerImpl.java:195) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker$8.run(ServerImpl.java:5598) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 
java.lang.AssertionError at org.apache.ignite.spi.discovery.tcp.ServerImpl.pingNode(ServerImpl.java:723) at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$4000(ServerImpl.java:195) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker$8.run(ServerImpl.java:5598) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11979) Add ability to set default parallelism of rebuild indexes in configuration
Denis Chudov created IGNITE-11979: - Summary: Add ability to set default parallelism of rebuild indexes in configuration Key: IGNITE-11979 URL: https://issues.apache.org/jira/browse/IGNITE-11979 Project: Ignite Issue Type: Improvement Reporter: Denis Chudov Assignee: Denis Chudov We can't change SchemaIndexCacheVisitorImpl#DFLT_PARALLELISM at the moment: {code:java} /** Default degree of parallelism. */ private static final int DFLT_PARALLELISM = Math.min(4, Math.max(1, Runtime.getRuntime().availableProcessors() / 4)); {code} On huge servers with a lot of cores (such as 56) we will still rebuild indexes in only 4 threads. I think we should have the ability to set DFLT_PARALLELISM in the Ignite configuration. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
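The effect of the capped formula, and the shape of the proposed configuration hook, can be sketched as follows. Only the arithmetic mirrors the snippet above; the override property name "ignite.rebuild.parallelism" is hypothetical:

```java
// Sketch of the default-parallelism formula from the description, plus a
// hypothetical override property illustrating the proposed configuration hook.
class IndexRebuildParallelism {
    // Same arithmetic as SchemaIndexCacheVisitorImpl#DFLT_PARALLELISM:
    // capped at 4 threads no matter how many cores are available.
    static int defaultParallelism(int availableProcessors) {
        return Math.min(4, Math.max(1, availableProcessors / 4));
    }

    // Hypothetical configuration hook: an explicit setting wins over the default.
    static int effectiveParallelism(int availableProcessors) {
        String override = System.getProperty("ignite.rebuild.parallelism"); // hypothetical name
        return override != null ? Integer.parseInt(override)
                                : defaultParallelism(availableProcessors);
    }
}
```

On a 56-core host the default still computes to min(4, 56/4) = 4 threads, which is the motivation for making it configurable.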
[jira] [Created] (IGNITE-11958) JDBC connection validation should use its own task instead of cache validation task
Denis Chudov created IGNITE-11958: - Summary: JDBC connection validation should use its own task instead of cache validation task Key: IGNITE-11958 URL: https://issues.apache.org/jira/browse/IGNITE-11958 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Fix For: 2.8 JDBC connections are validated using GridCacheQueryJdbcValidationTask. We should create a dedicated validation task for this activity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11955) Fix control.sh issues related to IGNITE-11876 and IGNITE-11913
Denis Chudov created IGNITE-11955: - Summary: Fix control.sh issues related to IGNITE-11876 and IGNITE-11913 Key: IGNITE-11955 URL: https://issues.apache.org/jira/browse/IGNITE-11955 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Umbrella ticket for control.sh issues related to IGNITE-11876 and IGNITE-11913 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11913) Incorrect formatting of idle_verify output
Denis Chudov created IGNITE-11913: - Summary: Incorrect formatting of idle_verify output Key: IGNITE-11913 URL: https://issues.apache.org/jira/browse/IGNITE-11913 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Command: {noformat} bin/control.sh --cache idle_verify cachepoc1.* --cache-filter PERSISTENT --host 172.25.1.29 {noformat} Output: {noformat} Control utility [ver. 2.5.8#20190511-sha1:8b27f3a6] 2019 Copyright(C) Apache Software Foundation User: isuntsov Time: 2019-05-14T15:22:20.565 idle_verify failed.There are no caches matching given filter options.Idle verify failed on nodes: Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29] consistent ID: poc-tester-server-172.25.1.29-id-0 See log for additional information. /home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt {noformat} In general, the output is CORRECT but I think it will be more readable with the following formatting: {noformat} idle_verify failed. There are no caches matching given filter options: [ cachepoc1.*; PERSISTENT]. Idle verify failed on nodes: Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29] Consistent ID: poc-tester-server-172.25.1.29-id-0 See log for additional information: /home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt {noformat} Also, I guess that *null* in "Exception message" should be replaced with some meaningful message: {noformat} cat /home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt idle_verify failed.There are no caches matching given filter options.Idle verify failed on nodes: Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29] consistent ID: poc-tester-server-172.25.1.29-id-0 Exception message: null {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11900) Fix ClassPathContentLoggingTest test
Denis Chudov created IGNITE-11900: - Summary: Fix ClassPathContentLoggingTest test Key: IGNITE-11900 URL: https://issues.apache.org/jira/browse/IGNITE-11900 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov This test failed on TC because of an incorrect path separator in beforeTestsStarted: it should use File.pathSeparator instead of ';'. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
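A minimal illustration of the fix. The joinClasspath helper is hypothetical; the point is that a hard-coded ';' only works on Windows, while File.pathSeparator resolves to the platform's separator (':' on Unix-like systems, ';' on Windows):

```java
import java.io.File;

// Building a classpath string portably: File.pathSeparator is the
// platform's path-list separator, unlike a hard-coded ';'.
class ClasspathUtil {
    static String joinClasspath(String... entries) { // hypothetical helper
        return String.join(File.pathSeparator, entries);
    }
}
```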
[jira] [Created] (IGNITE-11874) Fix mismatch between idle_verify results with and without -dump option.
Denis Chudov created IGNITE-11874: - Summary: Fix mismatch between idle_verify results with and without -dump option. Key: IGNITE-11874 URL: https://issues.apache.org/jira/browse/IGNITE-11874 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11855) Need to reduce log message in case: Topology projection is empty. Cluster group is empty.
Denis Chudov created IGNITE-11855: - Summary: Need to reduce log message in case: Topology projection is empty. Cluster group is empty. Key: IGNITE-11855 URL: https://issues.apache.org/jira/browse/IGNITE-11855 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov In some cases there are a lot of stack traces in the logs: {code:java} [18:53:00,811][SEVERE][grid-timeout-worker-#39][diagnostic] Could not get thread dump from transaction owner near node: class org.apache.ignite.cluster.ClusterGroupEmptyException: Cluster group is empty. at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:853) at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:851) at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:991) at org.apache.ignite.internal.util.future.IgniteFutureImpl.convertException(IgniteFutureImpl.java:168) at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2112) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2107) at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:215) at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:179) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385) at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355) at org.apache.ignite.internal.util.future.IgniteFutureImpl.listen(IgniteFutureImpl.java:71) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningTransaction(GridCachePartitionExchangeManager.java:2107) at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningOperations0(GridCachePartitionExchangeManager.java:2009) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningOperations(GridCachePartitionExchangeManager.java:2163) at org.apache.ignite.internal.IgniteKernal$4.run(IgniteKernal.java:1344) at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask.onTimeout(GridTimeoutProcessor.java:365) at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:234) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Caused by: class org.apache.ignite.internal.cluster.ClusterGroupEmptyCheckedException: Cluster group is empty. at org.apache.ignite.internal.util.IgniteUtils.emptyTopologyException(IgniteUtils.java:4811) at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:670) at org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:479) at org.apache.ignite.internal.IgniteComputeImpl.callAsync0(IgniteComputeImpl.java:809) at org.apache.ignite.internal.IgniteComputeImpl.callAsync(IgniteComputeImpl.java:794) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningTransaction(GridCachePartitionExchangeManager.java:2106) ... 7 more {code} {code:java} [18:53:00,809][SEVERE][grid-timeout-worker-#39][diagnostic] Could not get thread dump from transaction owner near node: class org.apache.ignite.cluster.ClusterGroupEmptyException: Topology projection is empty. 
at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:853) at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:851) at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:991) at org.apache.ignite.internal.util.future.IgniteFutureImpl.convertException(IgniteFutureImpl.java:168) at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2112) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2107) at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:215) at org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:179) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385) at
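One possible shape of the fix (a sketch only; class and method names here are illustrative stand-ins, not Ignite's actual API): log the expected "cluster group is empty" condition as a single line, and reserve the full stack trace for genuinely unexpected failures.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Stand-in for org.apache.ignite.cluster.ClusterGroupEmptyException so the
// sketch is self-contained.
class ClusterGroupEmptyException extends RuntimeException {
    ClusterGroupEmptyException(String msg) { super(msg); }
}

class LrtDiagnostics {
    // Sketch: when the near node has already left, the exception is expected,
    // so a one-line warning is enough; unexpected failures keep the trace.
    static String formatThreadDumpFailure(Throwable e) {
        if (e instanceof ClusterGroupEmptyException)
            return "Could not get thread dump from transaction owner near node "
                + "(node has left the cluster): " + e.getMessage();

        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```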
[jira] [Created] (IGNITE-11789) Document changes of LRT diagnostic messages made in IGNITE-11392
Denis Chudov created IGNITE-11789: - Summary: Document changes of LRT diagnostic messages made in IGNITE-11392 Key: IGNITE-11789 URL: https://issues.apache.org/jira/browse/IGNITE-11789 Project: Ignite Issue Type: Task Components: documentation Reporter: Denis Chudov Fix For: 2.8 In addition to log messages about detected LRTs, the local node creates a request to the near node to get the dump of the thread that created the transaction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11788) Fix issues related to IGNITE-10896
Denis Chudov created IGNITE-11788: - Summary: Fix issues related to IGNITE-10896 Key: IGNITE-11788 URL: https://issues.apache.org/jira/browse/IGNITE-11788 Project: Ignite Issue Type: Bug Reporter: Denis Chudov Assignee: Denis Chudov Fix For: 2.8 Both of the following cases relate to executing commands in control.sh. 1. The new boolean field *succeeded* in *org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2* is not serialized. This may affect the case when there are no caches matching the idle_verify command filters: the user may get an odd output message. 2. Cache name parsing now assumes that cache names can be given as regexps - this may affect the case when a cache name contains regexp special characters: the user can get an error message about an incorrect regular expression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
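For the second issue, one common remedy (a sketch of the general technique, not necessarily the approach taken in the fix) is to escape user-supplied cache names before treating them as regular expressions:

```java
import java.util.regex.Pattern;

class CacheNameMatcher {
    // Sketch: Pattern.quote() escapes regexp special characters such as
    // '(' or '$', so a cache name containing them matches literally
    // instead of failing with a "dangling metacharacter"-style error.
    static boolean matchesLiteral(String cacheName, String userInput) {
        return Pattern.matches(Pattern.quote(userInput), cacheName);
    }
}
```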