[jira] [Created] (IGNITE-14619) Refactoring of GridDeploymentCommunication

2021-04-21 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14619:
-

 Summary: Refactoring of GridDeploymentCommunication
 Key: IGNITE-14619
 URL: https://issues.apache.org/jira/browse/IGNITE-14619
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov


org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication#sendResourceRequest
 uses "while" loop with mutex instead of future, and creates listeners for 
discovery events and communication messages for each request. This complicates 
the code and may affect class loading performance.
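
A minimal sketch of the future-based approach, assuming a plain CompletableFuture wrapper (none of these names are part of the existing Ignite code):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Sketch only: one pending resource request backed by a future instead of a mutex wait loop. */
class ResourceRequestFuture {
    private final CompletableFuture<byte[]> fut = new CompletableFuture<>();

    /** Invoked by the communication listener when the response message arrives. */
    void onResponse(byte[] resBytes) {
        fut.complete(resBytes);
    }

    /** Invoked by the discovery listener when the remote node leaves. */
    void onNodeLeft() {
        fut.completeExceptionally(new IllegalStateException("Remote node left before responding."));
    }

    /** Replaces the "while" loop: a bounded wait that is completed by the listeners above. */
    byte[] awaitResponse(long timeoutMs) throws Exception {
        return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
{code}
With shared listeners dispatching to the matching request future, per-request listener registration would no longer be needed.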



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14530) Add offline utility for analyzing WAL

2021-04-13 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14530:
-

 Summary: Add offline utility for analyzing WAL
 Key: IGNITE-14530
 URL: https://issues.apache.org/jira/browse/IGNITE-14530
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov


It would be useful, for investigating problems of data consistency, PDS and 
others, to have the ability to read WAL files offline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14529) Add offline utility for analyzing indexes

2021-04-13 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14529:
-

 Summary: Add offline utility for analyzing indexes
 Key: IGNITE-14529
 URL: https://issues.apache.org/jira/browse/IGNITE-14529
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov


It would be useful to have the ability to validate indexes offline.

The utility must check that every configured index (available in the MetaTree [1]) 
is reachable from its root page, that the tree structure is valid, and that no 
orphan (unreachable) index pages are present in the index.bin persistent page 
store.

[1] 
org.apache.ignite.internal.processors.cache.persistence.IndexStorageImpl#getIndexNames
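
A hypothetical outline of such a check (the helper methods are assumptions for illustration, not the Ignite API):
{code:java}
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch only: every index root listed in the meta tree must lead to a valid tree,
 * and every page allocated in index.bin must be reachable from some root.
 */
abstract class IndexBinValidator {
    /** Index names stored in the meta tree (cf. IndexStorageImpl#getIndexNames). */
    abstract Set<String> indexNames();

    /** Root page id of the given index, as recorded in the meta tree. */
    abstract long rootPageId(String idxName);

    /** All pages allocated in the index.bin page store. */
    abstract Set<Long> allocatedPages();

    /** Walks the B+ tree from the root, validating its structure, and returns the visited pages. */
    abstract Set<Long> walkAndValidateTree(long rootPageId);

    void validate() {
        Set<Long> reachable = new HashSet<>();

        for (String idxName : indexNames())
            reachable.addAll(walkAndValidateTree(rootPageId(idxName)));

        Set<Long> orphans = new HashSet<>(allocatedPages());
        orphans.removeAll(reachable);

        if (!orphans.isEmpty())
            throw new IllegalStateException("Orphan index pages found: " + orphans);
    }
}
{code}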



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14528) AssertionError in GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture:1528

2021-04-13 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14528:
-

 Summary: AssertionError in 
GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture:1528
 Key: IGNITE-14528
 URL: https://issues.apache.org/jira/browse/IGNITE-14528
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


Node failure during rebalance:
{code:java}
2021-01-31 
07:45:29.174[ERROR][exchange-worker-#168%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
err=java.lang.AssertionError: RebalanceFuture [grp=CacheGroupContext 
[grp=CACHEGROUP_OBJECT_TO_EVICTION_REGISTRY], topVer=AffinityTopologyVersion 
[topVer=46, minorTopVer=0], rebalanceId=27299, routines=22, receivedBytes=0, 
receivedKeys=0, partitionsLeft=1659, startTime=-1, endTime=-1, 
lastCancelledTime=1612068328290, result=true]]]
java.lang.AssertionError: RebalanceFuture [grp=CacheGroupContext 
[grp=CACHEGROUP_OBJECT_TO_EVICTION_REGISTRY], topVer=AffinityTopologyVersion 
[topVer=46, minorTopVer=0], rebalanceId=27299, routines=22, receivedBytes=0, 
receivedKeys=0, partitionsLeft=1659, startTime=-1, endTime=-1, 
lastCancelledTime=1612068328290, result=true]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.ownPartitionsAndFinishFuture(GridDhtPartitionDemander.java:1528)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.finishPreloading(GridDhtPartitionDemander.java:2064)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.finishPreloading(GridDhtPreloader.java:577)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:419)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:3133)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3280)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3195)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14474) Improve error message in case rebalance fails

2021-04-05 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14474:
-

 Summary: Improve error message in case rebalance fails
 Key: IGNITE-14474
 URL: https://issues.apache.org/jira/browse/IGNITE-14474
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov


Currently we can get a message like this when rebalance fails with an exception 
(the examples are from Ignite 2.5; in newer versions the log messages were changed, 
but the problem is still relevant):
{code:java}
2019-11-27 13:41:14,504[WARN ][utility-#79%xxx%][GridDhtPartitionDemander] 
Rebalancing from node cancelled [grp=ignite-sys-cache, 
topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], 
supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topic=0]. Supply message 
couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to 
unmarshal object with optimized marshaller
2019-11-27 13:41:14,504[INFO ][utility-#79%xxx%][GridDhtPartitionDemander] 
Cancelled rebalancing [grp=ignite-sys-cache, 
supplier=f014f30a-77f2-4459-aa5b-6c12907a7449, topVer=AffinityTopologyVersion 
[topVer=1932, minorTopVer=1], time=88 ms]
2019-11-27 13:41:14,508[WARN ][utility-#76%xxx%][GridDhtPartitionDemander] 
Rebalancing from node cancelled [grp=ignite-sys-cache, 
topVer=AffinityTopologyVersion [topVer=1932, minorTopVer=1], 
supplier=dfa5ee06-48c9-4458-ae55-48cc6ceda998, topic=0]. Supply message 
couldn't be unmarshalled: class o.a.i.IgniteCheckedException: Failed to 
unmarshal object with optimized marshaller
{code}
In the case above, a marshalling exception leads to a rebalance failure that 
will never be resolved, i.e. the cluster enters an erroneous state.

We should report issues like this at ERROR level. The message should explain that the 
rebalance has failed, that data for the cache was not fully copied to the node, that the 
backup factor is not restored, and that the cluster may not work correctly.
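
A hedged sketch of such an ERROR-level record (the wording, variables and call site are illustrative only):
{code:java}
// Illustrative only: 'grpName', 'supplierId' and 'e' stand for values available at the failure site.
log.error("Rebalancing of cache group failed and will not be retried " +
    "[grp=" + grpName + ", supplier=" + supplierId + "]. " +
    "Data for this cache group was not fully copied to the local node, " +
    "the backup factor is not restored and the cluster may not work correctly.", e);
{code}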



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14425) Hang transactions in FINISH [COMMIT] phase when communication spi is blocked

2021-03-26 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14425:
-

 Summary: Hang transactions in FINISH [COMMIT] phase when 
communication spi is blocked
 Key: IGNITE-14425
 URL: https://issues.apache.org/jira/browse/IGNITE-14425
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


Scenario:
From a client, two concurrent transactions are started on a single key. At the 
same time, the GridNearTxFinishRequest message from the client is blocked, 
emulating a partial network failure; as a result, one of the transactions is 
not completed even though the node is no longer working.

Reproducer:

(insert this test into *TxRollbackOnTimeoutTest*)
{code:java}
/**
 *
 */
@Test
public void testRollbackOnNearNodeLeft() throws Exception {
    Ignite client = startClient();

    Integer pk = primaryKey(grid(0).cache(CACHE_NAME));

    CountDownLatch locked = new CountDownLatch(1);
    CountDownLatch blocked = new CountDownLatch(1);

    IgniteInternalFuture fut = runAsync(new Callable() {
        @Override public Void call() throws Exception {
            try (Transaction tx0 = client.transactions().txStart()) {
                client.cache(CACHE_NAME).put(pk, 0);

                locked.countDown();

                U.awaitQuiet(blocked);

                tx0.commit();
            }
            catch (Exception e) {
                // Ignored.
            }

            return null;
        }
    });

    IgniteInternalFuture fut2 = runAsync(new Runnable() {
        @Override public void run() {
            try (Transaction tx1 = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 1000, 0)) {
                U.awaitQuiet(locked);

                TestRecordingCommunicationSpi.spi(client).blockMessages(new IgniteBiPredicate() {
                    @Override public boolean apply(ClusterNode clusterNode, Message msg) {
                        return msg instanceof GridNearTxFinishRequest;
                    }
                });

                TestRecordingCommunicationSpi.spi(grid(0)).blockMessages(new IgniteBiPredicate() {
                    @Override public boolean apply(ClusterNode clusterNode, Message msg) {
                        return msg instanceof GridNearLockResponse;
                    }
                });

                client.cache(CACHE_NAME).put(pk, 1);

                fail();
            }
            catch (Exception e) {
                assertTrue(X.hasCause(e, TransactionTimeoutException.class));
            }
        }
    });

    TestRecordingCommunicationSpi.spi(client).waitForBlocked();
    TestRecordingCommunicationSpi.spi(grid(0)).waitForBlocked();

    fut2.get();

    client.close();

    TestRecordingCommunicationSpi.spi(grid(0)).stopBlock();

    blocked.countDown();

    fut.get();

    assertTrue(grid(0).context().cache().context().tm().activeTransactions().isEmpty());
}
{code}
As a result, the transaction hangs on the server node in the MARKED_ROLLBACK state 
forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14423) Node failure caused by AssertionError: Transaction does not own lock for update

2021-03-26 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-14423:
-

 Summary: Node failure caused by AssertionError: Transaction does 
not own lock for update
 Key: IGNITE-14423
 URL: https://issues.apache.org/jira/browse/IGNITE-14423
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov


Scenario:
1. Start 3 servers.
2. Start 2 clients.
3. Start two OPTIMISTIC transactions with the same key from different client nodes.
4. Transfer the transactions to the PREPARED state on the primary node.
5. Stop one client node (the one whose transaction changed state to PREPARED last).

 

Error in log:
{code:java}
[2021-03-03 
08:52:59,807][ERROR][sys-#499%transactions.TxRecoveryWithConcurrentRollbackTest1%][GridNearTxLocal]
 Failed completing the transaction: [commit=true, tx=GridDhtTxLocal 
[nearNodeId=29551b46-74ef-4e35-af4a-d97809cc5260, 
nearFutId=08b25a6f771-eec14fa0-02ef-46b9-97d9-0f2b7c851ddb, nearMiniId=1, 
nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion 
[topVer=226230774, order=1614750773047, nodeOrder=4], lb=tx, 
super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView 
[], dhtNodes=KeySetView [], explicitLock=false, super=IgniteTxLocalAdapter 
[completedBase=null, sndTransformedVals=false, depEnabled=false, 
txState=IgniteTxStateImpl [activeCacheIds=[1544803905], recovery=false, 
mvccEnabled=false, mvccCachingCacheIds=[], txMap=ArrayList [IgniteTxEntry 
[txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], 
cacheId=1544803905], val=TxEntryValueHolder [val=CacheObjectImpl [val=null, 
hasValBytes=true], op=CREATE], prevVal=TxEntryValueHolder [val=null, op=NOOP], 
oldVal=TxEntryValueHolder [val=null, op=NOOP], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=false, 
entry=GridDhtCacheEntry [rdrs=ReaderId[] [], part=1, 
super=GridDistributedCacheEntry [super=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], val=null, 
ver=GridCacheVersion [topVer=226230774, order=1614750774035, nodeOrder=2], 
hash=1, extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc [locs=LinkedList 
[GridCacheMvccCandidate [nodeId=b1f1a8e0-e1e8-4084-b9c6-bdd2a271, 
ver=GridCacheVersion [topVer=226230774, order=1614750774033, nodeOrder=2], 
threadId=364, id=5, topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], 
reentry=null, otherNodeId=207c7f09-21a9-4b99-b631-3304a522b002, 
otherVer=GridCacheVersion [topVer=226230774, order=1614750774032, nodeOrder=5], 
mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, 
key=KeyCacheObjectImpl [part=1, val=1, hasValBytes=true], 
masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0,
 prevVer=null, nextVer=null]], rmts=null]], flags=3]]], prepared=1, 
locked=false, nodeId=null, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=null]]], super=IgniteTxAdapter [xidVer=GridCacheVersion 
[topVer=226230774, order=1614750774035, nodeOrder=2], writeVer=null, 
implicit=false, loc=true, threadId=686, startTime=1614750774753, 
nodeId=b1f1a8e0-e1e8-4084-b9c6-bdd2a271, isolation=READ_COMMITTED, 
concurrency=OPTIMISTIC, timeout=5000, sysInvalidate=true, sys=false, plc=2, 
commitVer=GridCacheVersion [topVer=226230774, order=1614750774035, 
nodeOrder=2], finalizing=RECOVERY_FINISH, invalidParts=null, state=UNKNOWN, 
timedOut=false, topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], 
mvccSnapshot=null, skipCompletedVers=false, parentTx=null, duration=5048ms, 
onePhaseCommit=false], size=1
class 
org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
Committing a transaction has produced runtime exception
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:813)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:969)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitAsync(GridDhtTxLocal.java:542)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.finishTxOnRecovery(IgniteTxManager.java:2370)
at 

[jira] [Created] (IGNITE-13418) Deadlock on multiple cache delete

2020-09-09 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-13418:
-

 Summary: Deadlock on multiple cache delete
 Key: IGNITE-13418
 URL: https://issues.apache.org/jira/browse/IGNITE-13418
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


Thread 1:
 - acquires checkpoint read lock in 
GridCacheProcessor#processCacheStopRequestOnExchangeDone
 - acquires GridQueryProcessor#stateMux in GridQueryProcessor.onCacheStop
 - enters H2TreeIndex.destroy
 - releases checkpoint read lock in H2Tree.temporaryReleaseLock, can't take it 
again because of db-checkpoint-thread

Thread 2:
 - acquires checkpoint read lock in 
GridCacheProcessor#processCacheStopRequestOnExchangeDone
 - trying to acquire GridQueryProcessor#stateMux in 
GridQueryProcessor.onCacheStop which is held by thread 1

db-checkpoint-thread:
 - trying to acquire checkpoint write lock, can't do it because of Thread 2

Solution: H2Tree.temporaryReleaseLock should release the lock only when tree 
deletion is asynchronous (H2TreeIndex.destroy is called with async=true), 
i.e. when it happens inside a DurableBackgroundTask. Such tasks are executed in 
separate threads which don't hold any other locks.
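
A minimal sketch of the proposed guard, assuming a flag that marks asynchronous destruction (all names below are illustrative, not the actual H2Tree code):
{code:java}
/** Sketch only: yields the checkpoint read lock solely for asynchronous tree destruction. */
class TreeDestroyLockPolicy {
    /** Stand-in for the shared database manager's checkpoint read lock. */
    interface CheckpointLock {
        void readLock();
        void readUnlock();
    }

    private final CheckpointLock db;

    /** True when H2TreeIndex.destroy(async=true) runs inside a DurableBackgroundTask thread. */
    private final boolean asyncDestroy;

    TreeDestroyLockPolicy(CheckpointLock db, boolean asyncDestroy) {
        this.db = db;
        this.asyncDestroy = asyncDestroy;
    }

    /** Mirrors H2Tree.temporaryReleaseLock: a synchronous destroy may hold other locks, so keep the lock. */
    void temporaryReleaseLock() {
        if (!asyncDestroy)
            return;

        db.readUnlock();
        db.readLock();
    }
}
{code}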

Thread dump:
{code:java}
Thread [name="sys-#1220%DPL_GRID%DplGridNodeName%", id=3200, state=BLOCKED, 
blockCnt=1, waitCnt=0]
 Lock [object=java.lang.Object@6a9a92ba, 
ownerName=sys-#1215%DPL_GRID%DplGridNodeName%, ownerId=3195]
 at 
o.a.i.i.processors.query.GridQueryProcessor.onCacheStop0(GridQueryProcessor.java:1695)
 at 
o.a.i.i.processors.query.GridQueryProcessor.onCacheStop(GridQueryProcessor.java:902)
 at 
o.a.i.i.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1169)
 at 
o.a.i.i.processors.cache.GridCacheProcessor.prepareCacheStop(GridCacheProcessor.java:2644)
 at 
o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$629e8679$1(GridCacheProcessor.java:2803)
 at 
o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$620/1418386924.apply(Unknown
 Source)
 at o.a.i.i.util.IgniteUtils.lambda$null$1(IgniteUtils.java:10879)
 at o.a.i.i.util.IgniteUtils$$Lambda$436/321848940.call(Unknown Source)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Locked synchronizers:
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2582f93c
 java.util.concurrent.ThreadPoolExecutor$Worker@4da1cafe
Thread [name="sys-#1215%DPL_GRID%DplGridNodeName%", id=3195, state=BLOCKED, 
blockCnt=4, waitCnt=437520]
 Lock [object=o.a.i.i.processors.failure.FailureProcessor@78edb1e9, 
ownerName=async-durable-background-task-executor-1-#1222%DPL_GRID%DplGridNodeName%,
 ownerId=3202]
 at 
o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:162)
 at 
o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:151)
 at 
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1787)
 at 
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1722)
 at 
o.a.i.i.processors.query.h2.database.H2Tree.temporaryReleaseLock(H2Tree.java:690)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.temporaryReleaseLock(BPlusTree.java:2367)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2548)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroyDownPages(BPlusTree.java:2522)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroy(BPlusTree.java:2441)
 at 
o.a.i.i.processors.cache.persistence.tree.BPlusTree.destroy(BPlusTree.java:2392)
 at 
o.a.i.i.processors.query.h2.database.H2TreeIndex.destroy0(H2TreeIndex.java:671)
 at 
o.a.i.i.processors.query.h2.database.H2TreeIndex.destroy(H2TreeIndex.java:639)
 at o.a.i.i.processors.query.h2.opt.GridH2Table.destroy(GridH2Table.java:567)
 at 
o.a.i.i.processors.query.h2.H2TableDescriptor.onDrop(H2TableDescriptor.java:347)
 at o.a.i.i.processors.query.h2.H2Schema.drop(H2Schema.java:127)
 at 
o.a.i.i.processors.query.h2.IgniteH2Indexing.unregisterCache(IgniteH2Indexing.java:2595)
 at 
o.a.i.i.processors.query.GridQueryProcessor.onCacheStop0(GridQueryProcessor.java:1727)
 - locked java.lang.Object@6a9a92ba
 at 
o.a.i.i.processors.query.GridQueryProcessor.onCacheStop(GridQueryProcessor.java:902)
 at 
o.a.i.i.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1169)
 at 

[jira] [Created] (IGNITE-12760) Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left

2020-03-10 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12760:
-

 Summary: Prevent AssertionError on message unmarshalling, when 
classLoaderId contains id of node that already left
 Key: IGNITE-12760
 URL: https://issues.apache.org/jira/browse/IGNITE-12760
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


The following assertion error triggers the failure handler and crashes the node. It can 
possibly crash the whole cluster.


{code:java}
2020-02-18 
14:34:09.775\[ERROR]\[query-#146129%DPL_GRID%DplGridNodeName%]\[o.a.i.i.p.cache.GridCacheIoManager]
 Failed to process message \[senderId=727757ed-4ad4-4779-bda9-081525725cce, 
msg=GridCacheQueryRequest \[id=178, 
cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, 
type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, 
rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, 
incMeta=false, all=false, keepBinary=true, 
subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, 
topVer=AffinityTopologyVersion \[topVer=97, minorTopVer=0], sendTimestamp=-1, 
receiveTimestamp=-1, super=GridCacheIdMessage \[cacheId=-1129073400, 
super=GridCacheMessage \[msgId=179, depInfo=GridDeploymentInfoBean 
\[clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, 
userVer=0, locDepOwner=false, participants=null], 
lastAffChangedTopVer=AffinityTopologyVersion \[topVer=8, minorTopVer=6], 
err=null, skipPrepare=false
java.lang.AssertionError: null
at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.(GridCacheDeploymentManager.java:918)
at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.(GridCacheDeploymentManager.java:889)
at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}

There is no reliable reproducer for now, but it seems that we should prevent such a 
situation in general, as follows:
1) Check the correctness of the message before it is sent, inside 
GridCacheDeploymentManager#prepare. If we have the corresponding class loader 
on the local node, we can try to fix the message and replace the wrong class loader with 
the local one.
2) Log suspicious deployments which we receive from 
GridDeploymentManager#deploy - maybe we have obsolete deployments in caches.
3) Possibly we can remove this assertion: we should have this class on the sender 
node and use it as the class loader id, and if we don't, we will receive an exception 
on finishUnmarshall (Failed to peer load class) and can process this 
situation with GridCacheIoManager#processFailedMessage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12612) Failing test IoStatisticsBasicIndexSelfTest.testMetricRegistryRemovedOnIndexDrop after IGNITE-12496

2020-01-31 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12612:
-

 Summary: Failing test 
IoStatisticsBasicIndexSelfTest.testMetricRegistryRemovedOnIndexDrop after 
IGNITE-12496
 Key: IGNITE-12612
 URL: https://issues.apache.org/jira/browse/IGNITE-12612
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


In IGNITE-12496 a new method of destroying an index was added 
(H2TreeIndex#asyncDestroy), and the changes from IGNITE-11987 are missing from it; 
they are only present in the old H2TreeIndex#destroy. We should apply the 
changes to the new method as well.

Also, it would be better to refactor both methods to prevent such errors in 
the future.

Failing test history 
[https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1153260173537232526=%3Cdefault%3E=testDetails]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12516) Dump active transaction from near node does not work if transaction not follow first

2019-12-30 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12516:
-

 Summary: Dump active transaction from near node does not work if 
transaction not follow first
 Key: IGNITE-12516
 URL: https://issues.apache.org/jira/browse/IGNITE-12516
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


For this code:


{code:java}
for (IgniteInternalTx tx : tm.activeTransactions()) {
    if (curTime - tx.startTime() > timeout) {
        found = true;

        if (warnings.canAddMessage()) {
            warnings.add(longRunningTransactionWarning(tx, curTime));

            if (ltrDumpLimiter.allowAction(tx))
                dumpLongRunningTransaction(tx);
        }
        else
            warnings.incTotal();
    }
}
{code}

As you can see, if the transaction is not ACTIVE, the dumping closure will be skipped.
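
A sketch of a possible reordering (not the actual patch): evaluate the dump limiter for every long-running transaction, independently of whether a warning message can still be added.
{code:java}
for (IgniteInternalTx tx : tm.activeTransactions()) {
    if (curTime - tx.startTime() > timeout) {
        found = true;

        if (warnings.canAddMessage())
            warnings.add(longRunningTransactionWarning(tx, curTime));
        else
            warnings.incTotal();

        // The dump decision is no longer nested under canAddMessage().
        if (ltrDumpLimiter.allowAction(tx))
            dumpLongRunningTransaction(tx);
    }
}
{code}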



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12496) Index deletion blocks checkpoint for all of its duration, which can cause "Critical system error: system critical thread blocked"

2019-12-26 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12496:
-

 Summary: Index deletion blocks checkpoint for all of its duration, 
which can cause "Critical system error: system critical thread blocked"
 Key: IGNITE-12496
 URL: https://issues.apache.org/jira/browse/IGNITE-12496
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


GridH2Table#removeIndex(Session, Index) acquires the checkpoint read lock and 
releases it only after full completion of the deletion process. This happens because 
H2TreeIndex#destroy requires the checkpoint lock to be held while it runs. Meanwhile, 
the checkpoint thread stops in Checkpointer#markCheckpointBegin, trying to acquire the 
write lock, and stays blocked for the whole duration of the index deletion.

The possible fix is to periodically release the checkpoint read lock while 
index deletion is in progress. To avoid persistence corruption in case of a node 
crash in the middle of the process, we should put the index root into some 
persistent structure like the index meta tree and remember it as "pending delete". 
Then we must delete the tree pages from leaves to root, which avoids links 
to already deleted pages. When deletion is complete, the tree root can be removed from 
"pending delete".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12340) Extend test coverage of ability to track system/user time held in transaction

2019-10-30 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12340:
-

 Summary: Extend test coverage of ability to track system/user time 
held in transaction
 Key: IGNITE-12340
 URL: https://issues.apache.org/jira/browse/IGNITE-12340
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


Test coverage of this feature currently does not cover all use cases and should 
be improved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12167) I think we should add logic for clearing custom pageHndWrapper in afterTestsStopped() in GridCommonAbstractTest or GridAbstractTest https://ggtc.gridgain.com/viewLog.ht

2019-09-12 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12167:
-

 Summary: I think we should add logic for clearing custom 
pageHndWrapper in afterTestsStopped() in GridCommonAbstractTest or 
GridAbstractTest https://ggtc.gridgain.com/viewLog.html?buildId=2364333
 Key: IGNITE-12167
 URL: https://issues.apache.org/jira/browse/IGNITE-12167
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


We should move the logic for clearing a custom pageHndWrapper to 
{{afterTestsStopped()}} in {{GridCommonAbstractTest}} or {{GridAbstractTest}} to 
ensure that we will not have problems in any tests which set a custom wrapper.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IGNITE-12165) Negative time in Transaction time dump

2019-09-12 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12165:
-

 Summary: Negative time in Transaction time dump
 Key: IGNITE-12165
 URL: https://issues.apache.org/jira/browse/IGNITE-12165
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


After implementing ticket https://ggsystems.atlassian.net/browse/GG-21272 we 
have transaction time dumps in the logs.
There are some issues with the information in these dumps:
{code:java}
[11:53:36,154][INFO][snapshot-scheduler-restats-#69][GridNearTxLocal] 
Transaction time dump [startTime=11:53:36.081, totalTime=65, systemTime=3, 
userTime=62, cacheOperationsTime=-41943785508, rollbackTime=41943785512, 
tx=GridNearTxLocal [mappings=IgniteTxMappingsImpl [], nearLocallyMapped=false, 
colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=true, 
trackTimeout=false, systemTime=3953576, systemStartTime=0, prepareStartTime=0, 
prepareTime=0, commitOrRollbackStartTime=0, 
commitOrRollbackTime=41943785512318555, 
txDumpsThrottling=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxDumpsThrottling@760d8025,
 lb=null, thread=snapshot-scheduler-restats-#69, mappings=IgniteTxMappingsImpl 
[], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, 
nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, 
super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, 
depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-2100569601], 
recovery=false, txMap=HashSet [IgniteTxEntry [key=KeyCacheObjectImpl [part=2, 
val=SnapshotScheduleKey [id=_SCHEDULES_], hasValBytes=true], 
cacheId=-2100569601, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=2, 
val=SnapshotScheduleKey [id=_SCHEDULES_], hasValBytes=true], 
cacheId=-2100569601], val=[op=READ, val=null], prevVal=[op=NOOP, val=null], 
oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=null, filtersPassed=false, filtersSet=true, entry=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=2, val=SnapshotScheduleKey [id=_SCHEDULES_], 
hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, 
nodeOrder=0], hash=886348002, extras=null, flags=0]GridDistributedCacheEntry 
[super=]GridDhtDetachedCacheEntry [super=], prepared=0, locked=true, 
nodeId=1adbae78-40fe-480d-803d-4d498919ae63, locMapped=false, expiryPlc=null, 
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, 
xidVer=GridCacheVersion [topVer=179672005, order=1568192005565, 
nodeOrder=1, super=IgniteTxAdapter [xidVer=GridCacheVersion 
[topVer=179672005, order=1568192005565, nodeOrder=1], writeVer=null, 
implicit=false, loc=true, threadId=125, startTime=1568192016081, 
nodeId=9e00ae45-4084-4f29-949f-c19c933f4299, startVer=GridCacheVersion 
[topVer=179672005, order=1568192005565, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, 
sysInvalidate=false, sys=true, plc=5, commitVer=null, finalizing=NONE, 
invalidParts=null, state=ROLLED_BACK, timedOut=false, 
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], duration=60ms, 
onePhaseCommit=false], size=1
{code}
For example:
 # Negative time: cacheOperationsTime=-41943785508,
 # Huge times: rollbackTime=41943785512



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IGNITE-12158) Failing test RebuildIndexLogMessageTest on ignite-2.5-master

2019-09-10 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12158:
-

 Summary: Failing test RebuildIndexLogMessageTest on 
ignite-2.5-master
 Key: IGNITE-12158
 URL: https://issues.apache.org/jira/browse/IGNITE-12158
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


[https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8==testDetails=-1672212793500055752=START_DATE_DESC_IgniteTests24Java8=pull%2F4611%2Fhead=500]

 

The test is constantly failing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IGNITE-12112) H2TreeIndex should throw CorruptTreeException with cacheId, cacheName and indexName

2019-08-27 Thread Denis Chudov (Jira)
Denis Chudov created IGNITE-12112:
-

 Summary: H2TreeIndex should throw CorruptTreeException with 
cacheId, cacheName and indexName
 Key: IGNITE-12112
 URL: https://issues.apache.org/jira/browse/IGNITE-12112
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


At the moment we can't identify the problem cache if we have many caches in one 
cache group and a CorruptedTreeException was thrown.
For example:
{code:java}
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on row: Row@346ba4aa[ key: 29003, val: 
com.sbt.acquiring.processing.entities.dictionaries.TurnoverRange_DPL_PROXY 
[idHash=492721050, hash=-616731792, valueStart=2001, isImmutable=false, 
lastChangeDate=1560917668719, name=Ñâûøå 20, valueEnd=null, 
changeState=null, id=29003, pkey=7c0d0d59-f33e-41d7-a5c9-4d8c7e74b976, 
ownerId=acquiring-processing-replication, base=true], ver: GridCacheVersion 
[topVer=172396060, order=1560917613306, nodeOrder=2] ][ Ñâûøå 20, null, 
2001, TRUE, null, 29003, 7c0d0d59-f33e-41d7-a5c9-4d8c7e74b976 ]" [5-195]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:295)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:251)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:548)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:480)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:709)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1863)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:403)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:1402)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1263)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1625)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:358)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3629)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2793)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:902)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:772)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:344)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:418)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:408)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1061)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:586)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1624)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 

[jira] [Created] (IGNITE-12070) Document the new ability to track system/user time of transactions

2019-08-14 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-12070:
-

 Summary: Document the new ability to track system/user time of 
transactions
 Key: IGNITE-12070
 URL: https://issues.apache.org/jira/browse/IGNITE-12070
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Denis Chudov


Now there is an ability to track the system/user time of transactions. System time is 
the time spent on system activities, i.e. acquiring locks, preparing, committing, etc. 
User time is the time spent on user activities, i.e. when the client node runs some 
code while holding the transaction.

We have the ability to log info about transactions that exceed some threshold 
execution time, or about some percentage of all transactions. A log record for a 
long-running transaction looks like the following:
{code:java}
[2019-08-09 13:39:49,130][WARN ][sys-stripe-1-#101%client%][root] Long 
transaction time dump [startTime=13:39:47.970, totalTime=1160, systemTime=157, 
userTime=1003, cacheOperationsTime=141, prepareTime=15, commitTime=0, 
tx=GridNearTxLocal [...]]
{code}
In the case of sampling all transactions:


{code:java}
[2019-08-09 13:39:54,079][INFO ][sys-stripe-2-#102%client%][root] Transaction 
time dump [startTime=13:39:54.063, totalTime=15, systemTime=6, userTime=9, 
cacheOperationsTime=2, prepareTime=3, commitTime=0, tx=GridNearTxLocal [...]]
{code}
Also, some transactions can be skipped so as not to overflow the log; information 
about this log throttling looks like this:
{code:java}
[2019-08-09 13:39:55,109][INFO ][sys-stripe-0-#100%client%][root] Transaction 
time dumps skipped because of log throttling: 2
{code}
There are JMX parameters and JVM options to control this behavior:
1)
JVM option: IGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD
JMX parameter: TransactionsMXBean.longTransactionTimeDumpThreshold
The threshold timeout in milliseconds for long transactions. If a transaction exceeds 
it, it will be dumped to the log with information about how much time it spent 
in system time and user time. The default value is 0. No info about the system/user 
time of long transactions is dumped to the log if this parameter is not set.
2)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesCoefficient
The coefficient for samples of completed transactions that will be dumped to the 
log. Must be a float value between 0.0 and 1.0 inclusive. The default value is 0.0.
3)
JVM option: IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT
JMX parameter: TransactionsMXBean.transactionTimeDumpSamplesPerSecondLimit
The limit of samples of completed transactions that will be dumped to the log per 
second, if IGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT is above 0.0. Must 
be an integer value greater than 0. The default value is 5.
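
For illustration, the options can be combined at node start like this (the values are arbitrary examples):
{noformat}
-DIGNITE_LONG_TRANSACTION_TIME_DUMP_THRESHOLD=1000 \
-DIGNITE_TRANSACTION_TIME_DUMP_SAMPLES_COEFFICIENT=0.1 \
-DIGNITE_TRANSACTION_TIME_DUMP_SAMPLES_PER_SECOND_LIMIT=5
{noformat}
This dumps every transaction longer than one second and additionally samples roughly 10% of all completed transactions, capped at 5 dumps per second.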

Information about the current system and user time of the transaction was added to 
the existing long-running transaction warning:
{code:java}
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] First 
10 long running transactions [total=1]
[2019-08-09 14:10:31,835][WARN ][grid-timeout-worker-#122%client%][root] >>> 
Transaction [startTime=14:10:31.170, curTime=14:10:31.750, systemTime=32, 
userTime=548, tx=GridNearTxLocal [...]]
{code}

The following metrics were also added to monitor system and user time on a single node:
diagnostic.transactions.totalNodeSystemTime - Total transactions system time on 
node.
diagnostic.transactions.totalNodeUserTime - Total transactions user time on 
node.
diagnostic.transactions.nodeSystemTimeHistogram - Transactions system times on 
node represented as histogram.
diagnostic.transactions.nodeUserTimeHistogram - Transactions user times on node 
represented as histogram.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-12063) Add ability to track system/user time held in transaction

2019-08-13 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-12063:
-

 Summary: Add ability to track system/user time held in transaction
 Key: IGNITE-12063
 URL: https://issues.apache.org/jira/browse/IGNITE-12063
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov
 Fix For: 2.8


We should dump the user/system times of a transaction to the log on commit/rollback if 
the duration of the transaction is more than a threshold. I want to see in the log on the tx 
coordinator node:
# Transaction duration
# System time:
#* How long were we acquiring locks on keys?
#* How long were we preparing the transaction?
#* How long were we committing the transaction?
# User time (transaction time - total system time)
# Transaction status (commit/rollback)

The threshold could be set by a system property and overridden via JMX. We 
shouldn't dump times if the property is not set.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-12047) senderNodeId is absent in StatusCheckMessage

2019-08-06 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-12047:
-

 Summary: senderNodeId is absent in StatusCheckMessage
 Key: IGNITE-12047
 URL: https://issues.apache.org/jira/browse/IGNITE-12047
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov
 Fix For: 2.8


TcpDiscoveryCoordinatorFailureTest.testClusterFailedNewCoordinatorInitialized
{code:java}
[2019-07-29 
02:42:11,609][ERROR][tcp-disco-sock-reader-[]-#21611%tcp.TcpDiscoveryCoordinatorFailureTest4%][TcpDiscoverySpi]
 Failed to initialize connection (this can happen due to short time network 
problems and can be ignored if does not affect node discovery) 
[sock=Socket[addr=/127.0.0.1,port=44033,localport=47504]] 
java.net.SocketTimeoutException: Read timed out at 
java.net.SocketInputStream.socketRead0(Native Method) at 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at 
java.net.SocketInputStream.read(SocketInputStream.java:171) at 
java.net.SocketInputStream.read(SocketInputStream.java:141) at 
java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at 
java.io.BufferedInputStream.read(BufferedInputStream.java:345) at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:6464)
 at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:60) 
Exception in thread 
"disco-pool-#62924%tcp.TcpDiscoveryCoordinatorFailureTest4%" Exception in 
thread "disco-pool-#62925%tcp.TcpDiscoveryCoordinatorFailureTest4%" 
java.lang.AssertionError at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.pingNode(ServerImpl.java:723) at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.access$4000(ServerImpl.java:195) 
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker$8.run(ServerImpl.java:5598)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748) java.lang.AssertionError at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.pingNode(ServerImpl.java:723) at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.access$4000(ServerImpl.java:195) 
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker$8.run(ServerImpl.java:5598)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-11979) Add ability to set default parallelism of rebuild indexes in configuration

2019-07-12 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11979:
-

 Summary: Add ability to set default parallelism of rebuild indexes 
in configuration
 Key: IGNITE-11979
 URL: https://issues.apache.org/jira/browse/IGNITE-11979
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


We can't change SchemaIndexCacheVisitorImpl#DFLT_PARALLELISM at the moment:
{code:java}
/** Default degree of parallelism. */
private static final int DFLT_PARALLELISM =
    Math.min(4, Math.max(1, Runtime.getRuntime().availableProcessors() / 4));
{code}
On huge servers with a lot of cores (such as 56) we will rebuild indexes in only 4 
threads. I think we should have the ability to set DFLT_PARALLELISM in the Ignite 
configuration.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-11958) JDBC connection validation should use its own task instead of cache validation task

2019-07-03 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11958:
-

 Summary: JDBC connection validation should use its own task 
instead of cache validation task
 Key: IGNITE-11958
 URL: https://issues.apache.org/jira/browse/IGNITE-11958
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov
 Fix For: 2.8


A JDBC connection is validated using GridCacheQueryJdbcValidationTask. We should 
create a dedicated validation task for this activity.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11955) Fix control.sh issues related to IGNITE-11876 and IGNITE-11913

2019-07-02 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11955:
-

 Summary: Fix control.sh issues related to IGNITE-11876 and 
IGNITE-11913
 Key: IGNITE-11955
 URL: https://issues.apache.org/jira/browse/IGNITE-11955
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


Umbrella ticket for control.sh issues related to IGNITE-11876 and IGNITE-11913



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11913) Incorrect formatting of idle_verify output

2019-06-13 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11913:
-

 Summary: Incorrect formatting of idle_verify output
 Key: IGNITE-11913
 URL: https://issues.apache.org/jira/browse/IGNITE-11913
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


Command:
{noformat}
bin/control.sh --cache idle_verify cachepoc1.* --cache-filter PERSISTENT --host 
172.25.1.29
{noformat}
Output:
{noformat}
Control utility [ver. 2.5.8#20190511-sha1:8b27f3a6]

2019 Copyright(C) Apache Software Foundation

User: isuntsov

Time: 2019-05-14T15:22:20.565



idle_verify failed.There are no caches matching given filter options.Idle 
verify failed on nodes:

Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29] consistent ID: 
poc-tester-server-172.25.1.29-id-0

See log for additional information. 
/home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt
{noformat}
In general, the output is CORRECT but I think it will be more readable with the 
following formatting:
{noformat}
idle_verify failed.

There are no caches matching given filter options: [ cachepoc1.*; PERSISTENT].

Idle verify failed on nodes:

Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29]

Consistent ID: poc-tester-server-172.25.1.29-id-0

See log for additional information: 
/home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt
{noformat}

Also, I guess that *null* in "Exception message" should be replaced with some 
meaningful message:
{noformat}
cat /home/isuntsov/858_test/idle_verify-2019-05-14T15-22-20_895.txt
idle_verify failed.There are no caches matching given filter options.Idle 
verify failed on nodes:
Node ID: e5a981f6-9d0b-4cd7-aad3-fe3cc7fa6101 [172.25.1.29] consistent ID: 
poc-tester-server-172.25.1.29-id-0
Exception message:
null
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11900) Fix ClassPathContentLoggingTest test

2019-06-07 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11900:
-

 Summary: Fix ClassPathContentLoggingTest test
 Key: IGNITE-11900
 URL: https://issues.apache.org/jira/browse/IGNITE-11900
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


This test failed on TC because of an incorrect path separator in 
beforeTestsStarted. It should be File.pathSeparator instead of ';'.
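
A minimal sketch of the intended construction (the variable names are illustrative, not the test's actual code):
{code:java}
import java.io.File;

// 'jarPaths' stands for the class path entries assembled in beforeTestsStarted.
String[] jarPaths = {"/tmp/a.jar", "/tmp/b.jar"};

// Platform-specific separator (':' on Unix, ';' on Windows) instead of a hard-coded ';'.
String classPath = String.join(File.pathSeparator, jarPaths);
{code}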



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11874) Fix mismatch between idle_verify results with and without -dump option.

2019-05-27 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11874:
-

 Summary: Fix mismatch between idle_verify results with and without 
-dump option.
 Key: IGNITE-11874
 URL: https://issues.apache.org/jira/browse/IGNITE-11874
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11855) Need to reduce log message in case: Topology projection is empty. Cluster group is empty.

2019-05-16 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11855:
-

 Summary: Need to reduce log message in case: Topology projection 
is empty. Cluster group is empty.
 Key: IGNITE-11855
 URL: https://issues.apache.org/jira/browse/IGNITE-11855
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov


In some cases there are a lot of stack traces in the logs:


{code:java}
[18:53:00,811][SEVERE][grid-timeout-worker-#39][diagnostic] Could not get 
thread dump from transaction owner near node:
class org.apache.ignite.cluster.ClusterGroupEmptyException: Cluster group is 
empty.
 at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:853)
 at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:851)
 at 
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:991)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl.convertException(IgniteFutureImpl.java:168)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2112)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2107)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:215)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:179)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl.listen(IgniteFutureImpl.java:71)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningTransaction(GridCachePartitionExchangeManager.java:2107)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningOperations0(GridCachePartitionExchangeManager.java:2009)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningOperations(GridCachePartitionExchangeManager.java:2163)
 at org.apache.ignite.internal.IgniteKernal$4.run(IgniteKernal.java:1344)
 at 
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask.onTimeout(GridTimeoutProcessor.java:365)
 at 
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:234)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
 at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.cluster.ClusterGroupEmptyCheckedException: Cluster 
group is empty.
 at 
org.apache.ignite.internal.util.IgniteUtils.emptyTopologyException(IgniteUtils.java:4811)
 at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:670)
 at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor.callAsync(GridClosureProcessor.java:479)
 at 
org.apache.ignite.internal.IgniteComputeImpl.callAsync0(IgniteComputeImpl.java:809)
 at 
org.apache.ignite.internal.IgniteComputeImpl.callAsync(IgniteComputeImpl.java:794)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpLongRunningTransaction(GridCachePartitionExchangeManager.java:2106)
 ... 7 more
{code}


{code:java}
[18:53:00,809][SEVERE][grid-timeout-worker-#39][diagnostic] Could not get 
thread dump from transaction owner near node:
class org.apache.ignite.cluster.ClusterGroupEmptyException: Topology projection 
is empty.
 at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:853)
 at org.apache.ignite.internal.util.IgniteUtils$6.apply(IgniteUtils.java:851)
 at 
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:991)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl.convertException(IgniteFutureImpl.java:168)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2112)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$8.apply(GridCachePartitionExchangeManager.java:2107)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:215)
 at 
org.apache.ignite.internal.util.future.IgniteFutureImpl$InternalFutureListener.apply(IgniteFutureImpl.java:179)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
 at 

[jira] [Created] (IGNITE-11789) Document changes of LRT diagnostic messages made in IGNITE-11392

2019-04-22 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11789:
-

 Summary: Document changes of LRT diagnostic messages made in 
IGNITE-11392
 Key: IGNITE-11789
 URL: https://issues.apache.org/jira/browse/IGNITE-11789
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Denis Chudov
 Fix For: 2.8


In addition to log messages about detected LRTs, the local node creates a request 
to the near node to get the dump of the thread that created the transaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11788) Fix issues related to IGNITE-10896

2019-04-19 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11788:
-

 Summary: Fix issues related to IGNITE-10896
 Key: IGNITE-11788
 URL: https://issues.apache.org/jira/browse/IGNITE-11788
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov
Assignee: Denis Chudov
 Fix For: 2.8


Both of the following cases are related to executing commands in control.sh.

1. The new boolean field *succeeded* in 
*org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2* is not 
serialized. It may affect the case when there are no caches matching the idle_verify 
command filters: the user can possibly get an odd output message.

2. Cache name parsing now assumes that cache names can be given as regexps. It 
may affect the case when a cache name contains regexp special characters: the user 
can get an error message about an incorrect regular expression.
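
For the second case, a hedged illustration of matching a literal cache name that contains regexp special characters (the snippet is not from the control.sh code):
{code:java}
import java.util.regex.Pattern;

// Quoting turns the user input into a literal pattern, so '[', ']' and '.' lose their special meaning.
String cacheName = "my.cache[1]";

boolean matches = Pattern.compile(Pattern.quote(cacheName)).matcher("my.cache[1]").matches(); // true
{code}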



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)