Hi,

The transaction timed out because the transaction initiator node didn't receive a response from another node in time for some reason. This can happen when an entry is already locked by somebody else, or because of network issues, GC pauses on another node, or some bug.
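For reference, here is a minimal sketch of starting an explicit transaction with a timeout and handling the failure on the caller side, assuming the plain Java cache API (your log shows the update actually goes through a DML entry processor, so this is not your exact code path). The cache name "orders" and the key/value types are hypothetical; the transaction parameters mirror the ones in your log (PESSIMISTIC, REPEATABLE_READ, timeout=120000, size=2).

import javax.cache.CacheException;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    /** Updates one entry inside an explicit transaction with a 120 s timeout. */
    public static void updateOrder(Ignite ignite, long orderId) {
        // "orders" and the byte[] value type are placeholders, not your real model.
        IgniteCache<Long, byte[]> cache = ignite.cache("orders");

        // Same parameters as in the log below: PESSIMISTIC, REPEATABLE_READ, timeout=120000, size=2.
        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC,
                TransactionIsolation.REPEATABLE_READ,
                120_000L,
                2)) {
            byte[] val = cache.get(orderId); // in PESSIMISTIC mode this call waits for the entry lock
            cache.put(orderId, val);         // placeholder for the real update
            tx.commit();
        }
        catch (CacheException | IgniteException e) {
            // On timeout the transaction is MARKED_ROLLBACK and the operation fails,
            // usually with IgniteTxTimeoutCheckedException as the root cause.
            // Decide here whether to retry after a pause or propagate the error.
            throw e;
        }
    }
}

With an explicit timeout the caller fails fast instead of waiting on a contended lock or a hanging exchange, and can decide whether to retry.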
There is no actual starvation in the striped pool, since the thread queue is empty, but a long-running operation can be an issue, as it can hold locks and delay other messages. Also, a transaction can be blocked by a running partition map exchange on an unstable topology. Is it possible that too many operations or large entries are involved in the transaction? (See the batching sketch in the P.S. at the end of this message.)

On Thu, Oct 12, 2017 at 8:36 AM, iostream <[email protected]> wrote:

> Hi,
>
> We are observing cache transaction timeout in our Ignite v2.1 cluster. The
> setup comprises 20 ignite servers and 20 clients. Error log -
>
> spring_264202322.log:Caused by:
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Cache transaction timed out: GridNearTxLocal [mappings=IgniteTxMappingsImpl
> [], nearLocallyMapped=false, colocatedLocallyMapped=false,
> needCheckBackup=null, hasRemoteLocks=true,
> thread=DefaultMessageListenerContainer-10, mappings=IgniteTxMappingsImpl
> [],
> super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[],
> dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter
> [completedBase=null, sndTransformedVals=false, depEnabled=false,
> txState=IgniteTxStateImpl [activeCacheIds=GridIntList [idx=2,
> arr=[-1836347052,159420608]], recovery=false, txMap=[IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052], val=[op=TRANSFORM, val=null], prevVal=[op=TRANSFORM,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@75e749b8,
> val2=[Ljava.lang.Object;@4ee393d4]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order
> [idHash=1411260389, hash=1683738400, create_userid=HOORDPRO,
> dspns_type_cd=1], startVer=1507208251042, ver=GridCacheVersion
> [topVer=117883976, order=1507204044952, nodeOrder=112], hash=789085753,
> extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]], IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608], val=[op=TRANSFORM,
> val=null], prevVal=[op=TRANSFORM, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@688b99e3,
> val2=[Ljava.lang.Object;@3a24ca15]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order_line
> [idHash=518069123, hash=-1823543951, create_userid=HOORDPRO,
> src_item_secondary_desc=null, s], startVer=1507208251774,
> ver=GridCacheVersion [topVer=117883976, order=1507204044404,
> nodeOrder=112],
> hash=1510291529, extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]]]], super=IgniteTxAdapter
> [xidVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], writeVer=null, implicit=false, loc=true, threadId=167,
> startTime=1507741430563, nodeId=a951003b-310c-45db-88d9-622d30a7558f,
> startVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], endVer=null, isolation=REPEATABLE_READ,
> concurrency=PESSIMISTIC, timeout=120000, sysInvalidate=false, sys=false,
> plc=2, commitVer=null, finalizing=NONE, invalidParts=null,
> state=MARKED_ROLLBACK, timedOut=true, topVer=AffinityTopologyVersion
> [topVer=406, minorTopVer=0], duration=143967ms, onePhaseCommit=false],
> size=2]]]
>
> Looking at "hasRemoteLocks=true" in the error log, I want to understand if
> the transaction timed out because some other thread has already occupied a
> lock on this key? Was this the reason for transaction timing out?
>
> I also saw the following warning in ignite server logs -
>
> [13:11:55,559][WARNING][grid-timeout-worker-#31%null%][G] >>> Possible
> starvation in striped pool.
> Thread name: sys-stripe-5-#6%null%
> Queue: []
> Deadlock: false
> Completed: 254308
> Thread [name="sys-stripe-5-#6%null%", id=20, state=RUNNABLE, blockCnt=97,
> waitCnt=254278]
> at o.a.i.i.processors.cache.persistence.tree.io.CacheVersionIO.read(CacheVersionIO.java:235)
> at o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.readFullRow(CacheDataRowAdapter.java:317)
> at o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:158)
> at o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:101)
> at o.a.i.i.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
> at o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:124)
> at o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
> at o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
> at o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
> at o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:199)
> at o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Replace.run0(BPlusTree.java:342)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Replace.run0(BPlusTree.java:330)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
> at o.a.i.i.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:342)
> at o.a.i.i.processors.cache.persistence.DataStructure.write(DataStructure.java:261)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.access$11200(BPlusTree.java:81)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Put.tryReplace(BPlusTree.java:2875)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2279)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2266)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2266)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2266)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2006)
> at o.a.i.i.processors.cache.persistence.tree.BPlusTree.put(BPlusTree.java:1977)
> at o.a.i.i.processors.query.h2.database.H2TreeIndex.put(H2TreeIndex.java:207)
> at o.a.i.i.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:729)
>
> Is there any correlation between the starving striped pool size and cache
> transaction taking too long?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

--
Best regards,
Andrey V. Mashenkov
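P.S. To illustrate the point about transaction size above: a minimal sketch of splitting a large update into short pessimistic transactions, assuming the plain Java cache API. The cache name "orders", the key/value types, the batch size and the timeout are made up for illustration, not taken from your setup.

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class BatchedUpdateSketch {
    private static final int BATCH_SIZE = 100;          // keep each transaction small
    private static final long TX_TIMEOUT_MS = 10_000L;  // fail fast instead of waiting two minutes

    /** Touches every key in small, independent transactions instead of one big one. */
    public static void touchAll(Ignite ignite, List<Long> orderIds) {
        IgniteCache<Long, byte[]> cache = ignite.cache("orders"); // hypothetical cache

        for (int from = 0; from < orderIds.size(); from += BATCH_SIZE) {
            List<Long> batch = orderIds.subList(from, Math.min(from + BATCH_SIZE, orderIds.size()));

            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    TX_TIMEOUT_MS,
                    batch.size())) {
                for (Long id : batch) {
                    byte[] val = cache.get(id); // lock is acquired here in PESSIMISTIC mode
                    cache.put(id, val);         // placeholder for the real update
                }
                tx.commit();
            }
            // Locks taken by this batch are released here, before the next batch starts.
        }
    }
}

Each batch commits and releases its locks before the next one starts, so no single transaction holds locks for long and a timeout only affects a small part of the work.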
