Hi,

Transaction is timed out as transaction initiator node doen't receive
response from other node in time by some reason.
It may be due to an entry is locked by smbd or network issues or GC issues
on other node or some bug.

There is no starvation is stripe pool actually, as thread queue is empty,
but the long running operation that can be an issue
as it can hold a locks and make other messages delayed.

Also, transaction can be blocked with running partition map exchange on
unstable topology.
Is it possible there are too many operations or large entries involved in
transaction?


On Thu, Oct 12, 2017 at 8:36 AM, iostream <[email protected]> wrote:

> Hi,
>
> We are observing cache transaction timeout in our Ignite v2.1 cluster. The
> setup comprises 20 ignite servers and 20 clients. Error log -
>
> spring_264202322.log:Caused by:
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Cache transaction timed out: GridNearTxLocal [mappings=IgniteTxMappingsImpl
> [], nearLocallyMapped=false, colocatedLocallyMapped=false,
> needCheckBackup=null, hasRemoteLocks=true,
> thread=DefaultMessageListenerContainer-10, mappings=IgniteTxMappingsImpl
> [],
> super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[],
> dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter
> [completedBase=null, sndTransformedVals=false, depEnabled=false,
> txState=IgniteTxStateImpl [activeCacheIds=GridIntList [idx=2,
> arr=[-1836347052,159420608]], recovery=false, txMap=[IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> cacheId=-1836347052], val=[op=TRANSFORM, val=null], prevVal=[op=TRANSFORM,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@75e749b8,
> val2=[Ljava.lang.Object;@4ee393d4]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_key
> [idHash=1384394993, hash=789085753, fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order
> [idHash=1411260389, hash=1683738400, create_userid=HOORDPRO,
> dspns_type_cd=1], startVer=1507208251042, ver=GridCacheVersion
> [topVer=117883976, order=1507204044952, nodeOrder=112], hash=789085753,
> extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]], IgniteTxEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608, txKey=IgniteTxKey
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428], cacheId=159420608], val=[op=TRANSFORM,
> val=null], prevVal=[op=TRANSFORM, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=[IgniteBiTuple
> [val1=org.apache.ignite.internal.processors.query.h2.
> DmlStatementsProcessor$ModifyingEntryProcessor@688b99e3,
> val2=[Ljava.lang.Object;@3a24ca15]], ttl=-1, conflictExpireTime=-1,
> conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=com.walmart.node.commons.manager.ignite.setup.model.
> key.fulfill_order_line_key
> [idHash=669957679, hash=1510291529, fulfill_order_line_nbr=1,
> fulfill_order_id=79218428],
> val=com.walmart.node.commons.manager.ignite.setup.model.fulfill_order_line
> [idHash=518069123, hash=-1823543951, create_userid=HOORDPRO,
> src_item_secondary_desc=null, s], startVer=1507208251774,
> ver=GridCacheVersion [topVer=117883976, order=1507204044404,
> nodeOrder=112],
> hash=1510291529, extras=null, flags=0]]], prepared=0, locked=true,
> nodeId=d02e4a23-abe5-470d-a414-bfb9816ff494, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=2, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=117883976,
> order=1507208250156, nodeOrder=401]]]], super=IgniteTxAdapter
> [xidVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], writeVer=null, implicit=false, loc=true, threadId=167,
> startTime=1507741430563, nodeId=a951003b-310c-45db-88d9-622d30a7558f,
> startVer=GridCacheVersion [topVer=117883976, order=1507208250156,
> nodeOrder=401], endVer=null, isolation=REPEATABLE_READ,
> concurrency=PESSIMISTIC, timeout=120000, sysInvalidate=false, sys=false,
> plc=2, commitVer=null, finalizing=NONE, invalidParts=null,
> state=MARKED_ROLLBACK, timedOut=true, topVer=AffinityTopologyVersion
> [topVer=406, minorTopVer=0], duration=143967ms, onePhaseCommit=false],
> size=2]]]
>
> Looking at "hasRemoteLocks=true" in the error log, I want to understand if
> the transaction timed out because some other thread has already occupied a
> lock on this key? Was this the reason for transaction timing out?
>
> I also saw the following warning in ignite server logs -
>
> [13:11:55,559][WARNING][grid-timeout-worker-#31%null%][G] >>> Possible
> starvation in striped pool.
>     Thread name: sys-stripe-5-#6%null%
>     Queue: []
>     Deadlock: false
>     Completed: 254308
> Thread [name="sys-stripe-5-#6%null%", id=20, state=RUNNABLE, blockCnt=97,
> waitCnt=254278]
>         at
> o.a.i.i.processors.cache.persistence.tree.io.CacheVersionIO.read(
> CacheVersionIO.java:235)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.readFullRow(
> CacheDataRowAdapter.java:317)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(
> CacheDataRowAdapter.java:158)
>         at
> o.a.i.i.processors.cache.persistence.CacheDataRowAdapter.initFromLink(
> CacheDataRowAdapter.java:101)
>         at
> o.a.i.i.processors.query.h2.database.H2RowFactory.getRow(
> H2RowFactory.java:62)
>         at
> o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.
> getLookupRow(H2ExtrasLeafIO.java:124)
>         at
> o.a.i.i.processors.query.h2.database.io.H2ExtrasLeafIO.
> getLookupRow(H2ExtrasLeafIO.java:36)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> getRow(BPlusTree.java:4372)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:199)
>         at
> o.a.i.i.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> compare(BPlusTree.java:4359)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> findInsertionPoint(BPlusTree.java:4279)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> access$1500(BPlusTree.java:81)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Replace.run0(BPlusTree.java:342)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Replace.run0(BPlusTree.java:330)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> GetPageHandler.run(BPlusTree.java:4697)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> GetPageHandler.run(BPlusTree.java:4682)
>         at
> o.a.i.i.processors.cache.persistence.tree.util.PageHandler.writePage(
> PageHandler.java:342)
>         at
> o.a.i.i.processors.cache.persistence.DataStructure.
> write(DataStructure.java:261)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> access$11200(BPlusTree.java:81)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree$
> Put.tryReplace(BPlusTree.java:2875)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2279)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> putDown(BPlusTree.java:2266)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> doPut(BPlusTree.java:2006)
>         at
> o.a.i.i.processors.cache.persistence.tree.BPlusTree.
> put(BPlusTree.java:1977)
>         at
> o.a.i.i.processors.query.h2.database.H2TreeIndex.put(H2TreeIndex.java:207)
>         at
> o.a.i.i.processors.query.h2.opt.GridH2Table.addToIndex(
> GridH2Table.java:729)
>
> Is there any correlation between the starving striped pool size and cache
> transaction taking too long?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>



-- 
Best regards,
Andrey V. Mashenkov

Reply via email to