Re: Grid freezing

2017-11-15 Thread smurphy
I figured this out at long last...

The root cause of the problem was the Scan object's toString method:

@Override
public String toString() {
return ReflectionToStringBuilder.reflectionToString(this);
}


It used the apache common-lang's RefelectionToStringBuilder and this jar was
not in the ignite libs folder.
Once in there, transactions worked with multiple server nodes under heavy
loads.
The problem was easy to reproduce - just remove this jar from the libs
folder.

The error message that results is the same as shown above and does not point
to the real underlying cause, so that is what caused the confusion:
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.implicitSingleResult(IgniteTxLocalAdapter.java:352)
 

I am guessing that because this ReflectionToStringBuilder.toString() is only
used in the Scan object's toString() method, it is never needed by Ignite
until at some point an exception is thrown and a logger calls the toString
on the Scan object and a java.lang.NoClassDefFoundError is not
anticipated...

I tried invoking the ReflectionToStringBuilder.toString() more directly from
the dequeuer code itself and I did see the expected error in the logs:
java.lang.NoClassDefFoundError:
org/apache/commons/lang/builder/ReflectionToStringBuilder at...



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-11-09 Thread ezhuravlev
Is it possible that in dequeuePortionIds you will have the same keys(at least
one the same key) for scanCache in different threads? If it's possible, I
think it's pretty easy to face a deadlock here with 
_getCurrentMax(customerMaxCache, queryScan), while I suppose 2 different
Scans could have the same customer(well, at least it could be deduced from
your comments).

Could you share full reproducer with the community? It's hard to try to find
a deadlock if you don't even have a source code.

Evgenii



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-11-06 Thread smurphy
The WARNING message above is not consistently showing up in the logs. 
Nodes are not going down either. 
What I have found is that running 1 server and 2 clients does NOT freeze,
even under heavy loads.
But having 2 ignite servers does freeze up. 
Two servers are unsuccessful either on separate boxes or on the same box
using localhost.

Looking at the thread dump of one server, 7 threads are waiting at
TransactionProxyImpl.commit (line 259) (Excerpt 1 & 2 below)

The thread dump of the other server has 8 thread waiting at
IgniteCacheProxy.getAllOutTx (line 1328) (Excerpt 3)
The thread number of one of these waiting threads is also waiting in the
CacheAffinitySharedManager  (Excerpt 4)
Even though Excerpt 3 and 4 call getAllOutTx, they are the first place in
the the transaction block where an Ignite cache is called.

*Excerpt 1:*

"pub-#75%dna%" prio=5 tid=96 WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park()
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:779)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:103)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:461)
at
org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:283)
   Local Variable: java.lang.NullPointerException#2
at
org.apache.ignite.internal.util.future.GridCompoundFuture.markInitialized(GridCompoundFuture.java:269)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1468)
   Local Variable: java.lang.NoClassDefFoundError#1
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:668)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1034)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareAsyncLocal(GridNearTxLocal.java:3343)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareColocatedTx(IgniteTxHandler.java:221)
   Local Variable:
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler#1
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticSerializableTxPrepareFuture.prepareLocal(GridNearOptimisticSerializableTxPrepareFuture.java:576)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticSerializableTxPrepareFuture.prepare(GridNearOptimisticSerializableTxPrepareFuture.java:493)
   Local Variable: org.apache.ignite.internal.util.GridLeanMap#24
   Local Variable:
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode#1
   Local Variable:
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxMapping#4
   Local Variable:
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxPrepareRequest#1
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticSerializableTxPrepareFuture.prepare(GridNearOptimisticSerializableTxPrepareFuture.java:407)
   Local Variable: java.util.AbstractMap$2#1
   Local Variable:
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticSerializableTxPrepareFuture$MiniFuture#2
   Local Variable: java.util.AbstractMap$2#2
   Local Variable: java.util.ArrayList$Itr#1
   Local Variable: java.util.ArrayList#7898
   Local Variable: java.util.HashMap#5544
   Local Variable:
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxMapping#1
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticSerializableTxPrepareFuture.prepare0(GridNearOptimisticSerializableTxPrepareFuture.java:314)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:137)
   Local Variable:
org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion#33
   Local Variable:
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture#11
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:74)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3110)
   Local Variable:

Re: Grid freezing

2017-10-27 Thread Eduard Shangareev
Hi, guys.

It looks like https://issues.apache.org/jira/browse/IGNITE-6313.

On Fri, Oct 27, 2017 at 10:28 AM, Evgenii Zhuravlev <
e.zhuravlev...@gmail.com> wrote:

> In logs i see messages:
>
> WARNING: Job is being cancelled because master task node left grid (as there 
> is no one waiting for results, job will not be failed over)
>
> Most possible that it's a root cause of problem you've described earlier
>
>
> 2017-10-26 21:27 GMT+03:00 smurphy :
>
>> Here is an additional log message that suggests that rolling back the
>> transaction is not working.
>> I do not know why the isolation level in this log is marked as
>> READ_COMMITTED. All transactions are configured to be Optimistic and
>> Serializable..:
>>
>>
>> 2017-10-26 10:49:12,524 ERROR [dna-scan-engine 172.29.11.197]
>> progress_monitor_1   {Log4JLogger.java:495} -  Failed to
>> add
>> cause to the end of cause chain (cause is printed here but will not be
>> propagated to callee): class o.a.i.i.transactions.IgniteTx
>> TimeoutCheckedException: Failed to acquire lock within provided timeout
>> for
>> transaction [timeout=1, tx=GridNearTxLocal
>> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
>> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311, hasVal
>> Bytes=true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
>> [part=311, val=311, hasValBytes=true], cacheId=-211228266],
>> val=[op=DELETE,
>> val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
>> entryProcessorsCol=null, ttl=-1, conflictExpireTi
>> me=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
>> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
>> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
>> [key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
>>  startVer=1509032890673, ver=GridCacheVersion [topVer=120512802,
>> order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
>> prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
>> locMapped=false, expiryPlc=null, transferExpiryPlc=false, flag
>> s=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
>> [topVer=120512802, order=1509032890671, nodeOrder=4]]],
>> explicitLock=false,
>> dhtVer=null, last=false, nearEntries=0, clientFirst=false,
>> node=790731d7-c002-436f-ae3e-9a3bd7944581]], nearLocallyMapped=false,
>>  colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
>> thread=,
>> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
>> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311,
>> hasValBytes=
>> true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
>> [part=311, val=311, hasValBytes=true], cacheId=-211228266],
>> val=[op=DELETE,
>> val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
>> entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1,
>>  conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
>> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
>> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
>> [key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
>> start
>> Ver=1509032890673, ver=GridCacheVersion [topVer=120512802,
>> order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
>> prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
>> locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, p
>> artUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
>> [topVer=120512802,
>> order=1509032890671, nodeOrder=4]]], explicitLock=false, dhtVer=null,
>> last=false, nearEntries=0, clientFirst=false,
>> node=790731d7-c002-436f-ae3e-9a3bd7944581]], super=GridDhtTxLocalAdapter
>> [n
>> earOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false,
>> super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
>> depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
>> recovery=false], super=IgniteTxAdapter [xidVer=Gr
>> idCacheVersion [topVer=120512802, order=1509032890671, nodeOrder=4],
>> writeVer=null, implicit=true, loc=true, threadId=208,
>> startTime=1509032942451, nodeId=f55eb4f4-57cb-4a6e-b1db-5f91b8a0f34e,
>> startVer=GridCacheVersion [topVer=120512802, order=1509032890671,
>> nodeOrder
>> =4], endVer=null, isolation=READ_COMMITTED, concurrency=OPTIMISTIC,
>> timeout=1, sysInvalidate=false, sys=false, plc=2,
>> commitVer=GridCacheVersion [topVer=120512802, order=1509032890671,
>> nodeOrder=4], finalizing=NONE, invalidParts=null, state=PREPARING,
>> timedOut=fal
>> se, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=8],
>> duration=10024ms, onePhaseCommit=false], size=1
>> class org.apache.ignite.IgniteCheckedException: Failed to commit
>> transaction:
>> GridNearTxLocal[id=f219e595f51--072e-e122--0004,
>> concurrency=OPTIMISTIC, isolation=READ_COMMITTED, 

Re: Grid freezing

2017-10-27 Thread Evgenii Zhuravlev
In logs i see messages:

WARNING: Job is being cancelled because master task node left grid (as
there is no one waiting for results, job will not be failed over)

Most possible that it's a root cause of problem you've described earlier


2017-10-26 21:27 GMT+03:00 smurphy :

> Here is an additional log message that suggests that rolling back the
> transaction is not working.
> I do not know why the isolation level in this log is marked as
> READ_COMMITTED. All transactions are configured to be Optimistic and
> Serializable..:
>
>
> 2017-10-26 10:49:12,524 ERROR [dna-scan-engine 172.29.11.197]
> progress_monitor_1   {Log4JLogger.java:495} -  Failed to
> add
> cause to the end of cause chain (cause is printed here but will not be
> propagated to callee): class o.a.i.i.transactions.IgniteTx
> TimeoutCheckedException: Failed to acquire lock within provided timeout for
> transaction [timeout=1, tx=GridNearTxLocal
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311, hasVal
> Bytes=true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
> [part=311, val=311, hasValBytes=true], cacheId=-211228266], val=[op=DELETE,
> val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=null, ttl=-1, conflictExpireTi
> me=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
>  startVer=1509032890673, ver=GridCacheVersion [topVer=120512802,
> order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
> prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
> locMapped=false, expiryPlc=null, transferExpiryPlc=false, flag
> s=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
> [topVer=120512802, order=1509032890671, nodeOrder=4]]], explicitLock=false,
> dhtVer=null, last=false, nearEntries=0, clientFirst=false,
> node=790731d7-c002-436f-ae3e-9a3bd7944581]], nearLocallyMapped=false,
>  colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
> thread=,
> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311,
> hasValBytes=
> true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
> [part=311, val=311, hasValBytes=true], cacheId=-211228266], val=[op=DELETE,
> val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
> entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1,
>  conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
> filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
> [super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
> start
> Ver=1509032890673, ver=GridCacheVersion [topVer=120512802,
> order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
> prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
> locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, p
> artUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
> [topVer=120512802,
> order=1509032890671, nodeOrder=4]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=790731d7-c002-436f-ae3e-9a3bd7944581]], super=GridDhtTxLocalAdapter
> [n
> earOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false,
> super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
> depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
> recovery=false], super=IgniteTxAdapter [xidVer=Gr
> idCacheVersion [topVer=120512802, order=1509032890671, nodeOrder=4],
> writeVer=null, implicit=true, loc=true, threadId=208,
> startTime=1509032942451, nodeId=f55eb4f4-57cb-4a6e-b1db-5f91b8a0f34e,
> startVer=GridCacheVersion [topVer=120512802, order=1509032890671, nodeOrder
> =4], endVer=null, isolation=READ_COMMITTED, concurrency=OPTIMISTIC,
> timeout=1, sysInvalidate=false, sys=false, plc=2,
> commitVer=GridCacheVersion [topVer=120512802, order=1509032890671,
> nodeOrder=4], finalizing=NONE, invalidParts=null, state=PREPARING,
> timedOut=fal
> se, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=8],
> duration=10024ms, onePhaseCommit=false], size=1
> class org.apache.ignite.IgniteCheckedException: Failed to commit
> transaction:
> GridNearTxLocal[id=f219e595f51--072e-e122--0004,
> concurrency=OPTIMISTIC, isolation=READ_COMMITTED, state=ROLLED_BACK,
> invalidate=false, rollbackOnly=true, nodeId=f55eb4f4
> -57cb-4a6e-b1db-5f91b8a0f34e, duration=10037]
> at
> org.apache.ignite.internal.processors.cache.distributed.
> near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:423)
> at

Re: Grid freezing

2017-10-26 Thread smurphy
Here is an additional log message that suggests that rolling back the
transaction is not working.
I do not know why the isolation level in this log is marked as
READ_COMMITTED. All transactions are configured to be Optimistic and
Serializable..:


2017-10-26 10:49:12,524 ERROR [dna-scan-engine 172.29.11.197]
progress_monitor_1   {Log4JLogger.java:495} -  Failed to add
cause to the end of cause chain (cause is printed here but will not be
propagated to callee): class o.a.i.i.transactions.IgniteTx
TimeoutCheckedException: Failed to acquire lock within provided timeout for
transaction [timeout=1, tx=GridNearTxLocal
[mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
[entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311, hasVal
Bytes=true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
[part=311, val=311, hasValBytes=true], cacheId=-211228266], val=[op=DELETE,
val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
entryProcessorsCol=null, ttl=-1, conflictExpireTi
me=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
[super=GridDistributedCacheEntry [super=GridCacheMapEntry
[key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
 startVer=1509032890673, ver=GridCacheVersion [topVer=120512802,
order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flag
s=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
[topVer=120512802, order=1509032890671, nodeOrder=4]]], explicitLock=false,
dhtVer=null, last=false, nearEntries=0, clientFirst=false,
node=790731d7-c002-436f-ae3e-9a3bd7944581]], nearLocallyMapped=false,
 colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
thread=,
mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
[entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=311, val=311,
hasValBytes=
true], cacheId=-211228266, txKey=IgniteTxKey [key=KeyCacheObjectImpl
[part=311, val=311, hasValBytes=true], cacheId=-211228266], val=[op=DELETE,
val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null],
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1,
 conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
filtersPassed=false, filtersSet=true, entry=GridDhtDetachedCacheEntry
[super=GridDistributedCacheEntry [super=GridCacheMapEntry
[key=KeyCacheObjectImpl [part=311, val=311, hasValBytes=true], val=null,
start
Ver=1509032890673, ver=GridCacheVersion [topVer=120512802,
order=1509032890673, nodeOrder=4], hash=311, extras=null, flags=0]]],
prepared=0, locked=false, nodeId=790731d7-c002-436f-ae3e-9a3bd7944581,
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, p
artUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=120512802,
order=1509032890671, nodeOrder=4]]], explicitLock=false, dhtVer=null,
last=false, nearEntries=0, clientFirst=false,
node=790731d7-c002-436f-ae3e-9a3bd7944581]], super=GridDhtTxLocalAdapter [n
earOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false,
super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
recovery=false], super=IgniteTxAdapter [xidVer=Gr
idCacheVersion [topVer=120512802, order=1509032890671, nodeOrder=4],
writeVer=null, implicit=true, loc=true, threadId=208,
startTime=1509032942451, nodeId=f55eb4f4-57cb-4a6e-b1db-5f91b8a0f34e,
startVer=GridCacheVersion [topVer=120512802, order=1509032890671, nodeOrder
=4], endVer=null, isolation=READ_COMMITTED, concurrency=OPTIMISTIC,
timeout=1, sysInvalidate=false, sys=false, plc=2,
commitVer=GridCacheVersion [topVer=120512802, order=1509032890671,
nodeOrder=4], finalizing=NONE, invalidParts=null, state=PREPARING,
timedOut=fal
se, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=8],
duration=10024ms, onePhaseCommit=false], size=1
class org.apache.ignite.IgniteCheckedException: Failed to commit
transaction:
GridNearTxLocal[id=f219e595f51--072e-e122--0004,
concurrency=OPTIMISTIC, isolation=READ_COMMITTED, state=ROLLED_BACK,
invalidate=false, rollbackOnly=true, nodeId=f55eb4f4
-57cb-4a6e-b1db-5f91b8a0f34e, duration=10037]
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:423)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.rollbackNearTxLocalAsync(GridNearTxLocal.java:3255)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$5.applyx(GridNearTxLocal.java:1624)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$5.applyx(GridNearTxLocal.java:1614)
at

Re: Grid freezing

2017-10-26 Thread smurphy
One thing that I see in the logs looks to me like a NullPointerException on
attempting to commit an optimistic/serializable transaction.
My reading of this excpetion is that an optimistic lock conflict is
correctly detected in prepare0 but that  eventually the code incorrectly
attempts to return this.ret which is null rather than an
IgniteTxOptimisticCheckedException...:



Oct 25 19:25:46 srvr.company apache-ignite[45092]: [19:25:46] (err) Failed
to execute compound future reducer: GridDhtTxPrepareFuture
[futId=2a3ec065f51-75e18206-0fee-4926-99be-bb6fee7b5828, err=null,
replied=1, mapped=1, reads=[], writes=[IgniteTxEntry [key=KeyCacheObjectImpl
[part=115, val=115, hasValBytes=true], cacheId=-211228266, txKey=IgniteTxKey
[key=KeyCacheObjectImpl [part=115, val=115, hasValBytes=true],
cacheId=-211228266], val=[op=UPDATE,
val=com.company.dna.scan.fragment.node.domain.Scan [idHash=1148364470,
hash=525731140, _fragmentIds=HashSet {}, _scannerGroupId=16,
_visibility=EXTERNAL, _scanUUID=d1a7a992-6ff2-4eb0-8398-06e45b15d08b,
_customerId=6779780, _customerMaxReached=false, _priority=1,
_scanStartedMsgSent=false, _empty=true, _idx=115, _canceled=true,
_scanId=45534, _version=2]], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP,
val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1,
conflictVer=null, explicitVer=null, dhtVer=null, filters=[],
filtersPassed=false, filtersSet=false, entry=GridDhtCacheEntry [rdrs=[],
part=115, super=GridDistributedCacheEntry [super=GridCacheMapEntry
[key=KeyCacheObjectImpl [part=115, val=115, hasValBytes=true],
val=com.company.dna.scan.fragment.node.domain.Scan [idHash=582900940,
hash=-1150571357, _fragmentIds=HashSet {}, _scannerGroupId=16,
_visibility=EXTERNAL, _scanUUID=d1a7a992-6ff2-4eb0-8398-06e45b15d08b,
_customerId=6779780, _customerMaxReached=false, _priority=1,
_scanStartedMsgSent=true, _empty=true, _idx=115, _canceled=false,
_scanId=45534, _version=3], startVer=1508977480914, ver=GridCacheVersion
[topVer=120361566, order=1508977480909, nodeOrder=33], hash=115,
extras=GridCacheMvccEntryExtras [mvcc=GridCacheMvcc
[locs=[GridCacheMvccCandidate [nodeId=47680ca3-827f-4788-af9e-d73dd7b030e0,
ver=GridCacheVersion [topVer=120361566, order=1508977480913, nodeOrder=33],
threadId=111, id=396, topVer=AffinityTopologyVersion [topVer=39,
minorTopVer=0], reentry=null,
otherNodeId=7a2804c4-5fae-4258-b6e2-96e75a219ea7, otherVer=GridCacheVersion
[topVer=120361566, order=1508977480890, nodeOrder=39], mappedD
Oct 25 19:25:46 srvr.company apache-ignite[45092]: htNodes=null,
mappedNearNodes=null, ownerVer=null, serOrder=GridCacheVersion
[topVer=120361566, order=1508977480890, nodeOrder=39],
key=KeyCacheObjectImpl [part=115, val=115, hasValBytes=true],
masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0|read=0,
prevVer=null, nextVer=null]], rmts=null]], flags=2]]], prepared=1,
locked=false, nodeId=null, locMapped=false, expiryPlc=null,
transferExpiryPlc=false, flags=0, partUpdateCntr=0,
serReadVer=GridCacheVersion [topVer=120361566, order=1508977480161,
nodeOrder=33], xidVer=null]], trackable=true, nearMiniId=2, last=true,
retVal=false, ret=null, lockKeys=[], forceKeysFut=null, locksReady=true,
invoke=false, timeoutObj=PrepareTimeoutObject [timeout=9976],
xid=GridCacheVersion [topVer=120361566, order=1508977480913, nodeOrder=33],
innerFuts=[], super=GridCompoundFuture
[rdc=o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture$1@38648d98,
initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null,
futs=[]]]java.lang.NullPointerException
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.implicitSingleResult(IgniteTxLocalAdapter.java:352)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.createPrepareResponse(GridDhtTxPrepareFuture.java:875)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:763)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:103)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:278)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.util.future.GridCompoundFuture.markInitialized(GridCompoundFuture.java:269)
Oct 25 19:25:46 srvr.company apache-ignite[45092]: at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1468)
Oct 25 19:25:46 srvr.company 

Re: Grid freezing

2017-10-26 Thread smurphy


log_1.txt
  
log_2.txt
  
log_3.txt
  

I don't see any nodes shutting down until 9 minutes after this particular
log..
But please see the attached are server logs for fuller details..





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-10-26 Thread smurphy
No - this is what the invocation looks like:

DequeuePortionsCallable job = new
DequeuePortionsCallable(ignitePortionDequeuer);
DequeuedPortionResponse response = _ignite.compute().call(job);




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-10-26 Thread Evgenii Zhuravlev
Is it possible that you run DequeuePortionsCallable with timeout?

Evgenii

2017-10-26 6:40 GMT+03:00 smurphy :

> I added a transaction timeout of 1 millis and a transation size of 100
> (transaction is Optimistic and Serializable)...
>
> I see no TransactionTimeoutExceptions, just the following:
> CacheException: class org.apache.ignite.IgniteInterruptedException: Got
> interrupted while waiting for future to complete..
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Grid freezing

2017-10-25 Thread smurphy
I added a transaction timeout of 1 millis and a transation size of 100
(transaction is Optimistic and Serializable)...

I see no TransactionTimeoutExceptions, just the following:
CacheException: class org.apache.ignite.IgniteInterruptedException: Got
interrupted while waiting for future to complete..



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-10-24 Thread smurphy
Thanks Evgenii,

I'll add that catch block and see if it sheds any light on the issue. 

The client's transaction configuration is set to Optimistic and Serializable
and the transaction within the try with resources block is explicitly set to
Optimistic and Serializable, which should preclude deadlock...but there does
seem to be deadlock..

The servers do not have any transaction configuration set explicitly..



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-10-24 Thread Evgenii Zhuravlev
Looks like possible deadlock. Please add timeout for transactions and check
it with deadlock detection, as described here:
https://apacheignite.readme.io/docs/transactions#section-deadlock-detection

Evgenii

2017-10-23 21:16 GMT+03:00 smurphy :

> IgnitePortionDequeuer.java
>  t1317/IgnitePortionDequeuer.java>
>
> top.visor
> 
>
> Hi Evgenii,
>
> See the attached top command and java file..
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Grid freezing

2017-10-23 Thread smurphy
IgnitePortionDequeuer.java

  

top.visor
  

Hi Evgenii,

See the attached top command and java file..



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Grid freezing

2017-10-23 Thread Evgenii Zhuravlev
Hi!

How many nodes do you have? How many CPUs? Could you provide a code of
com.company.node.ignite.IgnitePortionDequeuer
or at least a dequeuePortions method?

Evgenii

2017-10-20 0:24 GMT+03:00 smurphy :

> threaddump.tdump
>  t1317/threaddump.tdump>
>
> I am using Ignite v2.1 and my code using Optimistic/Serializable
> transactions and is locking up. When it does, there are a lot of `WARNING:
> Found long running transaction` and `WARNING: Found long running cache
> future` messages in the logs (see below). Eventually the grid freezes and
> there are a lot of "IgniteInterruptedException: Got interrupted while
> waiting for future to complete." (see below). Also, see attached thread
> dump
> when the grid is finally stopped.
>
> Using visor, I could see the active threads increase until all are
> consumed.
> I boosted the thread pool count to 100 and saw the active threads rise
> steadily until all were consumed. At theat point, the grid freezes up.
>
> Using visor ping multiple times, the servers always ping successfully but
> the client fails intermittently.
>
> LONG RUNNING TRANSACTION:
>
> 2017-10-19 13:03:06,698 WARN  [ ] grid-timeout-worker-#15%null%
> {Log4JLogger.java:480} - Found long running transaction
> [startTime=12:30:33.985, curTime=13:03:06.682, tx=GridNearTxLocal
> [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], nearLocallyMapped=false,
> colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false,
> thread=,
> mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping
> [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey
> [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl
> [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true],
> cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP,
> val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple
> [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]],
> ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null,
> dhtVer=null, filters=[], filtersPassed=false, filtersSet=true,
> entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry
> [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126,
> val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group],
> hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion
> [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214,
> extras=null, flags=0]]], prepared=0, locked=false,
> nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false,
> expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0,
> serReadVer=null, xidVer=GridCacheVersion [topVer=119473256,
> order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null,
> last=false, nearEntries=0, clientFirst=false,
> node=36b4d422-d011-4f77-919a-b8ffb089614b]], super=GridDhtTxLocalAdapter
> [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[],
> explicitLock=false,
> super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false,
> depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true,
> recovery=false], super=IgniteTxAdapter [xidVer=GridCacheVersion
> [topVer=119473256, order=1508434155230, nodeOrder=2243], writeVer=null,
> implicit=true, loc=true, threadId=109,