Re: error - Failed to wait for partition release future

2017-02-16 Thread Andrey Mashenkov
Hi Cameron,

It looks like the nodes can't see each other.
Could you please attach your grid configuration and full logs?


BTW, the Ignite version in your node logs is still "ver=1.8.1#20161208-sha1:a8fcb7b2".
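
If your nodes run in quiet mode, you can get verbose logs with something like
this (a sketch; it assumes the ignite-log4j module is on the classpath, and the
config path is illustrative):

import org.apache.ignite.IgniteSystemProperties;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.logger.log4j.Log4JLogger;

public class StartWithFullLogs {
    public static void main(String[] args) throws Exception {
        // Disable quiet mode so Ignite prints verbose output.
        System.setProperty(IgniteSystemProperties.IGNITE_QUIET, "false");

        IgniteConfiguration cfg = new IgniteConfiguration();

        // Route logs through log4j to a file that can be attached here.
        cfg.setGridLogger(new Log4JLogger("config/ignite-log4j.xml"));

        Ignition.start(cfg);
    }
}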

On Fri, Feb 17, 2017 at 2:15 AM, Cameron Braid  wrote:

> OK, I have moved to the Apache Ignite 1.8.0 jar and the same issue is
> still there.
>
> I don't believe slow networking is causing this, as it also happens on a
> single-host, single-node cluster. It's hard to reproduce, but it does
> happen.
>
> I've included a log from a single-node grid locking up.
>
> Any ideas on what I can do to stop the cluster from locking up?

Re: error - Failed to wait for partition release future

2017-02-16 Thread Cameron Braid
OK, I have moved to the Apache Ignite 1.8.0 jar and the same issue is
still there.

I don't believe slow networking is causing this, as it also happens on a
single-host, single-node cluster. It's hard to reproduce, but it does
happen.

I've included a log from a single-node grid locking up.

Any ideas on what I can do to stop the cluster from locking up?
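
In case it helps with diagnosis: while the grid is stuck I can grab thread
dumps (jstack <pid>, or in-process with a helper like the sketch below -
illustrative only) to see what is holding up the partition release future:

import java.util.Map;

public class ThreadDumper {
    /** Call from inside the app (e.g. a watchdog) while the grid is stuck. */
    public static void dumpAllThreads() {
        // Print every live thread and its stack; the exchange-worker and any
        // thread still holding a cache operation should show up here.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.err.println(e.getKey());
            for (StackTraceElement el : e.getValue())
                System.err.println("\tat " + el);
        }
    }
}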

Feb 17 09:09:22 webapp-97-3v11l webapp  85614 WARN
o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture
[exchange-worker-#26%webapp%] Failed to wait for partition release future
[topVer=AffinityTopologyVersion [topVer=1, minorTopVer=6],
node=68845172-67fb-4f49-ae81-f3024ef89e66]. Dumping pending objects that
might be the cause:
Feb 17 09:09:22 webapp-97-3v11l webapp  85614 WARN
o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%]
Ready affinity version: AffinityTopologyVersion [topVer=1, minorTopVer=5]
Feb 17 09:09:22 webapp-97-3v11l webapp  85619 WARN
o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%]
Last exchange future: GridDhtPartitionsExchangeFuture [dummy=false,
forcePreload=false, reassign=false, discoEvt=DiscoveryCustomEvent
[customMsg=DynamicCacheChangeBatch [reqs=[DynamicCacheChangeRequest
[deploymentId=d7f09f84a51-923f7cb7-d9c2-46e8-b121-c7fe21e14aa2,
startCfg=CacheConfiguration [name=query.starRatings.aggregate,
storeConcurrentLoadAllThreshold=5, rebalancePoolSize=2,
rebalanceTimeout=1, evictPlc=LruEvictionPolicy [max=500, batchSize=1,
maxMemSize=0, memSize=0, size=0], evictSync=false, evictKeyBufSize=1024,
evictSyncConcurrencyLvl=4, evictSyncTimeout=1, evictFilter=null,
evictMaxOverflowRatio=10.0, eagerTtl=true, dfltLockTimeout=0,
startSize=150, nearCfg=null, writeSync=PRIMARY_SYNC, storeFactory=null,
storeKeepBinary=false, loadPrevVal=false,
aff=o.a.i.cache.affinity.rendezvous.RendezvousAffinityFunction@3cfe01ec,
cacheMode=REPLICATED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY,
backups=2147483647, invalidate=false, tmLookupClsName=null,
rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288,
rebalanceBatchesPrefetchCount=2, offHeapMaxMem=-1, swapEnabled=false,
maxConcurrentAsyncOps=500, writeBehindEnabled=false,
writeBehindFlushSize=10240, writeBehindFlushFreq=5000,
writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, maxQryIterCnt=1024,
memMode=ONHEAP_TIERED,
affMapper=o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper@387ac79,
rebalanceDelay=0, rebalanceThrottle=0, interceptor=null,
longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true,
nodeFilter=o.a.i.configuration.CacheConfiguration$IgniteAllNodesPredicate@2a423797,
sqlSchema=null, sqlEscapeAll=false, sqlOnheapRowCacheSize=10240,
snapshotableIdx=false, cpOnRead=true, topValidator=null], cacheType=USER,
initiatingNodeId=68845172-67fb-4f49-ae81-f3024ef89e66, nearCacheCfg=null,
clientStartOnly=false, stop=false, close=false, failIfExists=true,
template=false, rcvdFrom=null, exchangeNeeded=true, cacheFutTopVer=null,
cacheName=query.starRatings.aggregate]], clientNodes=null,
id=e7f09f84a51-923f7cb7-d9c2-46e8-b121-c7fe21e14aa2,
clientReconnect=false], affTopVer=AffinityTopologyVersion [topVer=1,
minorTopVer=6], super=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=68845172-67fb-4f49-ae81-f3024ef89e66, addrs=[0:0:0:0:0:0:0:1%lo,
10.129.1.132, 127.0.0.1], sockAddrs=[webapp-97-3v11l/10.129.1.132:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1487282891381, loc=true,
ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], topVer=1,
nodeId8=68845172, msg=null, type=DISCOVERY_CUSTOM_EVT,
tstamp=1487282952548]], crd=TcpDiscoveryNode
[id=68845172-67fb-4f49-ae81-f3024ef89e66, addrs=[0:0:0:0:0:0:0:1%lo,
10.129.1.132, 127.0.0.1], sockAddrs=[webapp-97-3v11l/10.129.1.132:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1487282891381, loc=true,
ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=1, minorTopVer=6], nodeId=68845172, evt=DISCOVERY_CUSTOM_EVT],
added=true, initFut=GridFutureAdapter [resFlag=0, res=null,
startTime=1487282952548, endTime=0, ignoreInterrupts=false, state=INIT],
init=false, topSnapshot=null, lastVer=null,
partReleaseFut=GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=3,
done=false, cancelled=false, err=null, futs=[true, true, true, false]],
affChangeMsg=null, skipPreload=false, clientOnlyExchange=false,
initTs=1487282952548, centralizedAff=false, evtLatch=0, remaining=[],
srvNodes=[TcpDiscoveryNode [id=68845172-67fb-4f49-ae81-f3024ef89e66,
addrs=[0:0:0:0:0:0:0:1%lo, 10.129.1.132, 127.0.0.1],
sockAddrs=[webapp-97-3v11l/10.129.1.132:47500, /0:0:0:0:0:0:0:1%lo:47500, /
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1487282891381, loc=true, ver=1.8.1#20161208-sha1:a8fcb7b2,
isClient=false]], super=GridFutureAdapter [resFlag=0, res=null,

Re: error - Failed to wait for partition release future

2017-02-16 Thread Andrey Mashenkov
Hi Cameron,

You wrote to the Apache Ignite Community dev list. Ignite does not have
versions like 1.8.x.
The Git repository you refer to does not belong to the Apache Ignite project.


On Thu, Feb 16, 2017 at 6:58 PM, Cameron Braid  wrote:

> Hi,
>
> Thanks for looking at my issue.
>
> There are tags for 1.8.1, 1.8.2 and 1.8.3 at
> https://github.com/gridgain/apache-ignite so I have been trying jars
> built from those tags, including 1.9.
>
> I'm still getting the issues. What concerns me most is that it is very
> easy to lock up the whole grid. I have tried failureDetectionTimeout
> values from short (2s) to long (40s) and I still get a lockup. I've
> tried preferring IPv4 as you suggested but it didn't make any
> difference. I've included my JVM args, an excerpt from the JVM GC logs,
> and some more recent Ignite logs.
>
> Cheers
>
> Cameron

Re: error - Failed to wait for partition release future

2017-02-16 Thread Cameron Braid
Hi,

Thanks for looking at my issue.

There are tags for 1.8.1, 1.8.2 and 1.8.3 at
https://github.com/gridgain/apache-ignite so I have been trying jars
built from those tags, including 1.9.

I'm still getting the issues. What concerns me most is that it is very
easy to lock up the whole grid. I have tried failureDetectionTimeout
values from short (2s) to long (40s) and I still get a lockup. I've tried
preferring IPv4 as you suggested but it didn't make any difference. I've
included my JVM args, an excerpt from the JVM GC logs, and some more
recent Ignite logs.
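
For reference, the timeout is applied along these lines (an illustrative
sketch, not my exact config):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Tried values from 2_000 (2s) up to 40_000 (40s); no difference.
        cfg.setFailureDetectionTimeout(40_000);

        Ignition.start(cfg);
    }
}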

Cheers

Cameron

*[jvm args]*

-Xms2G -Xmx2G

-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=100M
-Xloggc:target/gc-log-devA.txt

-XX:+UseParNewGC  -XX:+UseConcMarkSweepGC  -XX:+UseTLAB
-XX:NewSize=128m  -XX:MaxNewSize=128m
-XX:MaxTenuringThreshold=0  -XX:SurvivorRatio=1024
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=40 -XX:MaxGCPauseMillis=1000
-XX:InitiatingHeapOccupancyPercent=50  -XX:+UseCompressedOops
-XX:ParallelGCThreads=6  -XX:ConcGCThreads=6

*[GC logs]*

2017-02-17T02:43:31.345+1100: 1997.387: [CMS-concurrent-reset: 0.004/0.004
secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2017-02-17T02:43:32.099+1100: 1998.141: [GC (Allocation Failure)
2017-02-17T02:43:32.100+1100: 1998.141: [ParNew: 130485K->0K(131008K),
0.0332423 secs] 773911K->655291K(2097088K), 0.0336493 secs] [Times:
user=0.10 sys=0.00, real=0.04 secs]
2017-02-17T02:43:32.498+1100: 1998.539: [GC (Allocation Failure)
2017-02-17T02:43:32.498+1100: 1998.539: [ParNew: 130944K->0K(131008K),
0.0489761 secs] 786235K->683673K(2097088K), 0.0492079 secs] [Times:
user=0.13 sys=0.01, real=0.05 secs]
2017-02-17T02:43:33.578+1100: 1999.620: [GC (Allocation Failure)
2017-02-17T02:43:33.578+1100: 1999.620: [ParNew: 130944K->0K(131008K),
0.0589320 secs] 814617K->710468K(2097088K), 0.0594862 secs] [Times:
user=0.19 sys=0.01, real=0.06 secs]
2017-02-17T02:43:33.997+1100: 2000.038: [GC (Allocation Failure)
2017-02-17T02:43:33.997+1100: 2000.038: [ParNew: 130944K->0K(131008K),
0.0382098 secs] 841412K->723510K(2097088K), 0.0386616 secs] [Times:
user=0.11 sys=0.00, real=0.04 secs]
2017-02-17T02:43:34.724+1100: 2000.765: [GC (Allocation Failure)
2017-02-17T02:43:34.724+1100: 2000.765: [ParNew: 130944K->0K(131008K),
0.0061646 secs] 854454K->725669K(2097088K), 0.0064497 secs] [Times:
user=0.03 sys=0.01, real=0.01 secs]
2017-02-17T02:43:35.303+1100: 2001.344: [GC (Allocation Failure)
2017-02-17T02:43:35.303+1100: 2001.345: [ParNew: 130944K->0K(131008K),
0.0102684 secs] 856613K->727224K(2097088K), 0.0105370 secs] [Times:
user=0.04 sys=0.00, real=0.01 secs]
2017-02-17T02:43:35.582+1100: 2001.623: [GC (Allocation Failure)
2017-02-17T02:43:35.582+1100: 2001.623: [ParNew: 130944K->0K(131008K),
0.0201246 secs] 858168K->735764K(2097088K), 0.0204913 secs] [Times:
user=0.09 sys=0.00, real=0.02 secs]
2017-02-17T02:43:35.856+1100: 2001.898: [GC (Allocation Failure)
2017-02-17T02:43:35.856+1100: 2001.898: [ParNew: 130944K->0K(131008K),
0.0176264 secs] 866708K->745151K(2097088K), 0.0178741 secs] [Times:
user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:36.554+1100: 2002.595: [GC (Allocation Failure)
2017-02-17T02:43:36.554+1100: 2002.596: [ParNew: 130944K->0K(131008K),
0.0480600 secs] 876095K->752482K(2097088K), 0.0483472 secs] [Times:
user=0.08 sys=0.00, real=0.05 secs]
2017-02-17T02:43:38.991+1100: 2005.033: [GC (Allocation Failure)
2017-02-17T02:43:38.991+1100: 2005.033: [ParNew: 130930K->0K(131008K),
0.0188153 secs] 883413K->764137K(2097088K), 0.0191396 secs] [Times:
user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:40.428+1100: 2006.470: [GC (Allocation Failure)
2017-02-17T02:43:40.428+1100: 2006.470: [ParNew: 130941K->0K(131008K),
0.0272629 secs] 895079K->775860K(2097088K), 0.0274685 secs] [Times:
user=0.07 sys=0.01, real=0.02 secs]
2017-02-17T02:43:41.195+1100: 2007.237: [GC (Allocation Failure)
2017-02-17T02:43:41.195+1100: 2007.237: [ParNew: 130944K->0K(131008K),
0.0117222 secs] 906804K->778960K(2097088K), 0.0120822 secs] [Times:
user=0.05 sys=0.00, real=0.01 secs]
2017-02-17T02:43:41.919+1100: 2007.960: [GC (Allocation Failure)
2017-02-17T02:43:41.919+1100: 2007.961: [ParNew: 130944K->0K(131008K),
0.0181342 secs] 909904K->782162K(2097088K), 0.0183815 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.337+1100: 2008.378: [GC (Allocation Failure)
2017-02-17T02:43:42.337+1100: 2008.378: [ParNew: 130944K->0K(131008K),
0.0182636 secs] 913106K->790713K(2097088K), 0.0187463 secs] [Times:
user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.358+1100: 2008.399: [GC (CMS Initial Mark) [1
CMS-initial-mark: 790713K(1966080K)] 791332K(2097088K), 0.0103648 secs]
[Times: user=0.02 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.368+1100: 2008.410: [CMS-concurrent-mark-start]
2017-02-17T02:43:42.668+1100: 2008.709: [CMS-concurrent-mark: 0.299/0.299
secs] [Times: user=1.76 

Re: error - Failed to wait for partition release future

2017-02-16 Thread Andrey Mashenkov
Hi Cameron,

Ignite does not have 1.8.x versions.

It looks like you are using IPv6 on some of your nodes.
First, check that either the -Djava.net.preferIPv6Addresses=true or the
-Djava.net.preferIPv4Stack=true Java option is set on all of your nodes,
and that the communication port is set properly.
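
For example, something like this (a sketch; the property and the ports must
match on every node):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class StartIpv4Node {
    public static void main(String[] args) {
        // Must be set before any networking is initialized in the JVM.
        System.setProperty("java.net.preferIPv4Stack", "true");

        IgniteConfiguration cfg = new IgniteConfiguration();

        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setLocalPort(47500); // discovery port, reachable on every node
        cfg.setDiscoverySpi(disco);

        TcpCommunicationSpi comm = new TcpCommunicationSpi();
        comm.setLocalPort(47100); // communication port, reachable on every node
        cfg.setCommunicationSpi(comm);

        Ignition.start(cfg);
    }
}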

On Thu, Feb 16, 2017 at 1:29 AM, Cameron Braid  wrote:

> Hi,
>
> I am getting the following error in Ignite version 1.8.1:
>
> "Failed to wait for partition release future"
>
> The related log lines are at the bottom.
>
> I was curious if it could be related to
> https://issues.apache.org/jira/browse/IGNITE-3212, which has a fix for
> version 1.9. Could this be the case?
>
> Otherwise, is this likely to be a bug in Ignite, or something specific
> to my environment?
>
> Cheers
>
> Cameron

error - Failed to wait for partition release future

2017-02-15 Thread Cameron Braid
Hi,

I am getting the following error in Ignite version 1.8.1:

"Failed to wait for partition release future"

The related log lines are at the bottom.

I was curious if it could be related to
https://issues.apache.org/jira/browse/IGNITE-3212, which has a fix for
version 1.9. Could this be the case?

Otherwise, is this likely to be a bug in Ignite, or something specific to
my environment?
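
For background, our caches are created dynamically at runtime; the one in the
log below is configured roughly like this (a simplified sketch reconstructed
from the logged CacheConfiguration):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryCacheStarter {
    static IgniteCache<Object, Object> start(Ignite ignite) {
        CacheConfiguration<Object, Object> cc =
            new CacheConfiguration<>("userTable.cmsDataVehicles.query");

        cc.setCacheMode(CacheMode.REPLICATED);
        cc.setAtomicityMode(CacheAtomicityMode.ATOMIC);
        cc.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);
        cc.setRebalanceMode(CacheRebalanceMode.ASYNC);

        // Starting the cache triggers the partition map exchange seen in the log.
        return ignite.getOrCreateCache(cc);
    }
}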

Cheers

Cameron


   - Feb 16 08:23:28 webapp-92-50ik3 webapp 15287661 WARN
   o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture
   [exchange-worker-#26%webapp%] Failed to wait for partition release future
   [topVer=AffinityTopologyVersion [topVer=21, minorTopVer=2],
   node=79d74dfa-9903-42e3-af94-524ab08c1f97]. Dumping pending objects that
   might be the cause:
   - Feb 16 08:23:28 webapp-92-50ik3 webapp 15287661 WARN
   o.a.i.i.p.c.GridCachePartitionExchangeManager
   [exchange-worker-#26%webapp%] Ready affinity version:
   AffinityTopologyVersion [topVer=21, minorTopVer=1]
   - Feb 16 08:23:28 webapp-92-50ik3 webapp 15287667 WARN
   o.a.i.i.p.c.GridCachePartitionExchangeManager
   [exchange-worker-#26%webapp%] Last exchange future:
   GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false,
   reassign=false, discoEvt=DiscoveryCustomEvent
   [customMsg=DynamicCacheChangeBatch [reqs=[DynamicCacheChangeRequest
   [deploymentId=3f757f24a51-ee74fcc5-5cea-4548-9cd6-8a201c29fdf9,
   startCfg=CacheConfiguration [name=userTable.cmsDataVehicles.query,
   storeConcurrentLoadAllThreshold=5, rebalancePoolSize=2,
   rebalanceTimeout=1, evictPlc=null, evictSync=false,
   evictKeyBufSize=1024, evictSyncConcurrencyLvl=4, evictSyncTimeout=1,
   evictFilter=null, evictMaxOverflowRatio=10.0, eagerTtl=true,
   dfltLockTimeout=0, startSize=150, nearCfg=null, writeSync=PRIMARY_SYNC,
   storeFactory=null, storeKeepBinary=false, loadPrevVal=false,
   aff=o.a.i.cache.affinity.rendezvous.RendezvousAffinityFunction@7e1c4f4,
   cacheMode=REPLICATED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY,
   backups=2147483647, invalidate=false, tmLookupClsName=null,
   rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288,
   rebalanceBatchesPrefetchCount=2, offHeapMaxMem=-1, swapEnabled=false,
   maxConcurrentAsyncOps=500, writeBehindEnabled=false,
   writeBehindFlushSize=10240, writeBehindFlushFreq=5000,
   writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, maxQryIterCnt=1024,
   memMode=ONHEAP_TIERED,
   affMapper=o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper@6ca896dd,
   rebalanceDelay=0, rebalanceThrottle=0, interceptor=null,
   longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true,
   nodeFilter=o.a.i.configuration.CacheConfiguration$IgniteAllNodesPredicate@db2015b,
   sqlSchema=null, sqlEscapeAll=false, sqlOnheapRowCacheSize=10240,
   snapshotableIdx=false, cpOnRead=true, topValidator=null], cacheType=USER,
   initiatingNodeId=79d74dfa-9903-42e3-af94-524ab08c1f97, nearCacheCfg=null,
   clientStartOnly=false, stop=false, close=false, failIfExists=true,
   template=false, rcvdFrom=null, exchangeNeeded=true, cacheFutTopVer=null,
   cacheName=userTable.cmsDataVehicles.query]], clientNodes=null,
   id=4f757f24a51-ee74fcc5-5cea-4548-9cd6-8a201c29fdf9,
   clientReconnect=false], affTopVer=AffinityTopologyVersion [topVer=21,
   minorTopVer=2], super=DiscoveryEvent [evtNode=TcpDiscoveryNode
   [id=79d74dfa-9903-42e3-af94-524ab08c1f97, addrs=[0:0:0:0:0:0:0:1%lo,
   10.130.1.43, 127.0.0.1], sockAddrs=[webapp-92-50ik3/10.130.1.43:47500,
   /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=21,
   intOrder=12, lastExchangeTime=1487193806549, loc=true,
   ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], topVer=21,
   nodeId8=79d74dfa, msg=null, type=DISCOVERY_CUSTOM_EVT,
   tstamp=1487193798024]], crd=TcpDiscoveryNode
   [id=659ea029-ca80-4413-af47-67b446ffd3e7, addrs=[0:0:0:0:0:0:0:1%lo,
   10.128.1.11, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /
   127.0.0.1:47500, /10.128.1.11:47500], discPort=47500, order=18,
   intOrder=10, lastExchangeTime=1487178534506, loc=false,
   ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false],
   exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
   [topVer=21, minorTopVer=2], nodeId=79d74dfa, evt=DISCOVERY_CUSTOM_EVT],
   added=true, initFut=GridFutureAdapter [resFlag=0, res=null,
   startTime=1487193798024, endTime=0, ignoreInterrupts=false, state=INIT],
   init=false, topSnapshot=null, lastVer=null,
   partReleaseFut=GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=3,
   done=false, cancelled=false, err=null, futs=[true, true, false, true]],
   affChangeMsg=null, skipPreload=false, clientOnlyExchange=false,
   initTs=1487193798024, centralizedAff=false, evtLatch=0,
   remaining=[0940d968-a901-4c78-8354-8171628d41aa,
   659ea029-ca80-4413-af47-67b446ffd3e7], srvNodes=[TcpDiscoveryNode
   [id=659ea029-ca80-4413-af47-67b446ffd3e7, addrs=[0:0:0:0:0:0:0:1%lo,
   10.128.1.11, 127.0.0.1],