Re: error - Failed to wait for partition release future
Hi Cameron,

It looks like the nodes can't see each other. Could you please attach your grid configuration and full logs? By the way, the Ignite version on the nodes is still "ver=1.8.1#20161208-sha1:a8fcb7b2".

On Fri, Feb 17, 2017 at 2:15 AM, Cameron Braid wrote:
> Ok, I have moved to using the Apache Ignite 1.8.0 jar and the same issue is there.
>
> I don't believe that slow networking is causing this, as it also happens on a single-host, single-node cluster; it's hard to reproduce, but it does happen.
>
> I've included a log from the grid locking up on a single-node grid.
>
> Any ideas on what I can do to stop the cluster from locking up?
>
> [log snipped; the full log is in Cameron's message below]
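For reference, the "grid configuration" requested above is typically a Spring XML bean definition. The sketch below is illustrative only: the address, ports, and timeout are hypothetical placeholders, not values taken from this thread.

```xml
<!-- Minimal Ignite 1.x grid configuration sketch. All addresses, ports,
     and timeouts are illustrative placeholders, not Cameron's values. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <!-- A larger timeout gives slow or GC-stalled nodes more time
       before the cluster considers them failed. -->
  <property name="failureDetectionTimeout" value="30000"/>
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="localPort" value="47500"/>
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <!-- Every server node must be reachable on its discovery port range. -->
              <value>10.129.1.132:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

With a static TcpDiscoveryVmIpFinder like this, "nodes can't see each other" usually comes down to an address missing from the list or the discovery port being blocked between hosts.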
Re: error - Failed to wait for partition release future
Ok, I have moved to using the Apache Ignite 1.8.0 jar and the same issue is there.

I don't believe that slow networking is causing this, as it also happens on a single-host, single-node cluster; it's hard to reproduce, but it does happen.

I've included a log from the grid locking up on a single-node grid.

Any ideas on what I can do to stop the cluster from locking up?

Feb 17 09:09:22 webapp-97-3v11l webapp 85614 WARN o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture [exchange-worker-#26%webapp%] Failed to wait for partition release future [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=6], node=68845172-67fb-4f49-ae81-f3024ef89e66]. Dumping pending objects that might be the cause:
Feb 17 09:09:22 webapp-97-3v11l webapp 85614 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%] Ready affinity version: AffinityTopologyVersion [topVer=1, minorTopVer=5]
Feb 17 09:09:22 webapp-97-3v11l webapp 85619 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%] Last exchange future: GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryCustomEvent [customMsg=DynamicCacheChangeBatch [reqs=[DynamicCacheChangeRequest [deploymentId=d7f09f84a51-923f7cb7-d9c2-46e8-b121-c7fe21e14aa2, startCfg=CacheConfiguration [name=query.starRatings.aggregate, storeConcurrentLoadAllThreshold=5, rebalancePoolSize=2, rebalanceTimeout=1, evictPlc=LruEvictionPolicy [max=500, batchSize=1, maxMemSize=0, memSize=0, size=0], evictSync=false, evictKeyBufSize=1024, evictSyncConcurrencyLvl=4, evictSyncTimeout=1, evictFilter=null, evictMaxOverflowRatio=10.0, eagerTtl=true, dfltLockTimeout=0, startSize=150, nearCfg=null, writeSync=PRIMARY_SYNC, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, aff=o.a.i.cache.affinity.rendezvous.RendezvousAffinityFunction@3cfe01ec, cacheMode=REPLICATED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY, backups=2147483647, invalidate=false, 
tmLookupClsName=null, rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCount=2, offHeapMaxMem=-1, swapEnabled=false, maxConcurrentAsyncOps=500, writeBehindEnabled=false, writeBehindFlushSize=10240, writeBehindFlushFreq=5000, writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, maxQryIterCnt=1024, memMode=ONHEAP_TIERED, affMapper=o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper@387ac79, rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, nodeFilter=o.a.i.configuration.CacheConfiguration$IgniteAllNodesPredicate@2a423797, sqlSchema=null, sqlEscapeAll=false, sqlOnheapRowCacheSize=10240, snapshotableIdx=false, cpOnRead=true, topValidator=null], cacheType=USER, initiatingNodeId=68845172-67fb-4f49-ae81-f3024ef89e66, nearCacheCfg=null, clientStartOnly=false, stop=false, close=false, failIfExists=true, template=false, rcvdFrom=null, exchangeNeeded=true, cacheFutTopVer=null, cacheName=query.starRatings.aggregate]], clientNodes=null, id=e7f09f84a51-923f7cb7-d9c2-46e8-b121-c7fe21e14aa2, clientReconnect=false], affTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=6], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=68845172-67fb-4f49-ae81-f3024ef89e66, addrs=[0:0:0:0:0:0:0:1%lo, 10.129.1.132, 127.0.0.1], sockAddrs=[webapp-97-3v11l/10.129.1.132:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1487282891381, loc=true, ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], topVer=1, nodeId8=68845172, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1487282952548]], crd=TcpDiscoveryNode [id=68845172-67fb-4f49-ae81-f3024ef89e66, addrs=[0:0:0:0:0:0:0:1%lo, 10.129.1.132, 127.0.0.1], sockAddrs=[webapp-97-3v11l/10.129.1.132:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1487282891381, loc=true, ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], 
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=6], nodeId=68845172, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [resFlag=0, res=null, startTime=1487282952548, endTime=0, ignoreInterrupts=false, state=INIT], init=false, topSnapshot=null, lastVer=null, partReleaseFut=GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=3, done=false, cancelled=false, err=null, futs=[true, true, true, false]], affChangeMsg=null, skipPreload=false, clientOnlyExchange=false, initTs=1487282952548, centralizedAff=false, evtLatch=0, remaining=[], srvNodes=[TcpDiscoveryNode [id=68845172-67fb-4f49-ae81-f3024ef89e66, addrs=[0:0:0:0:0:0:0:1%lo, 10.129.1.132, 127.0.0.1], sockAddrs=[webapp-97-3v11l/10.129.1.132:47500, /0:0:0:0:0:0:0:1%lo:47500, / 127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1487282891381, loc=true, ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false]], super=GridFutureAdapter [resFlag=0, res=null,
Re: error - Failed to wait for partition release future
Hi Cameron,

You wrote to the Apache Ignite Community dev list. Ignite does not have versions like 1.8.x. The Git repository you refer to does not belong to the Apache Ignite project.

On Thu, Feb 16, 2017 at 6:58 PM, Cameron Braid wrote:
> Hi,
>
> Thanks for looking at my issue.
>
> There are tags for 1.8.1, 1.8.2 and 1.8.3 at https://github.com/gridgain/apache-ignite so I have been trying jars built from those tags, including 1.9.
>
> I'm still getting the issues. What concerns me the most is that it is very easy to lock up the whole grid. I have tried failureDetectionTimeout values from short (2s) to long (40s) and I still get a lockup. I've tried preferring IPv4 as you suggested, but it didn't make any difference. I've included my JVM args, an excerpt from the JVM GC logs, and some more recent Ignite logs.
>
> Cheers
>
> Cameron
>
> [JVM args and GC logs snipped; see Cameron's message below]
Re: error - Failed to wait for partition release future
Hi,

Thanks for looking at my issue.

There are tags for 1.8.1, 1.8.2 and 1.8.3 at https://github.com/gridgain/apache-ignite so I have been trying jars built from those tags, including 1.9.

I'm still getting the issues. What concerns me the most is that it is very easy to lock up the whole grid. I have tried failureDetectionTimeout values from short (2s) to long (40s) and I still get a lockup. I've tried preferring IPv4 as you suggested, but it didn't make any difference. I've included my JVM args, an excerpt from the JVM GC logs, and some more recent Ignite logs.

Cheers

Cameron

*[jvm args]*

-Xms2G -Xmx2G

-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=100M
-Xloggc:target/gc-log-devA.txt

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB
-XX:NewSize=128m -XX:MaxNewSize=128m
-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=40 -XX:MaxGCPauseMillis=1000
-XX:InitiatingHeapOccupancyPercent=50 -XX:+UseCompressedOops
-XX:ParallelGCThreads=6 -XX:ConcGCThreads=6

*[GC logs]*

2017-02-17T02:43:31.345+1100: 1997.387: [CMS-concurrent-reset: 0.004/0.004 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2017-02-17T02:43:32.099+1100: 1998.141: [GC (Allocation Failure) 2017-02-17T02:43:32.100+1100: 1998.141: [ParNew: 130485K->0K(131008K), 0.0332423 secs] 773911K->655291K(2097088K), 0.0336493 secs] [Times: user=0.10 sys=0.00, real=0.04 secs]
2017-02-17T02:43:32.498+1100: 1998.539: [GC (Allocation Failure) 2017-02-17T02:43:32.498+1100: 1998.539: [ParNew: 130944K->0K(131008K), 0.0489761 secs] 786235K->683673K(2097088K), 0.0492079 secs] [Times: user=0.13 sys=0.01, real=0.05 secs]
2017-02-17T02:43:33.578+1100: 1999.620: [GC (Allocation Failure) 2017-02-17T02:43:33.578+1100: 1999.620: [ParNew: 130944K->0K(131008K), 0.0589320 secs] 814617K->710468K(2097088K), 0.0594862 secs] [Times: user=0.19 sys=0.01, real=0.06 secs]
2017-02-17T02:43:33.997+1100: 2000.038: [GC (Allocation Failure) 2017-02-17T02:43:33.997+1100: 2000.038: [ParNew: 130944K->0K(131008K), 0.0382098 secs] 841412K->723510K(2097088K), 0.0386616 secs] [Times: user=0.11 sys=0.00, real=0.04 secs]
2017-02-17T02:43:34.724+1100: 2000.765: [GC (Allocation Failure) 2017-02-17T02:43:34.724+1100: 2000.765: [ParNew: 130944K->0K(131008K), 0.0061646 secs] 854454K->725669K(2097088K), 0.0064497 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
2017-02-17T02:43:35.303+1100: 2001.344: [GC (Allocation Failure) 2017-02-17T02:43:35.303+1100: 2001.345: [ParNew: 130944K->0K(131008K), 0.0102684 secs] 856613K->727224K(2097088K), 0.0105370 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2017-02-17T02:43:35.582+1100: 2001.623: [GC (Allocation Failure) 2017-02-17T02:43:35.582+1100: 2001.623: [ParNew: 130944K->0K(131008K), 0.0201246 secs] 858168K->735764K(2097088K), 0.0204913 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
2017-02-17T02:43:35.856+1100: 2001.898: [GC (Allocation Failure) 2017-02-17T02:43:35.856+1100: 2001.898: [ParNew: 130944K->0K(131008K), 0.0176264 secs] 866708K->745151K(2097088K), 0.0178741 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:36.554+1100: 2002.595: [GC (Allocation Failure) 2017-02-17T02:43:36.554+1100: 2002.596: [ParNew: 130944K->0K(131008K), 0.0480600 secs] 876095K->752482K(2097088K), 0.0483472 secs] [Times: user=0.08 sys=0.00, real=0.05 secs]
2017-02-17T02:43:38.991+1100: 2005.033: [GC (Allocation Failure) 2017-02-17T02:43:38.991+1100: 2005.033: [ParNew: 130930K->0K(131008K), 0.0188153 secs] 883413K->764137K(2097088K), 0.0191396 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:40.428+1100: 2006.470: [GC (Allocation Failure) 2017-02-17T02:43:40.428+1100: 2006.470: [ParNew: 130941K->0K(131008K), 0.0272629 secs] 895079K->775860K(2097088K), 0.0274685 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
2017-02-17T02:43:41.195+1100: 2007.237: [GC (Allocation Failure) 
2017-02-17T02:43:41.195+1100: 2007.237: [ParNew: 130944K->0K(131008K), 0.0117222 secs] 906804K->778960K(2097088K), 0.0120822 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
2017-02-17T02:43:41.919+1100: 2007.960: [GC (Allocation Failure) 2017-02-17T02:43:41.919+1100: 2007.961: [ParNew: 130944K->0K(131008K), 0.0181342 secs] 909904K->782162K(2097088K), 0.0183815 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.337+1100: 2008.378: [GC (Allocation Failure) 2017-02-17T02:43:42.337+1100: 2008.378: [ParNew: 130944K->0K(131008K), 0.0182636 secs] 913106K->790713K(2097088K), 0.0187463 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.358+1100: 2008.399: [GC (CMS Initial Mark) [1 CMS-initial-mark: 790713K(1966080K)] 791332K(2097088K), 0.0103648 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2017-02-17T02:43:42.368+1100: 2008.410: [CMS-concurrent-mark-start]
2017-02-17T02:43:42.668+1100: 2008.709: [CMS-concurrent-mark: 0.299/0.299 secs] [Times: user=1.76
Re: error - Failed to wait for partition release future
Hi Cameron,

Ignite does not have 1.8.x versions.

It looks like you use IPv6 on some of your nodes. First, check that either the -Djava.net.preferIPv6Addresses=true or the -Djava.net.preferIPv4Stack=true JVM option is set on all of your nodes, and that the communication port is set properly.

On Thu, Feb 16, 2017 at 1:29 AM, Cameron Braid wrote:
> Hi,
>
> I am getting the following error in Ignite version 1.8.1:
>
> "Failed to wait for partition release future"
>
> The related log lines are at the bottom.
>
> I was curious whether it could be related to https://issues.apache.org/jira/browse/IGNITE-3212, which has a fix for version 1.9. Could this be the case?
>
> Otherwise, is this likely to be a bug in Ignite, or something specific to my environment?
>
> Cheers
>
> Cameron
>
> [log snipped; the full log is in the original message below]
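"Communication port" above refers to TcpCommunicationSpi, which is separate from the discovery port seen in the logs (47500). A sketch of pinning it explicitly follows; the port value is illustrative, not from this thread.

```xml
<!-- Sketch: pin the Ignite communication port explicitly.
     47100 is Ignite's conventional default, shown here as a placeholder. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
      <!-- Nodes exchange cache data over this port; it must be open
           between all hosts, in both directions. -->
      <property name="localPort" value="47100"/>
    </bean>
  </property>
</bean>
```

If some nodes resolve each other over IPv6 and others over IPv4, communication connections can fail even when discovery succeeds, which matches the "nodes can't see each other" symptom discussed in this thread.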
error - Failed to wait for partition release future
Hi,

I am getting the following error in Ignite version 1.8.1:

"Failed to wait for partition release future"

The related log lines are at the bottom.

I was curious whether it could be related to https://issues.apache.org/jira/browse/IGNITE-3212, which has a fix for version 1.9. Could this be the case?

Otherwise, is this likely to be a bug in Ignite, or something specific to my environment?

Cheers

Cameron

- Feb 16 08:23:28 webapp-92-50ik3 webapp 15287661 WARN o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture [exchange-worker-#26%webapp%] Failed to wait for partition release future [topVer=AffinityTopologyVersion [topVer=21, minorTopVer=2], node=79d74dfa-9903-42e3-af94-524ab08c1f97]. Dumping pending objects that might be the cause:
- Feb 16 08:23:28 webapp-92-50ik3 webapp 15287661 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%] Ready affinity version: AffinityTopologyVersion [topVer=21, minorTopVer=1]
- Feb 16 08:23:28 webapp-92-50ik3 webapp 15287667 WARN o.a.i.i.p.c.GridCachePartitionExchangeManager [exchange-worker-#26%webapp%] Last exchange future: GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryCustomEvent [customMsg=DynamicCacheChangeBatch [reqs=[DynamicCacheChangeRequest [deploymentId=3f757f24a51-ee74fcc5-5cea-4548-9cd6-8a201c29fdf9, startCfg=CacheConfiguration [name=userTable.cmsDataVehicles.query, storeConcurrentLoadAllThreshold=5, rebalancePoolSize=2, rebalanceTimeout=1, evictPlc=null, evictSync=false, evictKeyBufSize=1024, evictSyncConcurrencyLvl=4, evictSyncTimeout=1, evictFilter=null, evictMaxOverflowRatio=10.0, eagerTtl=true, dfltLockTimeout=0, startSize=150, nearCfg=null, writeSync=PRIMARY_SYNC, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, aff=o.a.i.cache.affinity.rendezvous.RendezvousAffinityFunction@7e1c4f4, cacheMode=REPLICATED, atomicityMode=ATOMIC, atomicWriteOrderMode=PRIMARY, backups=2147483647, invalidate=false, tmLookupClsName=null, 
rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCount=2, offHeapMaxMem=-1, swapEnabled=false, maxConcurrentAsyncOps=500, writeBehindEnabled=false, writeBehindFlushSize=10240, writeBehindFlushFreq=5000, writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, maxQryIterCnt=1024, memMode=ONHEAP_TIERED, affMapper=o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper@6ca896dd, rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, longQryWarnTimeout=3000, qryDetailMetricsSz=0, readFromBackup=true, nodeFilter=o.a.i.configuration.CacheConfiguration$IgniteAllNodesPredicate@db2015b, sqlSchema=null, sqlEscapeAll=false, sqlOnheapRowCacheSize=10240, snapshotableIdx=false, cpOnRead=true, topValidator=null], cacheType=USER, initiatingNodeId=79d74dfa-9903-42e3-af94-524ab08c1f97, nearCacheCfg=null, clientStartOnly=false, stop=false, close=false, failIfExists=true, template=false, rcvdFrom=null, exchangeNeeded=true, cacheFutTopVer=null, cacheName=userTable.cmsDataVehicles.query]], clientNodes=null, id=4f757f24a51-ee74fcc5-5cea-4548-9cd6-8a201c29fdf9, clientReconnect=false], affTopVer=AffinityTopologyVersion [topVer=21, minorTopVer=2], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=79d74dfa-9903-42e3-af94-524ab08c1f97, addrs=[0:0:0:0:0:0:0:1%lo, 10.130.1.43, 127.0.0.1], sockAddrs=[webapp-92-50ik3/10.130.1.43:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=21, intOrder=12, lastExchangeTime=1487193806549, loc=true, ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], topVer=21, nodeId8=79d74dfa, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1487193798024]], crd=TcpDiscoveryNode [id=659ea029-ca80-4413-af47-67b446ffd3e7, addrs=[0:0:0:0:0:0:0:1%lo, 10.128.1.11, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, / 127.0.0.1:47500, /10.128.1.11:47500], discPort=47500, order=18, intOrder=10, lastExchangeTime=1487178534506, loc=false, ver=1.8.1#20161208-sha1:a8fcb7b2, isClient=false], 
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=21, minorTopVer=2], nodeId=79d74dfa, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [resFlag=0, res=null, startTime=1487193798024, endTime=0, ignoreInterrupts=false, state=INIT], init=false, topSnapshot=null, lastVer=null, partReleaseFut=GridCompoundFuture [rdc=null, initFlag=1, lsnrCalls=3, done=false, cancelled=false, err=null, futs=[true, true, false, true]], affChangeMsg=null, skipPreload=false, clientOnlyExchange=false, initTs=1487193798024, centralizedAff=false, evtLatch=0, remaining=[0940d968-a901-4c78-8354-8171628d41aa, 659ea029-ca80-4413-af47-67b446ffd3e7], srvNodes=[TcpDiscoveryNode [id=659ea029-ca80-4413-af47-67b446ffd3e7, addrs=[0:0:0:0:0:0:0:1%lo, 10.128.1.11, 127.0.0.1],