Hello Ray,

According to the attached log, it seems that you have some network problems. Could you please also share the logs from the nodes with temporary IDs 429edc2b-eb14-414f-a978-9bfe35443c8c and 6783732c-9a13-466f-800a-ad4c8d9be3bf? The root cause is most likely on those nodes.
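For reference, here is one quick way to pull the pending node IDs out of that warning line, so you can tell which logs to collect. This is just a sketch; the log file name below is an assumption, and the sample log line is recreated inline so the commands run standalone:

```shell
# Log file name is a placeholder -- point this at your actual Ignite log.
LOG=ignite-node.log

# Recreate the relevant warning line so this example is self-contained.
cat > "$LOG" <<'EOF'
[2018-07-25T09:45:42,177][WARN ][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=2, pendingAcks=[429edc2b-eb14-414f-a978-9bfe35443c8c, 6783732c-9a13-466f-800a-ad4c8d9be3bf], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=239, minorTopVer=0]]]
EOF

# Extract the node IDs listed in pendingAcks -- these are the nodes
# whose logs are needed (one UUID per output line).
grep -o 'pendingAcks=\[[^]]*\]' "$LOG" \
  | sed 's/pendingAcks=\[//; s/\]//' \
  | tr ',' '\n' \
  | tr -d ' '
```

The IDs printed by the last pipeline are exactly the nodes that have not acknowledged the partitions release latch.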
2018-07-25 13:03 GMT+03:00 Ray <[email protected]>:
> I have a three node Ignite 2.6 cluster setup with the following config.
>
> <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>     <property name="segmentationPolicy" value="RESTART_JVM"/>
>     <property name="peerClassLoadingEnabled" value="true"/>
>     <property name="failureDetectionTimeout" value="60000"/>
>     <property name="dataStorageConfiguration">
>         <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>             <property name="storagePath" value="/data/ignite/persistence"/>
>             <property name="walPath" value="/wal"/>
>             <property name="walArchivePath" value="/wal/archive"/>
>             <property name="defaultDataRegionConfiguration">
>                 <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                     <property name="name" value="default_Region"/>
>                     <property name="initialSize" value="#{100L * 1024 * 1024 * 1024}"/>
>                     <property name="maxSize" value="#{460L * 1024 * 1024 * 1024}"/>
>                     <property name="persistenceEnabled" value="true"/>
>                     <property name="checkpointPageBufferSize" value="#{8L * 1024 * 1024 * 1024}"/>
>                 </bean>
>             </property>
>             <property name="walMode" value="BACKGROUND"/>
>             <property name="walFlushFrequency" value="5000"/>
>             <property name="checkpointFrequency" value="600000"/>
>         </bean>
>     </property>
>     <property name="discoverySpi">
>         <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>             <property name="localPort" value="49500"/>
>             <property name="ipFinder">
>                 <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>                     <property name="addresses">
>                         <list>
>                             <value>node1:49500</value>
>                             <value>node2:49500</value>
>                             <value>node3:49500</value>
>                         </list>
>                     </property>
>                 </bean>
>             </property>
>         </bean>
>     </property>
>     <property name="gridLogger">
>         <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>             <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
>         </bean>
>     </property>
> </bean>
> </beans>
>
> And I used this command to start the Ignite service on the three nodes.
>
> ./ignite.sh -J-Xmx32000m -J-Xms32000m -J-XX:+UseG1GC \
>     -J-XX:+ScavengeBeforeFullGC -J-XX:+DisableExplicitGC -J-XX:+AlwaysPreTouch \
>     -J-XX:+PrintGCDetails -J-XX:+PrintGCTimeStamps -J-XX:+PrintGCDateStamps \
>     -J-XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime \
>     -XX:+PrintGCApplicationConcurrentTime \
>     -J-Xloggc:/spare/ignite/log/ignitegc-$(date +%Y_%m_%d-%H_%M).log \
>     config/persistent-config.xml
>
> When I use the Spark dataframe API to ingest data into this cluster, the
> cluster freezes after some time and no new data can be ingested into Ignite.
> Both the client (Spark executor) and the server show the "Unable to await
> partitions release latch within timeout: ServerLatch" exception, starting from
> line 51834 of the full log, like this:
>
> [2018-07-25T09:45:42,177][WARN ][exchange-worker-#162][GridDhtPartitionsExchangeFuture]
> Unable to await partitions release latch within timeout: ServerLatch [permits=2,
> pendingAcks=[429edc2b-eb14-414f-a978-9bfe35443c8c, 6783732c-9a13-466f-800a-ad4c8d9be3bf],
> super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=239, minorTopVer=0]]]
>
> Here's the full log of the server node having the exception:
> 07-25.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/07-25.zip>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
