I don't see why we would get such a huge pause; in fact, I have provided GC logs before and we found nothing...
All operations are puts or gets on the "big" partitioned cache with 3 million entries, plus a query on another cache which has 450 entries. There are no other caches. The nodes all have 6G of heap and 26G off heap. I think it could be IO related, but I can't seem to correlate it to IO. I saw some heavy IO usage, but the node failed well after. Now my question is: should I set the failure detection timeout to 60s just for the sake of trying it? Isn't that too high? If I set the servers to 60s, how high should I set the clients?

On Tue., Aug. 18, 2020, 7:32 a.m. Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> [13:39:53,242][WARNING][jvm-pause-detector-worker][IgniteKernal%company]
> Possible too long JVM pause: 41779 milliseconds.
>
> It seems that you have a too-long full GC. Either make sure it does not
> happen, or increase failureDetectionTimeout to be longer than any expected
> GC.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, 17 Aug 2020 at 17:51, John Smith <[email protected]>:
>
>> Hi guys, it seems every couple of weeks we lose a node... Here are the
>> logs:
>> https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0
>>
>> And some extra details. Maybe I need to do more tuning than what is
>> already mentioned below, maybe set a higher timeout?
>>
>> 3 server nodes and 9 clients (client = true)
>>
>> Performance-wise the cluster is not doing any kind of high volume; on
>> average it does about 15-20 puts/gets/queries (any combination of) per
>> 30-60 seconds.
>>
>> The biggest cache we have is: 3 million records, distributed with 1 backup,
>> using the following template.
>>
>> <bean id="cache-template-bean" abstract="true"
>>       class="org.apache.ignite.configuration.CacheConfiguration">
>>     <!-- when you create a template via XML configuration,
>>          you must add an asterisk to the name of the template -->
>>     <property name="name" value="partitionedTpl*"/>
>>     <property name="cacheMode" value="PARTITIONED"/>
>>     <property name="backups" value="1"/>
>>     <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
>> </bean>
>>
>> Persistence is configured:
>>
>> <property name="dataStorageConfiguration">
>>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>         <!-- Redefining the default region's settings -->
>>         <property name="defaultDataRegionConfiguration">
>>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>                 <property name="persistenceEnabled" value="true"/>
>>                 <property name="name" value="Default_Region"/>
>>                 <property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/>
>>             </bean>
>>         </property>
>>     </bean>
>> </property>
>>
>> We also followed the tuning instructions for GC and I/O:
>>
>> if [ -z "$JVM_OPTS" ] ; then
>>     JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
>> fi
>>
>> #
>> # Uncomment the following GC settings if you see spikes in your
>> # throughput due to Garbage Collection.
>> #
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
>>
>> sysctl -w vm.dirty_writeback_centisecs=500
>> sysctl -w vm.dirty_expire_centisecs=500
>>
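P.S. For reference, a sketch of where those timeouts would go in the node XML config, using Ignite's standard IgniteConfiguration properties (failureDetectionTimeout for server nodes, clientFailureDetectionTimeout for client nodes). The 60000 value is just the one being discussed above; the 120000 client value is only a placeholder, not a recommendation:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Server-to-server failure detection, in milliseconds (default 10000).
         Raised to 60s here only to cover the observed ~42s pauses. -->
    <property name="failureDetectionTimeout" value="60000"/>
    <!-- Client detection is governed by a separate timeout (default 30000);
         it is usually kept at or above the server value. -->
    <property name="clientFailureDetectionTimeout" value="120000"/>
</bean>
```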

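P.P.S. To tell whether the 41-second pause is really GC (rather than an OS-level stall, e.g. IO or swapping), it may help to log safepoint stop times as well as GC details. A minimal sketch, assuming JDK 8 HotSpot flags and a hypothetical log path:

```shell
# Hypothetical log location; adjust for your environment.
GC_LOG=/tmp/ignite-gc.log

# PrintGCApplicationStoppedTime records total stopped-the-world time,
# including non-GC safepoints, so a long pause with no matching GC entry
# points at something outside the collector.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:$GC_LOG"
echo "$JVM_OPTS"
```

If the stopped-time entries around 13:39 don't line up with a full GC, the pause detector warning is likely the JVM being descheduled by the OS, which would fit the heavy IO theory.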