Hi guys, it seems that every couple of weeks we lose a node. Here are the logs:
https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0
Some extra details follow. Maybe I need to do more tuning than what is
already mentioned below, or maybe set a higher timeout?
The cluster has 3 server nodes and 9 client nodes (client = true).
Performance-wise, the cluster is not handling any kind of high volume: on
average it does about 15-20 puts/gets/queries (in any combination) per
30-60 seconds.
The biggest cache we have holds 3 million records, distributed with 1 backup,
using the following template:
<bean id="cache-template-bean" abstract="true"
      class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- when you create a template via XML configuration,
         you must add an asterisk to the name of the template -->
    <property name="name" value="partitionedTpl*"/>
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
</bean>
Persistence is configured:
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Redefining the default region's settings -->
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
                <property name="name" value="Default_Region"/>
                <property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
    </bean>
</property>
We also followed the tuning instructions for GC and I/O:
if [ -z "$JVM_OPTS" ] ; then
    JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
fi
#
# Uncomment the following GC settings if you see spikes in your throughput
# due to Garbage Collection.
#
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500
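
On the timeout question above: I assume the relevant settings are
failureDetectionTimeout and clientFailureDetectionTimeout on
IgniteConfiguration (defaults of 10 and 30 seconds, as far as I know). A
rough sketch of what I would try is below. The bean id and both values are
placeholders, not something we have tested or currently run:

```xml
<!-- Hypothetical sketch: the bean id and the timeout values are assumptions. -->
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Failure detection timeout for server nodes, in milliseconds. -->
    <property name="failureDetectionTimeout" value="30000"/>
    <!-- Failure detection timeout for client nodes, in milliseconds. -->
    <property name="clientFailureDetectionTimeout" value="60000"/>
</bean>
```

I am not sure this addresses the root cause; a longer timeout may only hide
long GC or I/O pauses rather than fix them.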