I don't see why we would get such a huge pause; in fact, I have provided GC logs before and we found nothing...
All operations are puts or gets on the "big" partitioned cache with 3 million entries, plus a query on another cache which has 450 entries. There are no other caches. The nodes all have 6G of heap and 26G off heap. I think it could be IO related, but I can't seem to correlate it to IO. I saw some heavy IO usage, but the node failed well after. Now my question is: should I set the failure detection timeout to 60s just for the sake of trying it? Isn't that too high? If I set the servers to 60s, how high should I set the clients?

On Tue., Aug. 18, 2020, 7:32 a.m. Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> [13:39:53,242][WARNING][jvm-pause-detector-worker][IgniteKernal%company]
> Possible too long JVM pause: 41779 milliseconds.
>
> It seems that you have a too-long full GC. Either make sure it does not
> happen, or increase failureDetectionTimeout to be longer than any expected
> GC.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, 17 Aug 2020 at 17:51, John Smith <[email protected]>:
>
>> Hi guys, it seems every couple of weeks we lose a node... Here are the
>> logs:
>> https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0
>>
>> And some extra details. Maybe I need to do more tuning than what is
>> already mentioned below, maybe set a higher timeout?
>>
>> 3 server nodes and 9 clients (client = true)
>>
>> Performance-wise the cluster is not doing any kind of high volume; on
>> average it does about 15-20 puts/gets/queries (any combination of) per
>> 30-60 seconds.
>>
>> The biggest cache we have is: 3 million records, distributed with 1 backup,
>> using the following template.
>>
>> <bean id="cache-template-bean" abstract="true"
>>       class="org.apache.ignite.configuration.CacheConfiguration">
>>     <!-- when you create a template via XML configuration,
>>          you must add an asterisk to the name of the template -->
>>     <property name="name" value="partitionedTpl*"/>
>>     <property name="cacheMode" value="PARTITIONED"/>
>>     <property name="backups" value="1"/>
>>     <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
>> </bean>
>>
>> Persistence is configured:
>>
>> <property name="dataStorageConfiguration">
>>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>         <!-- Redefining the default region's settings -->
>>         <property name="defaultDataRegionConfiguration">
>>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>                 <property name="persistenceEnabled" value="true"/>
>>                 <property name="name" value="Default_Region"/>
>>                 <property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/>
>>             </bean>
>>         </property>
>>     </bean>
>> </property>
>>
>> We also followed the tuning instructions for GC and I/O:
>>
>> if [ -z "$JVM_OPTS" ] ; then
>>     JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
>> fi
>>
>> #
>> # Uncomment the following GC settings if you see spikes in your
>> # throughput due to Garbage Collection.
>> #
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
>>
>> sysctl -w vm.dirty_writeback_centisecs=500
>> sysctl -w vm.dirty_expire_centisecs=500
>>
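P.S. For reference, a sketch of where those timeouts would go in the node XML config, using Ignite's standard IgniteConfiguration properties (failureDetectionTimeout for server nodes, clientFailureDetectionTimeout for client nodes). The 60000 value is just the one being discussed above; the 120000 client value is only a placeholder, not a recommendation:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Server-to-server failure detection, in milliseconds (default 10000).
         Raised to 60s here only to cover the observed ~42s pauses. -->
    <property name="failureDetectionTimeout" value="60000"/>
    <!-- Client detection is governed by a separate timeout (default 30000);
         it is usually kept at or above the server value. -->
    <property name="clientFailureDetectionTimeout" value="120000"/>
</bean>
```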

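P.P.S. To tell whether the 41-second pause is really GC (rather than an OS-level stall, e.g. IO or swapping), it may help to log safepoint stop times as well as GC details. A minimal sketch, assuming JDK 8 HotSpot flags and a hypothetical log path:

```shell
# Hypothetical log location; adjust for your environment.
GC_LOG=/tmp/ignite-gc.log

# PrintGCApplicationStoppedTime records total stopped-the-world time,
# including non-GC safepoints, so a long pause with no matching GC entry
# points at something outside the collector.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:$GC_LOG"
echo "$JVM_OPTS"
```

If the stopped-time entries around 13:39 don't line up with a full GC, the pause detector warning is likely the JVM being descheduled by the OS, which would fit the heavy IO theory.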