Our production instance has problems with Ignite running out of heap. The
solution we attempted was to enable swap on the default data region and to
set an eviction policy on the server with a maxMemorySize much smaller than
the JVM's Xmx heap size.
Testing locally with a dev version of our server (WebLogic acting as an
Ignite node with client mode enabled) and the Docker instance of Ignite
2.7.6, it appears that this configuration does not solve Ignite's
instability issues.
Many different configurations were attempted (for the full config see the
bottom of this post). The desired configuration is one in which the client
has no cache and the server does all the caching. We tried to achieve that
with the following on the server:
<property name="onheapCacheEnabled" value="true"/>
<property name="evictionPolicyFactory">
  <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory">
    <property name="maxMemorySize" value="#{50L * 1024 * 1024}"/>
    <property name="maxSize" value="1000000000"/>
  </bean>
</property>
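For context, here is a sketch of where this fragment sits. It assumes the fragment is placed inside the WebCache CacheConfiguration bean of the server configuration; the appendix shows the baseline without it.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
  <property name="name" value="WebCache"/>
  <property name="cacheMode" value="REPLICATED"/>
  <!-- Keep an on-heap copy of entries in addition to off-heap storage. -->
  <property name="onheapCacheEnabled" value="true"/>
  <!-- The eviction policy bounds only the on-heap copy (about 50 MB here). -->
  <property name="evictionPolicyFactory">
    <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory">
      <property name="maxMemorySize" value="#{50L * 1024 * 1024}"/>
      <property name="maxSize" value="1000000000"/>
    </bean>
  </property>
</bean>
```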
With the above server configuration, three client configurations were
tried:
1. Mirrored configuration on the client
2. Similar configuration with maxSize set to 0 (an attempt at ensuring the
client didn't try to cache)
3. Enabling IGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK and not having any
eviction policy on the client
All three of these configurations caused the WebLogic client to disconnect
from the cluster and eventually die (while attempting to reconnect to
Ignite).
Error from client before death:
| Servlet failed with an Exception
| java.lang.IllegalStateException: Grid is in invalid state to perform
this operation. It either not started yet or has already being or have
stopped [igniteInstanceName=null, state=STOPPING]
| at
org.apache.ignite.internal.GridKernalGatewayImpl.illegalState(GridKernalGatewayImpl.java:201)
| at
org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:95)
| at
org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:3886)
| at
org.apache.ignite.internal.IgniteKernal.transactions(IgniteKernal.java:2862)
| at
org.apache.ignite.cache.websession.CustomWebSessionFilter.init(CustomWebSessionFilter.java:273)
| Truncated. see log file for complete stacktrace
Error in ignitevisorcmd.sh:
SEVERE: Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
blockedFor=17s]
Dec 06, 2019 12:17:27 AM java.util.logging.LogManager$RootLogger log
SEVERE: Critical system error detected. Will be handled accordingly to
configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SEGMENTATION]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=tcp-disco-msg-worker,
igniteInstanceName=null, finished=false, heartbeatTs=1575591430824]]]
class org.apache.ignite.IgniteException: GridWorker
[name=tcp-disco-msg-worker, igniteInstanceName=null, finished=false,
heartbeatTs=1575591430824]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
at
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
at
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:221)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
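The "blockedFor=17s" message relates to the timeouts Ignite uses to decide that a system worker or a node is unresponsive. A minimal sketch of the IgniteConfiguration properties involved follows. The values are illustrative assumptions, not tested settings, and raising them would only hide a long pause (for example a GC pause) rather than remove its cause.

```xml
<bean id="grid.cfg"
      class="org.apache.ignite.configuration.IgniteConfiguration">
  <!-- Illustrative: how long (ms) a system-critical worker such as
       tcp-disco-msg-worker may be unresponsive before
       SYSTEM_WORKER_BLOCKED is raised. -->
  <property name="systemWorkerBlockedTimeout" value="60000"/>
  <!-- Illustrative: failure detection timeout (ms) for server nodes. -->
  <property name="failureDetectionTimeout" value="30000"/>
  <!-- Illustrative: failure detection timeout (ms) for client nodes. -->
  <property name="clientFailureDetectionTimeout" value="60000"/>
  <!-- Remaining properties as in the full configurations in the appendix. -->
</bean>
```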
Sometimes in testing the client was able to reconnect successfully, but I
could not see why this behavior was inconsistent.
A separate test was run with no eviction policy and no on-heap caching
enabled on either the client or the server. This seems to be more stable.
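With on-heap caching left disabled as in that test, memory use could instead be bounded at the data region level. The following is a minimal sketch, not something we have tested. It assumes off-heap page eviction is acceptable for this cache and that native persistence stays disabled, since page eviction applies only to non-persistent regions. The 2 GB limit is an illustrative value.

```xml
<property name="dataStorageConfiguration">
  <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
    <property name="defaultDataRegionConfiguration">
      <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
        <property name="initialSize" value="#{10L * 1024 * 1024}"/>
        <!-- Illustrative cap (2 GB), not a value from the original setup. -->
        <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
        <!-- Evict off-heap pages as the region approaches maxSize. -->
        <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
      </bean>
    </property>
  </bean>
</property>
```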
Is there something incorrect in the configuration? Is there something
missing that would allow us to use on-heap memory without it causing issues
with our client?
Appendix:
Client Configuration:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean id="grid.cfg"
        class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="clientMode" value="true"/>
    <property name="peerClassLoadingEnabled" value="true"/>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="forceServerMode" value="true"/>
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>127.0.0.1:47500..47509</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="cacheConfiguration">
      <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="WebCache"/>
          <property name="cacheMode" value="REPLICATED"/>
        </bean>
      </list>
    </property>
    <property name="failureHandler">
      <bean class="org.apache.ignite.failure.StopNodeFailureHandler">
        <property name="ignoredFailureTypes">
          <list>
            <value>SYSTEM_WORKER_BLOCKED</value>
            <value>SEGMENTATION</value>
          </list>
        </property>
      </bean>
    </property>
    <property name="lifecycleBeans">
      <list>
        <bean class="com.company.common.ignite.CustomIgniteLifecycleBean"/>
      </list>
    </property>
  </bean>
</beans>
Server Configuration (used with docker image apache-ignite 2.7.6):
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean id="grid.cfg"
        class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="peerClassLoadingEnabled" value="true"/>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>127.0.0.1:47500..47509</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="initialSize" value="#{10L * 1024 * 1024}"/>
            <property name="maxSize" value="#{20L * 1024 * 1024 * 1024}"/>
            <property name="swapPath" value="/opt/ignite/"/>
          </bean>
        </property>
      </bean>
    </property>
    <property name="cacheConfiguration">
      <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="WebCache"/>
          <property name="cacheMode" value="REPLICATED"/>
        </bean>
      </list>
    </property>
    <property name="failureHandler">
      <bean class="org.apache.ignite.failure.StopNodeFailureHandler">
        <property name="ignoredFailureTypes">
          <list>
            <value>SYSTEM_WORKER_BLOCKED</value>
            <value>SEGMENTATION</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</beans>