Hello!

I can already see that you're not closing IgniteClient when you reconnect,
which means the old client's resources (including its TCP connection) are
never freed.
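To illustrate: each startClient() opens a new connection, and overwriting the field just abandons the old one. Below is a minimal stand-alone sketch of the difference (FakeClient is a hypothetical stand-in for IgniteClient, not an Ignite API; the real fix is to call igniteClient.close() before Ignition.startClient(cfg)):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for IgniteClient: counts instances never closed.
class FakeClient implements AutoCloseable {
    static final AtomicInteger openCount = new AtomicInteger();
    FakeClient() { openCount.incrementAndGet(); }
    @Override public void close() { openCount.decrementAndGet(); }
}

public class ReconnectLeakDemo {
    private FakeClient client = new FakeClient();

    // Leaky: the old client (and its TCP connection) is never released.
    void leakyReconnect() { client = new FakeClient(); }

    // Safe: close the old client before replacing it.
    void safeReconnect() {
        try { client.close(); } catch (Exception ignored) { }
        client = new FakeClient();
    }

    public static void main(String[] args) {
        FakeClient.openCount.set(0);
        ReconnectLeakDemo leaky = new ReconnectLeakDemo();
        for (int i = 0; i < 5; i++) leaky.leakyReconnect();
        System.out.println("open after leaky reconnects: " + FakeClient.openCount.get()); // 6

        FakeClient.openCount.set(0);
        ReconnectLeakDemo safe = new ReconnectLeakDemo();
        for (int i = 0; i < 5; i++) safe.safeReconnect();
        System.out.println("open after safe reconnects: " + FakeClient.openCount.get()); // 1
    }
}
```

With many request threads calling reconnect(), the leaky version accumulates abandoned connections exactly the way the connection counts below suggest.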

Have you considered using a thick client node (Ignite started with
Ignition.setClientMode(true)) instead of the thin IgniteClient?

Regards,
-- 
Ilya Kasnacheev


On Wed, Mar 27, 2019 at 10:03 PM Brent Williams <[email protected]> wrote:

> Igor,
>
> Thanks for responding.
>
> I have two Java singletons that I use. The first is the CacheManager, which
> starts the client instance. The second is another singleton that caches the
> repository name + CacheClient so we can reuse them throughout the process.
>
> /**
>  * This is a singleton for Apache Ignite; it starts the client instance.
>  */
> public class IgniteCacheManager {
>     private static IgniteCacheManager instance;
>     private IgniteClient igniteClient;
>     private ClientConfiguration cfg;
>
>     public static IgniteCacheManager getInstance(YamlConfig cacheConfig) {
>         if (instance == null) instance = new IgniteCacheManager(cacheConfig);
>         return instance;
>     }
>
>     private IgniteCacheManager(YamlConfig cacheConfig) {
>         String hostsMap = cacheConfig.getString("hosts");
>         String[] hosts = null;
>         if (hostsMap != null) {
>             hosts = hostsMap.split(",");
>         } else {
>             hosts = new String[] { "localhost:10800" };
>         }
>         cfg = new ClientConfiguration().setAddresses(hosts)
>                 .setTimeout(cacheConfig.getInteger("cache.timeout"));
>         igniteClient = Ignition.startClient(cfg);
>     }
>
>     public IgniteClient getClient() {
>         return this.igniteClient;
>     }
>
>     public void reconnect() {
>         igniteClient = Ignition.startClient(cfg);
>     }
> }
>
>
> public class CacheFactory {
>     private static CacheFactory instance;
>     private YamlConfig cacheConfig;
>     private Map<String, CacheClient<?>> clientCache = new ConcurrentHashMap<>();
>
>     public static CacheFactory getInstance(YamlConfig cacheConfig) {
>         if (instance == null) instance = new CacheFactory(cacheConfig);
>         return instance;
>     }
>
>     /**
>      * This is the main factory method for pulling a cache instance to
>      * begin caching.
>      */
>     public <T> CacheClient<T> getCacheProvider(Class<T> t, String repository) {
>         CacheClient<T> client = null;
>         if (clientCache.containsKey(repository)) {
>             client = (CacheClient<T>) clientCache.get(repository);
>         } else {
>             try {
>                 client = new IgniteCacheClientWrapper<T>(cacheConfig, repository);
>                 clientCache.put(repository, client);
>             } catch (Exception ex) {
>                 // If we encounter any errors, return null and let the
>                 // caller decide how to act on the null response.
>                 LOG.error(ex);
>             }
>         }
>         return client;
>     }
> }
>
> To call this, we use the following method inside all of our request threads.
>
>
> public <T> CacheClient<T> getCacheClient(Class<T> t, String key) {
>     CacheFactory factory = CacheFactory.getInstance(cacheConfig);
>     return factory.getCacheProvider(t, key);
> }
>
> ...
> getCacheClient(StorageContainer.class, partner).get(id);
> ...
>
>
> The spikes are unpredictable: we see normal load on all 3 nodes, but we do
> see a huge spike in these errors around the time the hosts lock up.
>
> Mar 25 06:25:04 prd-cache001 service.sh[10538]: java.io.IOException: Connection reset by peer
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1104)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2389)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2156)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1797)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]:     at java.lang.Thread.run(Thread.java:748)
> Mar 25 06:25:04 prd-cache001 service.sh[10538]: [06:25:04,742][SEVERE][grid-nio-worker-client-listener-1-#30][ClientListenerProcessor] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-client-listener-1, igniteInstanceName=null, finished=false, heartbeatTs=1553520191555, hashCode=1720789126, interrupted=false, runner=grid-nio-worker-client-listener-1-#30]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.132.52.64:10800, rmtAddr=/10.132.52.59:49105, createTime=1553519004533, closeTime=0, bytesSent=5, bytesRcvd=12, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1553519005007, lastSndTime=1553519005525, lastRcvTime=1553519005007, readsPaused=false, filterChain=FilterChain[filters=[GridNioAsyncNotifyFilter, GridNioCodecFilter [parser=ClientListenerBufferedParser, directMode=false]], accepted=true, markedForClose=false]]]
>
> Then we see the TCP connection count go way up, we hit "too many open
> files", and request times take forever. One thing I did change was the
> client timeout: I had it at 100 ms but increased it to 250 ms. I'm not sure
> whether the load causes connections to time out so that more connections
> get spawned, or whether GC causes the connections to hang.
>
> My 3 nodes have 2 CPUs and 8 GB RAM each; during load peaks, we see
> averages of 6% CPU with 30% memory utilization. Here are the settings I
> have for my GC:
>
> /usr/bin/java -server -Xms1g -Xmx1g -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:MaxMetaspaceSize=256m
> -Djava.net.preferIPv4Stack=true -DIGNITE_QUIET=true
> -DIGNITE_SUCCESS_FILE=/usr/share/apache-ignite/work/ignite_success_274df869-bebf-47d0-8c9e-6b2da78f1f09
> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -DIGNITE_HOME=/usr/share/apache-ignite
> -DIGNITE_PROG_NAME=/usr/share/apache-ignite/bin/ignite.sh -cp
> /usr/share/apache-ignite/libs/*:/usr/share/apache-ignite/libs/ignite-indexing/*:/usr/share/apache-ignite/libs/ignite-rest-http/*:/usr/share/apache-ignite/libs/ignite-spring/*:/usr/share/apache-ignite/libs/licenses/*
> org.apache.ignite.startup.cmdline.CommandLineStartup
> /etc/apache-ignite/default-config.xml
>
>
>
> On Wed, Mar 27, 2019 at 4:27 AM Igor Sapego <[email protected]> wrote:
>
>> That's really weird. There should not be so many connections. Normally a
>> thin client will open at most one TCP connection per node. In many cases,
>> there is going to be only one connection.
>>
>> Do you create IgniteClient in your application once, or do you start it
>> several times? Could it be that your code is leaking IgniteClient
>> instances?
>>
>> Can you provide some minimal reproducer to us, so we can debug the issue?
>>
>> Best Regards,
>> Igor
>>
>>
>> On Mon, Mar 25, 2019 at 11:19 PM Brent Williams <[email protected]>
>> wrote:
>>
>>> All,
>>>
>>> I am running Apache Ignite 2.7.0. I have 3 nodes in my cluster; CPU,
>>> memory, and GC are all tuned properly. I have even raised the file limit
>>> to 65k open files. I have 8 client nodes connecting to the 3-node
>>> cluster, and for the most part it works fine; however, we see spikes in
>>> connections, we start to blow out the file limit, we get "too many open
>>> files" errors, and all client nodes hang.
>>>
>>> When I check the connections per client on one of the server nodes, I
>>> see 5,500+ established TCP connections per host, roughly 44,000+ in
>>> total. My questions: what should the file limits be? Why so many TCP
>>> connections per host? And how do we control this, as it is causing our
>>> production cluster to hang?
>>>
>>> --Brent
>>>
>>>
>>>
