Hello!

Increasing the checkpoint page buffer size is very useful here, and it is the approach I recommend. We also recommend running Ignite on SSDs rather than HDDs.
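
As a rough sketch of where this setting lives, assuming the persistent cache uses the default data region and the 2GB region size from your message: the 512MB value below is only an illustrative starting point, not a tested recommendation, so adjust it for your load.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- Already enabled in the original setup -->
      <property name="writeThrottlingEnabled" value="true"/>
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="persistenceEnabled" value="true"/>
          <!-- 2GB data region, as described in the question -->
          <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
          <!-- Assumed example value: 512MB instead of the current 128MB -->
          <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```

If the cache is assigned to a named data region instead, the same checkpointPageBufferSize property would go on that region's DataRegionConfiguration.
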
Regards,
--
Ilya Kasnacheev

Thu, Nov 21, 2019 at 01:23, Mikael <[email protected]>:

> Hi!
>
> When I get timeout exceptions on the striping threads (like below) when
> streaming data, what is the best way around it ? should I increase the
> thread pool size, I would guess the reason is that the HD is not that
> fast and both WAL and storage is on the same drive (it's a persistent
> cache), but I would like some kind of setup that does not have to be
> tuned all the time to work without exceptions even if persistent storage
> is not so fast, I do use:
>
> <property name="writeThrottlingEnabled" value="true"/>
>
> So the question is what to modify that would help best, more threads,
> bigger checkpointPageBufferSize (128MB on a 2GB data region) or
> something else ? 11 seconds is a long time so increasing timeouts does
> not sound like a good idea ?
>
> [2019-11-20T21:36:05,471][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=11s]
> [2019-11-20T21:36:05,471][ERROR][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1574282154412]]]
> org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1574282154412]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.6.jar:2.7.6]
> [2019-11-20T21:36:05,810][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-1, blockedFor=11s]
> [2019-11-20T21:36:05,810][ERROR][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-1, igniteInstanceName=null, finished=false, heartbeatTs=1574282154310]]]
> org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-1, igniteInstanceName=null, finished=false, heartbeatTs=1574282154310]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.6.jar:2.7.6]
>
> Mikael
