Thanks Igor for the suggestions.
However, in the logs the JVM pauses do not account for much time (around 500 milliseconds), so I assume GC is not the only cause of the slowdown; I suspect there are other reasons for this strange behavior.
Even with the WAL disabled, the import still slows down after a few files.
Is there any way to figure out why so many THREAD WAITs are happening? Because of them, SEVERE errors are being logged on the Ignite nodes. Below are some sample errors from the log file I shared. Please help me here.
[05:02:45,413][SEVERE][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=34s]
[05:02:45,414][WARNING][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-0-#9", id=22, state=RUNNABLE, blockCnt=0, waitCnt=4375]
[05:02:45,415][SEVERE][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1569214931358]]]
class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1569214931358]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Thread [name="sys-#367", id=468, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7a4b4b27, ownerName=null, ownerId=-1]
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Thread [name="sys-#366", id=467, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7a4b4b27, ownerName=null, ownerId=-1]
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Regards,
Favas
From: Igor Belyakov <[email protected]>
Sent: Monday, September 23, 2019 7:37 PM
To: [email protected]
Subject: Re: Ignite bulk data load issue
Hi,
According to the logs, I see a lot of "Possible too long JVM pause" messages; it looks like you have a problem with GC activity. You should check the GC logs and try to tune GC following this manual:
https://apacheignite.readme.io/docs/jvm-and-system-tuning
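For reference, a typical starting point from that manual looks like the following (a sketch for JDK 8; the heap size is an assumption and must be sized for your nodes, leaving room for Ignite's off-heap data region):

```
-server
-Xms10g -Xmx10g
-XX:+UseG1GC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
-Xloggc:/var/log/ignite-gc.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
```

The GC log produced by the last three flags is what you would inspect to confirm or rule out long pauses.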
Also, it is suggested to disable the WAL during the initial data load to reduce the amount of disk I/O. See "Performance tips" at the bottom of the following page for more details:
https://apacheignite.readme.io/docs/data-loading
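For example, the WAL can be switched off per table for the duration of the load and re-enabled afterwards (assuming Ignite 2.4 or later; the table name is a placeholder):

```sql
-- Before the bulk load:
ALTER TABLE my_table NOLOGGING;

-- ... run the COPY commands ...

-- After the load; re-enabling the WAL forces a checkpoint:
ALTER TABLE my_table LOGGING;
```

Note that while the WAL is disabled the table's data is not crash-safe, so this is only appropriate for an initial load that can be repeated.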
Regards,
Igor
On Mon, Sep 23, 2019 at 1:07 PM Muhammed Favas
<[email protected]<mailto:[email protected]>>
wrote:
Hi,
I need help figuring out some issues identified during a bulk load of data into an Ignite cluster.
My cluster consists of 5 nodes, each with an 8-core CPU, 32 GB RAM, and 30 GB of disk. Ignite native persistence is enabled for the table.
I am trying to load data into my Ignite SQL table from CSV files using the COPY command. Each file contains 50 million records, and I have numerous files of the same size. At the beginning the load was quite fast, but after some time it became very slow, and now it is taking hours to load even a single file. Below are my observations:
* When I trigger the load for the first time after a pause, CPU usage is at a promising level and the data load rate is high.
* After loading 2-3 files, CPU usage drops below 1% and stays there indefinitely.
* If I stop the loading process for a while and restart it, it again performs well for some time, and then the same situation recurs.
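For reference, the load is driven by statements of this shape (a sketch; the file path, table, and column names here are placeholders, not my actual schema):

```sql
COPY FROM '/data/import/file_001.csv'
INTO my_table (id, name, value)
FORMAT CSV;
```

Internally this streams the parsed rows into the cluster, which is why the data-streamer stripes show up in the thread dumps below.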
When I checked the log file, I saw that certain THREAD WAITs are happening; I believe the CPU drop is caused by these waits. I have attached the entire log file.
Can someone help me figure out why this happens in Ignite? Or is there something wrong in my configuration? Below is my configuration file content:
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureDetectionTimeout" value="30000"/>

    <!-- Redefining maximum memory size for the cluster node usage. -->
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="checkpointReadLockTimeout" value="0"/>
            <!-- Set the page size to 4 KB. -->
            <property name="pageSize" value="#{4 * 1024}"/>
            <!-- Increase the WAL segment size to 1 GB (default is 64 MB). -->
            <property name="walSegmentSize" value="#{1L * 1024 * 1024 * 1024}"/>
            <!-- Set the number of WAL segments to 5 (default is 10). -->
            <property name="walSegments" value="5"/>
            <!-- Set the WAL history size to 5 (default is 20). -->
            <property name="walHistorySize" value="5"/>
            <property name="walCompactionEnabled" value="true"/>
            <property name="walCompactionLevel" value="6"/>
            <!-- Enable write throttling. -->
            <property name="writeThrottlingEnabled" value="true"/>
            <!-- Redefining the default region's settings. -->
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="persistenceEnabled" value="true"/>
                    <property name="name" value="Default_Region"/>
                    <!-- Set the size of the default region to 24 GB. -->
                    <property name="maxSize" value="#{24L * 1024 * 1024 * 1024}"/>
                    <!-- Increase the checkpoint buffer size to 2 GB. -->
                    <property name="checkpointPageBufferSize" value="#{2L * 1024 * 1024 * 1024}"/>
                </bean>
            </property>
        </bean>
    </property>
    <property name="cacheConfiguration">
        <list>
            <!-- Partitioned cache example configuration. -->
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="default*"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <property name="atomicityMode" value="TRANSACTIONAL"/>
                <property name="queryParallelism" value="8"/>
            </bean>
        </list>
    </property>
</bean>
Regards,
Favas