Could you please attach the logs from a node started with the
IGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER parameter enabled?
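
For reference, a minimal sketch of one way to enable it, assuming the node is
launched through ignite.sh and the JVM options are collected in JVM_OPTS, as in
the ignite.sh excerpt quoted further down in this thread:

```shell
# Sketch only: append the system property to the JVM options used to start the node.
# JVM_OPTS is the variable from the ignite.sh excerpt quoted in this thread;
# adjust the name if your launch script collects JVM options differently.
JVM_OPTS="${JVM_OPTS} -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true"
export JVM_OPTS
```

The node has to be restarted for the property to take effect.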

2018-05-11 6:09 GMT+03:00 NO <727418...@qq.com>:

> I have only encountered this problem once, and I have not reproduced it in
> FSYNC mode. When I have time, I will look for a way to reproduce it.
>
> Regarding the read performance degradation, I used the
> -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true parameter, but the problem
> was not solved. However, after I stopped the write requests, the response
> time of the read requests dropped significantly. I do not understand why
> write requests affect read requests, or how I should optimize this.
> Thank you very much.
>
>
> ------------------ Original Message ------------------
> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
> *Sent:* Thursday, May 10, 2018, 6:37 PM
> *To:* "user"<user@ignite.apache.org>;
> *Subject:* Re: Read request response time is unstable, often
> more than 500 milliseconds, but the cluster load is small
>
> An Ignite node should start in any WAL mode. I suppose the same error
> would also occur in FSYNC mode.
> Would you be able to restart in LOG_ONLY mode and show the logs?
>
> 2018-05-10 12:39 GMT+03:00 NO <727418...@qq.com>:
>
>> I remember encountering this problem when using LOG_ONLY mode.
>> After a node rebooted, it printed an error message and could not be
>> started. I did not save the error message at the time. I searched
>> the source code, and it may have been one of these two:
>> 1. 'Failed to find checkpoint record at the given WAL pointer'
>> 2. 'on disk, but checkpoint record is missed in WAL '
>>
>> In LOG_ONLY mode, is it possible that a node will not start after a crash?
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
>> *Sent:* Thursday, May 10, 2018, 5:13 PM
>> *To:* "user"<user@ignite.apache.org>;
>> *Subject:* Re: Read request response time is unstable, often more
>> than 500 milliseconds, but the cluster load is small
>>
>> Please, try to check performance with LOG_ONLY mode.
>>
>> 2018-05-10 12:03 GMT+03:00 NO <727418...@qq.com>:
>>
>>> Hi,
>>>
>>> I have tested setting -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true,
>>> but it seriously reduces the write speed. I do not know what the
>>> impact of this parameter is. Is it necessary to set other parameters
>>> to increase the write speed?
>>>
>>>
>>> ------------------ Original Message ------------------
>>> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
>>> *Sent:* Thursday, May 10, 2018, 4:59 PM
>>> *To:* "user"<user@ignite.apache.org>;
>>> *Subject:* Re: Read request response time is unstable, often more than
>>> 500 milliseconds, but the cluster load is small
>>>
>>> Hi,
>>>
>>> I see several exceptions in your logs. They are probably causing the slowdown.
>>> >> java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager cannot
>>> be cast to org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager
>>>
>>> It seems you have hit the issue described in
>>> https://issues.apache.org/jira/browse/IGNITE-7865, which is fixed in version 2.5.
>>> As a workaround, you could change WALMode to LOG_ONLY or start Ignite with
>>> the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true
>>>
>>> Thanks,
>>> Pavel
>>>
>>>
>>>
>>>
>>>
>>> 2018-05-10 5:42 GMT+03:00 NO <727418...@qq.com>:
>>>
>>>> hi,
>>>>
>>>> Ignite version : 2.4.0
>>>>
>>>> Read operations often exceed 500 milliseconds, but the cluster traffic
>>>> is very small. I don't know why. Please help me solve this problem. Thank
>>>> you very much. Here is some configuration information.
>>>>
>>>> 8 nodes: (48 cores, 192 GB RAM, 4 TB SSD)
>>>> Cluster records : 1.7 billion primary keys , 1.7 billion backup keys
>>>> Get requests per second : 100+
>>>> Put requests per second : 400+
>>>> Each node occupies more than 500GB of disk space.
>>>>
>>>> 2 nodes:
>>>> LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
>>>> Distributor ID:    CentOS
>>>> Description:    CentOS Linux release 7.2.1511 (Core)
>>>> Release:    7.2.1511
>>>> Codename:    Core
>>>>
>>>> 6 nodes:
>>>> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>>> Distributor ID:    CentOS
>>>> Description:    CentOS release 6.7 (Final)
>>>> Release:    6.7
>>>> Codename:    Final
>>>> ============================================================
>>>> The node configuration is as follows:
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>        xmlns:util="http://www.springframework.org/schema/util"
>>>>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>>>> http://www.springframework.org/schema/beans/spring-beans.xsd
>>>>         http://www.springframework.org/schema/util
>>>> http://www.springframework.org/schema/util/spring-util.xsd
>>>>         ">
>>>>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>>>            <property name="failureDetectionTimeout" value="60000"/>
>>>>
>>>>         <property name="clientFailureDetectionTimeout"
>>>> value="60000"/>
>>>>         <property name="segmentationPolicy" value="RESTART_JVM"/>
>>>>
>>>>         <property name="publicThreadPoolSize" value="64"/>
>>>>         <property name="systemThreadPoolSize" value="64"/>
>>>>         <property name="dataStreamerThreadPoolSize" value="64"/>
>>>>         <property name="rebalanceThreadPoolSize" value="4" />
>>>>         <property name="dataStorageConfiguration">
>>>>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>>                 <property name="defaultDataRegionConfiguration">
>>>>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>>                         <property name="name"
>>>> value="qipu_entity_cache_data_region"/>
>>>>                         <property name="initialSize" value="#{10L *
>>>> 1024 * 1024 * 1024}"/>
>>>>                         <property name="maxSize" value="#{100L * 1024 *
>>>> 1024 * 1024}"/>
>>>>                         <property name="persistenceEnabled"
>>>> value="true"/>
>>>>                         <property name="metricsEnabled" value="true"/>
>>>>                         <property name="checkpointPageBufferSize"
>>>> value="#{1 * 1024 * 1024 * 1024}"/>
>>>>                     </bean>
>>>>                 </property>
>>>>                 <property name="walSegmentSize" value="#{64 * 1024 *
>>>> 1024}"/>
>>>>                 <property name="pageSize" value="#{4 * 1024}"/>
>>>>                 <property name="walSegments" value="#{20}"/>
>>>>                 <property name="walMode" value="FSYNC"/>
>>>>                 <property name="metricsEnabled" value="true"/>
>>>>                 <property name="writeThrottlingEnabled"
>>>> value="true"/>
>>>>                 <property name="checkpointThreads" value="8"/>
>>>>
>>>>                 <property name="walThreadLocalBufferSize" value="#{1 *
>>>> 1024 * 1024}"/>
>>>>             </bean>
>>>>         </property>
>>>>
>>>>         <property name="cacheConfiguration">
>>>>             <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>>>                 <property name="dataRegionName"
>>>> value="qipu_entity_cache_data_region"/>
>>>>                 <property name="name" value="qipu_entity_cache"/>
>>>>                 <property name="cacheMode" value="PARTITIONED"/>
>>>>                 <property name="partitionLossPolicy" value="IGNORE"/>
>>>>                 <property name="atomicityMode" value="ATOMIC"/>
>>>>                 <property name="backups" value="1"/>
>>>>                 <property name="writeSynchronizationMode"
>>>> value="FULL_SYNC"/>
>>>>                 <property name="statisticsEnabled" value="true"/>
>>>>                 <property name="rebalanceBatchSize" value="#{20 * 1024
>>>> * 1024}"/>
>>>>                 <property name="rebalanceThrottle" value="0"/>
>>>>
>>>>                 <property name="rebalanceMode" value="ASYNC"/>
>>>>
>>>>                 <property name="rebalanceBatchesPrefetchCount"
>>>> value="4"/>
>>>>                 <property name="rebalanceTimeout" value="20000"/>
>>>>
>>>>                 <property name="maxConcurrentAsyncOperations"
>>>> value="#{4 * 500}"/>
>>>>             </bean>
>>>>         </property>
>>>>
>>>>         <property name="communicationSpi">
>>>>             <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>>>                 <property name="messageQueueLimit" value="20480"/>
>>>>             </bean>
>>>>         </property>
>>>>         <property name="discoverySpi">
>>>>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>>>                 <property name="forceServerMode" value="true"/>
>>>>                 <property name="ipFinder">
>>>>                     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>                         <property name="addresses">
>>>>                             <list>
>>>>                                 <!-- In distributed environment,
>>>> replace with actual host IP address. -->
>>>>                                 <value>10.13.13.39:47500..47509</value>
>>>>                                 <value>10.13.13.49:47500..47509</value>
>>>>                                 <value>10.13.13.50:47500..47509</value>
>>>>                                 <value>10.13.13.51:47500..47509</value>
>>>>                                 <value>10.13.13.59:47500..47509</value>
>>>>                                 <value>10.13.13.60:47500..47509</value>
>>>>                                 <value>10.13.13.61:47500..47509</value>
>>>>                                 <value>10.13.13.63:47500..47509</value>
>>>>                             </list>
>>>>                         </property>
>>>>                     </bean>
>>>>                 </property>
>>>>             </bean>
>>>>         </property>
>>>>         <property name="gridLogger">
>>>>             <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>>                 <constructor-arg type="java.lang.String"
>>>> value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
>>>>             </bean>
>>>>         </property>
>>>>     </bean>
>>>> </beans>
>>>> ============================================================
>>>> #ignite.sh
>>>> JVM config
>>>> JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
>>>> JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
>>>> JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
>>>> JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
>>>> JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
>>>> JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"
>>>> ============================================================
>>>> node config
>>>> #/etc/sysctl.conf
>>>> fs.file-max = 512000
>>>> net.core.rmem_max = 67108864
>>>> net.core.wmem_max = 67108864
>>>> net.core.rmem_default = 65536
>>>> net.core.wmem_default = 65536
>>>> net.core.netdev_max_backlog = 4096
>>>> net.core.somaxconn = 4096
>>>> net.ipv4.tcp_syncookies = 1
>>>> net.ipv4.tcp_tw_reuse = 1
>>>> net.ipv4.tcp_tw_recycle = 0
>>>> net.ipv4.tcp_fin_timeout = 30
>>>> net.ipv4.tcp_keepalive_time = 1200
>>>> net.ipv4.ip_local_port_range = 10000 65000
>>>> net.ipv4.tcp_max_syn_backlog = 4096
>>>> net.ipv4.tcp_max_tw_buckets = 5000
>>>> net.ipv4.tcp_rmem = 4096 87380 67108864
>>>> net.ipv4.tcp_wmem = 4096 65536 67108864
>>>> net.ipv4.tcp_mtu_probing = 1
>>>> vm.swappiness=0
>>>> vm.zone_reclaim_mode = 0
>>>> vm.dirty_writeback_centisecs = 500
>>>> vm.dirty_expire_centisecs = 500
>>>> ===============================================
>>>> #/etc/security/limits.conf
>>>> *       soft    nofile          65535
>>>> *       hard    nofile          65535
>>>>
>>>>
>>>> # End of file
>>>> *               soft    nofile             65535
>>>> *               hard    nofile             65535
>>>> *       soft    nofile          81920
>>>> *       hard    nofile          81920
>>>> *       soft    nproc           81920
>>>> *       hard    nproc           81920
>>>> *       soft    core            10240
>>>> *       hard    core            10240
>>>> *    soft    data       unlimited
>>>> *    hard    data       unlimited
>>>> *    soft    stack      unlimited
>>>> *    hard    stack      unlimited
>>>> *    soft    memory     unlimited
>>>> *    hard    memory     unlimited
>>>> *    soft    cpu        unlimited
>>>> *    hard    cpu        unlimited
>>>> *    soft    memlock    unlimited
>>>> *    hard    memlock    unlimited
>>>>
>>>> * hard memlock      unlimited
>>>> * soft memlock      unlimited
>>>> ===============================================
>>>>
>>>> client code
>>>> ==============================================
>>>> Ignition.setClientMode(true);
>>>>
>>>>         IgniteConfiguration cfg = new IgniteConfiguration();
>>>>         TcpDiscoverySpi spi = new TcpDiscoverySpi();
>>>>
>>>>         TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
>>>>         finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
>>>>         spi.setIpFinder(finder);
>>>>
>>>>         cfg.setDiscoverySpi(spi);
>>>>         cfg.setGridLogger(new Slf4jLogger());
>>>>         Ignite ignite = Ignition.start(cfg);
>>>>         IgniteCache<String, byte[]> igniteCache = ignite.getOrCreateCache("qipu_entity_cache");
>>>>
>>>>         // get code [Read operation response time often exceeds 1s]
>>>>         igniteCache.getAllAsync(keySet).get(1000);
>>>>
>>>>         // put code
>>>>         // igniteCache.putAllAsync(map).get(3000);
>>>> ==============================================
>>>>
>>>>
>>>> Attached are one node's GC log and node log.
>>>>
>>>> Please give some suggestions on how to reduce the read operation
>>>> response time. Thank you.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Regards
>>>
>>> Pavel Vinokurov
>>>
>>
>>
>>
>> --
>>
>> Regards
>>
>> Pavel Vinokurov
>>
>
>
>
> --
>
> Regards
>
> Pavel Vinokurov
>



-- 

Regards

Pavel Vinokurov
