Could you please attach the logs from a run with the IGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER parameter enabled?
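For reference, a minimal sketch of how that property is usually enabled: it is a JVM system property, so it can be appended to JVM_OPTS in the same style as the ignite.sh settings quoted further down in this thread. Placing it in ignite.sh is an assumption based on that quoted script, not something verified against your installation.

```shell
# Assumption: JVM_OPTS is the variable that ignite.sh passes to the JVM,
# as in the ignite.sh excerpt quoted later in this thread.
JVM_OPTS="${JVM_OPTS:-} -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true"

# Print the resulting options so the flag can be confirmed before the node starts.
echo "${JVM_OPTS}"
```

A JVM system property is only read at startup, so the node has to be restarted, and the logs to attach are the ones written after that restart.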
2018-05-11 6:09 GMT+03:00 NO <727418...@qq.com>:

> I only encountered this problem once. I did not reproduce it in FSYNC
> mode; I will look for a way to reproduce it.
>
> Regarding the read performance degradation, I used the
> -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true parameter, but the problem
> was not solved. However, after I stopped the write requests, the response
> time of the read requests dropped significantly. I do not understand why
> write requests affect read requests, or how I should optimize this.
> Thank you very much.
>
>
> ------------------ Original Message ------------------
> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
> *Sent:* Thursday, May 10, 2018, 6:37 PM
> *To:* "user"<user@ignite.apache.org>;
> *Subject:* Re: Read request response time is unstable, often more than
> 500 milliseconds, but the cluster load is small
>
> An Ignite node should start with any WAL mode. I suppose that the same
> error would occur with FSYNC mode.
> Would you be able to restart with LOG_ONLY mode and show the logs?
>
> 2018-05-10 12:39 GMT+03:00 NO <727418...@qq.com>:
>
>> Using the LOG_ONLY mode, I remember having encountered this problem.
>> After the node rebooted and printed an error message, the node could not
>> be started. At that time, I did not save the error message. I searched
>> the source code, and it may be one of these two:
>> 1. 'Failed to find checkpoint record at the given WAL pointer'
>> 2. 'on disk, but checkpoint record is missed in WAL '
>>
>> In LOG_ONLY mode, can a node fail to start after a node crash?
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
>> *Sent:* Thursday, May 10, 2018, 5:13 PM
>> *To:* "user"<user@ignite.apache.org>;
>> *Subject:* Re: Read request response time is unstable, often more than
>> 500 milliseconds, but the cluster load is small
>>
>> Please, try to check performance with LOG_ONLY mode.
>>
>> 2018-05-10 12:03 GMT+03:00 NO <727418...@qq.com>:
>>
>>> Hi,
>>>
>>> I have tested with the -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true
>>> parameter set, but it seriously affects the write speed. I do not know
>>> what the impact of this parameter is, or whether other parameters need
>>> to be set to increase the write speed.
>>>
>>>
>>> ------------------ Original Message ------------------
>>> *From:* "Pavel Vinokurov"<vinokurov.pa...@gmail.com>;
>>> *Sent:* Thursday, May 10, 2018, 4:59 PM
>>> *To:* "user"<user@ignite.apache.org>;
>>> *Subject:* Re: Read request response time is unstable, often more than
>>> 500 milliseconds, but the cluster load is small
>>>
>>> Hi,
>>>
>>> I see several exceptions in your logs. They are probably causing the
>>> slowdown.
>>>
>>> java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager
>>> cannot be cast to org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager
>>>
>>> It seems you have hit the issue related to
>>> https://issues.apache.org/jira/browse/IGNITE-7865, which is fixed in
>>> version 2.5.
>>> As a workaround you could change WALMode to LOG_ONLY or start Ignite with
>>> the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true
>>>
>>> Thanks,
>>> Pavel
>>>
>>>
>>> 2018-05-10 5:42 GMT+03:00 NO <727418...@qq.com>:
>>>
>>>> hi,
>>>>
>>>> Ignite version : 2.4.0
>>>>
>>>> Read operations often exceed 500 milliseconds, but the cluster traffic
>>>> is very small. I don't know why. Please help me solve this problem.
>>>> Thank you very much. Here is some configuration information.
>>>>
>>>> 8 node : (48 core ,192G RAM, 4TB SSD)
>>>> Cluster records : 1.7 billion primary keys , 1.7 billion backup keys
>>>> Get requests per second : 100+
>>>> Put requests per second : 400+
>>>> Each node occupies more than 500GB of disk space.
>>>>
>>>> 2 node :
>>>> LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
>>>> Distributor ID: CentOS
>>>> Description: CentOS Linux release 7.2.1511 (Core)
>>>> Release: 7.2.1511
>>>> Codename: Core
>>>>
>>>> 6 node:
>>>> LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>>> Distributor ID: CentOS
>>>> Description: CentOS release 6.7 (Final)
>>>> Release: 6.7
>>>> Codename: Final
>>>> =========================================================================
>>>> The node configuration is as follows
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>        xmlns:util="http://www.springframework.org/schema/util"
>>>>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>>>>        http://www.springframework.org/schema/beans/spring-beans.xsd
>>>>        http://www.springframework.org/schema/util
>>>>        http://www.springframework.org/schema/util/spring-util.xsd
>>>>        ">
>>>>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>>>         <property name="failureDetectionTimeout" value="60000"/>
>>>>
>>>>         <property name="clientFailureDetectionTimeout" value="60000"/>
>>>>         <property name="segmentationPolicy" value="RESTART_JVM"/>
>>>>
>>>>         <property name="publicThreadPoolSize" value="64"/>
>>>>         <property name="systemThreadPoolSize" value="64"/>
>>>>         <property name="dataStreamerThreadPoolSize" value="64"/>
>>>>         <property name="rebalanceThreadPoolSize" value="4" />
>>>>         <property name="dataStorageConfiguration">
>>>>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>>                 <property name="defaultDataRegionConfiguration">
>>>>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>>                         <property name="name" value="qipu_entity_cache_data_region"/>
>>>>                         <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
>>>>                         <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
>>>>                         <property name="persistenceEnabled" value="true"/>
>>>>                         <property name="metricsEnabled" value="true"/>
>>>>                         <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
>>>>                     </bean>
>>>>                 </property>
>>>>                 <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
>>>>                 <property name="pageSize" value="#{4 * 1024}"/>
>>>>                 <property name="walSegments" value="#{20}"/>
>>>>                 <property name="walMode" value="FSYNC"/>
>>>>                 <property name="metricsEnabled" value="true"/>
>>>>                 <property name="writeThrottlingEnabled" value="true"/>
>>>>                 <property name="checkpointThreads" value="8"/>
>>>>
>>>>                 <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
>>>>             </bean>
>>>>         </property>
>>>>
>>>>         <property name="cacheConfiguration">
>>>>             <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>>>                 <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
>>>>                 <property name="name" value="qipu_entity_cache"/>
>>>>                 <property name="cacheMode" value="PARTITIONED"/>
>>>>                 <property name="partitionLossPolicy" value="IGNORE"/>
>>>>                 <property name="atomicityMode" value="ATOMIC"/>
>>>>                 <property name="backups" value="1"/>
>>>>                 <property name="writeSynchronizationMode" value="FULL_SYNC"/>
>>>>                 <property name="statisticsEnabled" value="true"/>
>>>>                 <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
>>>>                 <property name="rebalanceThrottle" value="0"/>
>>>>
>>>>                 <property name="rebalanceMode" value="ASYNC"/>
>>>>
>>>>                 <property name="rebalanceBatchesPrefetchCount" value="4"/>
>>>>                 <property name="rebalanceTimeout" value="20000"/>
>>>>
>>>>                 <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
>>>>             </bean>
>>>>         </property>
>>>>
>>>>         <property name="communicationSpi">
>>>>             <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>>>                 <property name="messageQueueLimit" value="20480"/>
>>>>             </bean>
>>>>         </property>
>>>>         <property name="discoverySpi">
>>>>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>>>                 <property name="forceServerMode" value="true"/>
>>>>                 <property name="ipFinder">
>>>>                     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>                         <property name="addresses">
>>>>                             <list>
>>>>                                 <!-- In distributed environment, replace with actual host IP address. -->
>>>>                                 <value>10.13.13.39:47500..47509</value>
>>>>                                 <value>10.13.13.49:47500..47509</value>
>>>>                                 <value>10.13.13.50:47500..47509</value>
>>>>                                 <value>10.13.13.51:47500..47509</value>
>>>>                                 <value>10.13.13.59:47500..47509</value>
>>>>                                 <value>10.13.13.60:47500..47509</value>
>>>>                                 <value>10.13.13.61:47500..47509</value>
>>>>                                 <value>10.13.13.63:47500..47509</value>
>>>>                             </list>
>>>>                         </property>
>>>>                     </bean>
>>>>                 </property>
>>>>             </bean>
>>>>         </property>
>>>>         <property name="gridLogger">
>>>>             <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>>                 <constructor-arg type="java.lang.String" value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
>>>>             </bean>
>>>>         </property>
>>>>     </bean>
>>>> </beans>
>>>> =================================================================================================
>>>> #ignite.sh
>>>> JVM config
>>>> JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
>>>> JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
>>>> JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
>>>> JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
>>>> JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
>>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
>>>> JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"
>>>> =================================================================================================
>>>> node config
>>>> #/etc/sysctl.conf
>>>> fs.file-max = 512000
>>>> net.core.rmem_max = 67108864
>>>> net.core.wmem_max = 67108864
>>>> net.core.rmem_default = 65536
>>>> net.core.wmem_default = 65536
>>>> net.core.netdev_max_backlog = 4096
>>>> net.core.somaxconn = 4096
>>>> net.ipv4.tcp_syncookies = 1
>>>> net.ipv4.tcp_tw_reuse = 1
>>>> net.ipv4.tcp_tw_recycle = 0
>>>> net.ipv4.tcp_fin_timeout = 30
>>>> net.ipv4.tcp_keepalive_time = 1200
>>>> net.ipv4.ip_local_port_range = 10000 65000
>>>> net.ipv4.tcp_max_syn_backlog = 4096
>>>> net.ipv4.tcp_max_tw_buckets = 5000
>>>> net.ipv4.tcp_rmem = 4096 87380 67108864
>>>> net.ipv4.tcp_wmem = 4096 65536 67108864
>>>> net.ipv4.tcp_mtu_probing = 1
>>>> vm.swappiness=0
>>>> vm.zone_reclaim_mode = 0
>>>> vm.dirty_writeback_centisecs = 500
>>>> vm.dirty_expire_centisecs = 500
>>>> ===============================================
>>>> #/etc/security/limits.conf
>>>> * soft nofile 65535
>>>> * hard nofile 65535
>>>>
>>>>
>>>> # End of file
>>>> * soft nofile 65535
>>>> * hard nofile 65535
>>>> * soft nofile 81920
>>>> * hard nofile 81920
>>>> * soft nproc 81920
>>>> * hard nproc 81920
>>>> * soft core 10240
>>>> * hard core 10240
>>>> * soft data unlimited
>>>> * hard data unlimited
>>>> * soft stack unlimited
>>>> * hard stack unlimited
>>>> * soft memory unlimited
>>>> * hard memory unlimited
>>>> * soft cpu unlimited
>>>> * hard cpu unlimited
>>>> * soft memlock unlimited
>>>> * hard memlock unlimited
>>>>
>>>> * hard memlock unlimited
>>>> * soft memlock unlimited
>>>> ===============================================
>>>>
>>>> client code
>>>> ==============================================
>>>> Ignition.setClientMode(true);
>>>>
>>>> IgniteConfiguration cfg = new IgniteConfiguration();
>>>> TcpDiscoverySpi spi = new TcpDiscoverySpi();
>>>>
>>>> TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
>>>> finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
>>>> spi.setIpFinder(finder);
>>>>
>>>> cfg.setDiscoverySpi(spi);
>>>> cfg.setGridLogger(new Slf4jLogger());
>>>> Ignite ignite = Ignition.start(cfg);
>>>> IgniteCache<String, byte[]> igniteCache = ignite.getOrCreateCache("qipu_entity_cache");
>>>>
>>>> // get code 【Read operation response time often exceeds 1s】
>>>> igniteCache.getAllAsync(keySet).get(1000);
>>>>
>>>> // put code
>>>> // cache.putAllAsync(map).get(3000);
>>>> ==============================================
>>>>
>>>>
>>>> Attachment is a node's gc log and node log
>>>>
>>>> Please give some suggestions on how to reduce the read operation
>>>> response time. Thank you.
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Regards
>>>
>>> Pavel Vinokurov
>>>
>>
>>
>> --
>>
>> Regards
>>
>> Pavel Vinokurov
>>
>
>
> --
>
> Regards
>
> Pavel Vinokurov
>
--

Regards

Pavel Vinokurov