I remember running into this problem in LOG_ONLY mode. After a node rebooted, it printed an error message and could not be started again. I did not save the error message at the time, but after searching the source code I believe it was one of these two: 1. 'Failed to find checkpoint record at the given WAL pointer' 2. 'on disk, but checkpoint record is missed in WAL '
Is it possible that, in LOG_ONLY mode, a node may fail to start after a crash?

------------------ Original Message ------------------
From: "Pavel Vinokurov"<[email protected]>;
Sent: Thursday, May 10, 2018, 5:13 PM
To: "user"<[email protected]>;
Subject: Re: Read request response time is unstable, often more than 500 milliseconds, but the cluster load is small

Please, try to check performance with LOG_ONLY mode.

2018-05-10 12:03 GMT+03:00 NO <[email protected]>:

Hi,
I have tested with -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true, but setting this parameter seriously reduces the write speed. I do not know what the impact of setting this parameter is. Is it necessary to set other parameters to increase the write speed?

------------------ Original Message ------------------
From: "Pavel Vinokurov"<[email protected]>;
Sent: Thursday, May 10, 2018, 4:59 PM
To: "user"<[email protected]>;
Subject: Re: Read request response time is unstable, often more than 500 milliseconds, but the cluster load is small

Hi,
I see several exceptions in your logs. They are probably causing the slowdown.

>> java.lang.ClassCastException:
>> org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager
>> cannot be cast to
>> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager

It seems you have hit the issue described in https://issues.apache.org/jira/browse/IGNITE-7865, which is fixed in version 2.5.
As a workaround, you could change WALMode to LOG_ONLY or start Ignite with the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true

Thanks,
Pavel

2018-05-10 5:42 GMT+03:00 NO <[email protected]>:

Hi,

Ignite version: 2.4.0

Read operations often exceed 500 milliseconds, but the cluster traffic is very small. I don't know why. Please help me solve this problem. Thank you very much.

Here is some configuration information.

8 nodes: (48 cores, 192G RAM, 4TB SSD)
Cluster records: 1.7 billion primary keys, 1.7 billion backup keys
Get requests per second: 100+
Put requests per second: 400+
Each node occupies more than 500GB of disk space.

2 nodes:
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core

6 nodes:
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.7 (Final)
Release: 6.7
Codename: Final

=========================================================================
The node configuration is as follows

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util.xsd ">
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureDetectionTimeout" value="60000"/>
    <property name="clientFailureDetectionTimeout" value="60000"/>
    <property name="segmentationPolicy" value="RESTART_JVM"/>
    <property name="publicThreadPoolSize" value="64"/>
    <property name="systemThreadPoolSize" value="64"/>
    <property name="dataStreamerThreadPoolSize" value="64"/>
    <property name="rebalanceThreadPoolSize" value="4" />
    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="name" value="qipu_entity_cache_data_region"/>
            <property name="initialSize"
              value="#{10L * 1024 * 1024 * 1024}"/>
            <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
            <property name="persistenceEnabled" value="true"/>
            <property name="metricsEnabled" value="true"/>
            <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
          </bean>
        </property>
        <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
        <property name="pageSize" value="#{4 * 1024}"/>
        <property name="walSegments" value="#{20}"/>
        <property name="walMode" value="FSYNC"/>
        <property name="metricsEnabled" value="true"/>
        <property name="writeThrottlingEnabled" value="true"/>
        <property name="checkpointThreads" value="8"/>
        <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
      </bean>
    </property>
    <property name="cacheConfiguration">
      <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
        <property name="name" value="qipu_entity_cache"/>
        <property name="cacheMode" value="PARTITIONED"/>
        <property name="partitionLossPolicy" value="IGNORE"/>
        <property name="atomicityMode" value="ATOMIC"/>
        <property name="backups" value="1"/>
        <property name="writeSynchronizationMode" value="FULL_SYNC"/>
        <property name="statisticsEnabled" value="true"/>
        <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
        <property name="rebalanceThrottle" value="0"/>
        <property name="rebalanceMode" value="ASYNC"/>
        <property name="rebalanceBatchesPrefetchCount" value="4"/>
        <property name="rebalanceTimeout" value="20000"/>
        <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
      </bean>
    </property>
    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="messageQueueLimit" value="20480"/>
      </bean>
    </property>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="forceServerMode" value="true"/>
        <property name="ipFinder">
          <bean
            class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <!-- In distributed environment, replace with actual host IP address. -->
                <value>10.13.13.39:47500..47509</value>
                <value>10.13.13.49:47500..47509</value>
                <value>10.13.13.50:47500..47509</value>
                <value>10.13.13.51:47500..47509</value>
                <value>10.13.13.59:47500..47509</value>
                <value>10.13.13.60:47500..47509</value>
                <value>10.13.13.61:47500..47509</value>
                <value>10.13.13.63:47500..47509</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="gridLogger">
      <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
        <constructor-arg type="java.lang.String" value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
      </bean>
    </property>
  </bean>
</beans>

=================================================================================================
#ignite.sh JVM config

JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"

=====================================================================================================
node config

#/etc/sysctl.conf
fs.file-max = 512000
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.netdev_max_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mtu_probing = 1
vm.swappiness=0
vm.zone_reclaim_mode = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500

===============================================
#/etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
# End of file
* soft nofile 65535
* hard nofile 65535
* soft nofile 81920
* hard nofile 81920
* soft nproc 81920
* hard nproc 81920
* soft core 10240
* hard core 10240
* soft data unlimited
* hard data unlimited
* soft stack unlimited
* hard stack unlimited
* soft memory unlimited
* hard memory unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft memlock unlimited
* hard memlock unlimited
* hard memlock unlimited
* soft memlock unlimited

===============================================
client code
==============================================
Ignition.setClientMode(true);
IgniteConfiguration cfg = new IgniteConfiguration();
TcpDiscoverySpi spi = new TcpDiscoverySpi();
TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
spi.setIpFinder(finder);
cfg.setDiscoverySpi(spi);
cfg.setGridLogger(new Slf4jLogger());
Ignite ignite = Ignition.start(cfg);
IgniteCache<String, byte[]> igniteCache = ignite.getOrCreateCache("qipu_entity_cache");

// get code (Read operation response time often exceeds 1s)
igniteCache.getAllAsync(keySet).get(1000);

// put code
// cache.putAllAsync(map).get(3000);
==============================================

Attached are a node's GC log and node log.

Please give some suggestions on how to reduce the read operation response time.

Thank you.

--
Regards
Pavel Vinokurov

--
Regards
Pavel Vinokurov
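
For reference, the first workaround suggested in this thread (changing WALMode to LOG_ONLY) would look roughly like the fragment below when applied to the node configuration quoted above. This is only a sketch: it assumes the walMode property stays inside the existing DataStorageConfiguration bean, the other properties of that bean are left unchanged, and it has not been verified against this cluster. It also does not answer the question of whether a node in LOG_ONLY mode can fail to start after a crash.

```xml
<!-- Sketch only: inside the existing DataStorageConfiguration bean,
     replace the FSYNC value of walMode with LOG_ONLY. -->
<property name="walMode" value="LOG_ONLY"/>
```

The alternative workaround from the same reply keeps walMode at FSYNC and instead starts the node with the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true, which was reported above to slow down writes considerably.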
