Hi,

I tested setting -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true, but it seriously reduces the write speed. What exactly is the impact of setting this parameter, and is it necessary to set other parameters to increase the write speed?

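For reference, a minimal sketch of how the flag under discussion might be passed to a node. It assumes the JVM options are assembled in a JVM_OPTS variable, as in the "#ignite.sh JVM config" listing quoted further down in this thread; where exactly the line goes in a given installation is an assumption.

```shell
# Sketch only: append the workaround flag to the JVM options used to start the node.
# Assumes JVM_OPTS is built up the same way as in the quoted ignite.sh listing.
JVM_OPTS="${JVM_OPTS:-} -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true"
export JVM_OPTS
```

The flag is only picked up when the node JVM starts, so a node has to be restarted for the change to apply.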
------------------ Original message ------------------
From: "Pavel Vinokurov"<[email protected]>;
Sent: Thursday, May 10, 2018, 4:59
To: "user"<[email protected]>;
Subject: Re: Read request response time is unstable, often more than 500milliseconds, but the cluster load is small

Hi,

I see several exceptions in your logs. They are probably causing the slowdown.

>> java.lang.ClassCastException:
>> org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager
>> cannot be cast to
>> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager

It seems you have hit the issue described in https://issues.apache.org/jira/browse/IGNITE-7865, which is fixed in version 2.5.
As a workaround, you could change WALMode to LOG_ONLY or start Ignite with the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true

Thanks,
Pavel

2018-05-10 5:42 GMT+03:00 NO <[email protected]>:

Hi,

Ignite version: 2.4.0

Read operations often exceed 500 milliseconds, but the cluster traffic is very small. I don't know why. Please help me solve this problem. Thank you very much.

Here is some configuration information.

8 nodes: (48 cores, 192G RAM, 4TB SSD)
Cluster records: 1.7 billion primary keys, 1.7 billion backup keys
Get requests per second: 100+
Put requests per second: 400+
Each node occupies more than 500GB of disk space.

2 nodes:
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

6 nodes:
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.7 (Final)
Release:        6.7
Codename:       Final

=========================================================================
The node configuration is as follows

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util.xsd">
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureDetectionTimeout" value="60000"/>
    <property name="clientFailureDetectionTimeout" value="60000"/>
    <property name="segmentationPolicy" value="RESTART_JVM"/>
    <property name="publicThreadPoolSize" value="64"/>
    <property name="systemThreadPoolSize" value="64"/>
    <property name="dataStreamerThreadPoolSize" value="64"/>
    <property name="rebalanceThreadPoolSize" value="4" />
    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="name" value="qipu_entity_cache_data_region"/>
            <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
            <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
            <property name="persistenceEnabled" value="true"/>
            <property name="metricsEnabled" value="true"/>
            <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
          </bean>
        </property>
        <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
        <property name="pageSize" value="#{4 * 1024}"/>
        <property name="walSegments" value="#{20}"/>
        <property name="walMode" value="FSYNC"/>
        <property name="metricsEnabled" value="true"/>
        <property name="writeThrottlingEnabled" value="true"/>
        <property name="checkpointThreads" value="8"/>
        <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
      </bean>
    </property>
    <property name="cacheConfiguration">
      <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
        <property name="name" value="qipu_entity_cache"/>
        <property name="cacheMode" value="PARTITIONED"/>
        <property name="partitionLossPolicy" value="IGNORE"/>
        <property name="atomicityMode" value="ATOMIC"/>
        <property name="backups" value="1"/>
        <property name="writeSynchronizationMode" value="FULL_SYNC"/>
        <property name="statisticsEnabled" value="true"/>
        <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
        <property name="rebalanceThrottle" value="0"/>
        <property name="rebalanceMode" value="ASYNC"/>
        <property name="rebalanceBatchesPrefetchCount" value="4"/>
        <property name="rebalanceTimeout" value="20000"/>
        <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
      </bean>
    </property>
    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="messageQueueLimit" value="20480"/>
      </bean>
    </property>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="forceServerMode" value="true"/>
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <!-- In distributed environment, replace with actual host IP address. -->
                <value>10.13.13.39:47500..47509</value>
                <value>10.13.13.49:47500..47509</value>
                <value>10.13.13.50:47500..47509</value>
                <value>10.13.13.51:47500..47509</value>
                <value>10.13.13.59:47500..47509</value>
                <value>10.13.13.60:47500..47509</value>
                <value>10.13.13.61:47500..47509</value>
                <value>10.13.13.63:47500..47509</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
    <property name="gridLogger">
      <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
        <constructor-arg type="java.lang.String" value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
      </bean>
    </property>
  </bean>
</beans>

=========================================================================
#ignite.sh JVM config

JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"

=========================================================================
node config

#/etc/sysctl.conf
fs.file-max = 512000
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.netdev_max_backlog = 4096
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_mtu_probing = 1
vm.swappiness=0
vm.zone_reclaim_mode = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500

===============================================
#/etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
# End of file
* soft nofile 65535
* hard nofile 65535
* soft nofile 81920
* hard nofile 81920
* soft nproc 81920
* hard nproc 81920
* soft core 10240
* hard core 10240
* soft data unlimited
* hard data unlimited
* soft stack unlimited
* hard stack unlimited
* soft memory unlimited
* hard memory unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft memlock unlimited
* hard memlock unlimited
* hard memlock unlimited
* soft memlock unlimited

===============================================
client code
===============================================
Ignition.setClientMode(true);
IgniteConfiguration cfg = new IgniteConfiguration();
TcpDiscoverySpi spi = new TcpDiscoverySpi();
TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
spi.setIpFinder(finder);
cfg.setDiscoverySpi(spi);
cfg.setGridLogger(new Slf4jLogger());
Ignite ignite = Ignition.start(cfg);
IgniteCache<String, byte[]> igniteCache = ignite.getOrCreateCache("qipu_entity_cache");

// get code (read operation response time often exceeds 1s)
igniteCache.getAllAsync(keySet).get(1000);

// put code
// cache.putAllAsync(map).get(3000);
===============================================

Attached are a node's GC log and node log.

Please give some suggestions on how to reduce the read operation response time.

Thank you.

--
Regards
Pavel Vinokurov
