An Ignite node should start with any WAL mode. I suspect the same error would also occur with FSYNC mode. Could you restart the node with LOG_ONLY mode and share the logs?
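If the node fails to start again after the restart, one quick way to pull the relevant lines out of a large node log before sharing it is a plain substring search for the two messages the original poster lists in the reply quoted below. A minimal sketch in Java, assuming the log is a UTF-8 text file; the class name, method name and default path `ignite.log` are hypothetical, and only the quoted fragments are matched because the full messages may carry extra context:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WalRecoveryLogScan {

    // Fragments of the two error messages quoted in this thread; matched as plain substrings.
    static final List<String> WAL_RECOVERY_ERRORS = Arrays.asList(
            "Failed to find checkpoint record at the given WAL pointer",
            "checkpoint record is missed in WAL");

    // Returns every log line that contains at least one of the fragments, in file order.
    static List<String> findWalRecoveryErrors(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String fragment : WAL_RECOVERY_ERRORS) {
                if (line.contains(fragment)) {
                    hits.add(line);
                    break; // add each matching line only once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the actual node log path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "ignite.log");
        // Files.readAllLines(Path) decodes the file as UTF-8.
        for (String hit : findWalRecoveryErrors(Files.readAllLines(log))) {
            System.out.println(hit);
        }
    }
}
```

Substring matching keeps the sketch independent of the exact log layout; it will not catch the errors if the wording differs in another Ignite version.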
2018-05-10 12:39 GMT+03:00 NO <[email protected]>:
> When I used LOG_ONLY mode, I remember running into this problem. After the
> node restarted and printed an error message, it could not be started. I did
> not save the error message at the time. Searching the source code, I think it
> was one of these two:
> 1. 'Failed to find checkpoint record at the given WAL pointer'
> 2. 'on disk, but checkpoint record is missed in WAL '
>
> In LOG_ONLY mode, might a node fail to start after a crash?
>
>
> ------------------ Original Message ------------------
> *From:* "Pavel Vinokurov"<[email protected]>;
> *Sent:* Thursday, May 10, 2018, 5:13 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: Read request response time is unstable, often more than 500
> milliseconds, but the cluster load is small
>
> Please try checking performance with LOG_ONLY mode.
>
> 2018-05-10 12:03 GMT+03:00 NO <[email protected]>:
>
>> Hi,
>>
>> I tested with -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true set, but it
>> seriously reduces write speed. I don't know what impact this parameter has.
>> Do I need to set other parameters to increase write speed?
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Pavel Vinokurov"<[email protected]>;
>> *Sent:* Thursday, May 10, 2018, 4:59 PM
>> *To:* "user"<[email protected]>;
>> *Subject:* Re: Read request response time is unstable, often more than
>> 500 milliseconds, but the cluster load is small
>>
>> Hi,
>>
>> I see several exceptions in your logs. They are probably causing the slowdown.
>>
>> java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager
>> cannot be cast to org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager
>>
>> It seems you are hitting https://issues.apache.org/jira/browse/IGNITE-7865,
>> which is fixed in version 2.5.
>> As a workaround, you could change WALMode to LOG_ONLY or start Ignite with
>> the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true.
>>
>> Thanks,
>> Pavel
>>
>>
>> 2018-05-10 5:42 GMT+03:00 NO <[email protected]>:
>>
>>> Hi,
>>>
>>> Ignite version: 2.4.0
>>>
>>> Read operations often exceed 500 milliseconds, but cluster traffic is very
>>> low. I don't know why. Please help me solve this problem. Thank you very
>>> much. Here is some configuration information.
>>>
>>> 8 nodes: 48 cores, 192 GB RAM, 4 TB SSD
>>> Cluster records: 1.7 billion primary keys, 1.7 billion backup keys
>>> Get requests per second: 100+
>>> Put requests per second: 400+
>>> Each node uses more than 500 GB of disk space.
>>>
>>> 2 nodes:
>>> LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
>>> Distributor ID: CentOS
>>> Description: CentOS Linux release 7.2.1511 (Core)
>>> Release: 7.2.1511
>>> Codename: Core
>>>
>>> 6 nodes:
>>> LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID: CentOS
>>> Description: CentOS release 6.7 (Final)
>>> Release: 6.7
>>> Codename: Final
>>> =========================================================================
>>> The node configuration is as follows:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>        xmlns:util="http://www.springframework.org/schema/util"
>>>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>>>        http://www.springframework.org/schema/beans/spring-beans.xsd
>>>        http://www.springframework.org/schema/util
>>>        http://www.springframework.org/schema/util/spring-util.xsd
>>>        ">
>>>     <bean id="ignite.cfg"
>>>           class="org.apache.ignite.configuration.IgniteConfiguration">
>>>         <property name="failureDetectionTimeout" value="60000"/>
>>>
>>>         <property name="clientFailureDetectionTimeout" value="60000"/>
>>>         <property name="segmentationPolicy" value="RESTART_JVM"/>
>>>         <property name="publicThreadPoolSize" value="64"/>
>>>         <property name="systemThreadPoolSize" value="64"/>
>>>         <property name="dataStreamerThreadPoolSize" value="64"/>
>>>         <property name="rebalanceThreadPoolSize" value="4" />
>>>         <property name="dataStorageConfiguration">
>>>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>                 <property name="defaultDataRegionConfiguration">
>>>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>                         <property name="name" value="qipu_entity_cache_data_region"/>
>>>                         <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
>>>                         <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
>>>                         <property name="persistenceEnabled" value="true"/>
>>>                         <property name="metricsEnabled" value="true"/>
>>>                         <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
>>>                     </bean>
>>>                 </property>
>>>                 <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
>>>                 <property name="pageSize" value="#{4 * 1024}"/>
>>>                 <property name="walSegments" value="#{20}"/>
>>>                 <property name="walMode" value="FSYNC"/>
>>>                 <property name="metricsEnabled" value="true"/>
>>>                 <property name="writeThrottlingEnabled" value="true"/>
>>>                 <property name="checkpointThreads" value="8"/>
>>>
>>>                 <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
>>>             </bean>
>>>         </property>
>>>
>>>         <property name="cacheConfiguration">
>>>             <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>>                 <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
>>>                 <property name="name" value="qipu_entity_cache"/>
>>>                 <property name="cacheMode" value="PARTITIONED"/>
>>>                 <property name="partitionLossPolicy" value="IGNORE"/>
>>>                 <property name="atomicityMode" value="ATOMIC"/>
>>>                 <property name="backups" value="1"/>
>>>                 <property name="writeSynchronizationMode" value="FULL_SYNC"/>
>>>                 <property name="statisticsEnabled" value="true"/>
>>>                 <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
>>>                 <property name="rebalanceThrottle" value="0"/>
>>>
>>>                 <property name="rebalanceMode" value="ASYNC"/>
>>>
>>>                 <property name="rebalanceBatchesPrefetchCount" value="4"/>
>>>                 <property name="rebalanceTimeout" value="20000"/>
>>>
>>>                 <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
>>>             </bean>
>>>         </property>
>>>
>>>         <property name="communicationSpi">
>>>             <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>>                 <property name="messageQueueLimit" value="20480"/>
>>>             </bean>
>>>         </property>
>>>         <property name="discoverySpi">
>>>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>>                 <property name="forceServerMode" value="true"/>
>>>                 <property name="ipFinder">
>>>                     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>                         <property name="addresses">
>>>                             <list>
>>>                                 <!-- In distributed environment,
>>>                                 replace with actual host IP address.
>>>                                 -->
>>>                                 <value>10.13.13.39:47500..47509</value>
>>>                                 <value>10.13.13.49:47500..47509</value>
>>>                                 <value>10.13.13.50:47500..47509</value>
>>>                                 <value>10.13.13.51:47500..47509</value>
>>>                                 <value>10.13.13.59:47500..47509</value>
>>>                                 <value>10.13.13.60:47500..47509</value>
>>>                                 <value>10.13.13.61:47500..47509</value>
>>>                                 <value>10.13.13.63:47500..47509</value>
>>>                             </list>
>>>                         </property>
>>>                     </bean>
>>>                 </property>
>>>             </bean>
>>>         </property>
>>>         <property name="gridLogger">
>>>             <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>                 <constructor-arg type="java.lang.String"
>>>                     value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
>>>             </bean>
>>>         </property>
>>>     </bean>
>>> </beans>
>>> =================================================================================================
>>> #ignite.sh
>>> JVM config
>>> JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
>>> JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
>>> JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
>>> JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
>>> JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
>>> JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
>>> JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
>>> JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
>>> JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"
>>> =====================================================================================================
>>> node config
>>> #/etc/sysctl.conf
>>> fs.file-max = 512000
>>> net.core.rmem_max = 67108864
>>> net.core.wmem_max = 67108864
>>> net.core.rmem_default = 65536
>>> net.core.wmem_default = 65536
>>> net.core.netdev_max_backlog = 4096
>>> net.core.somaxconn = 4096
>>> net.ipv4.tcp_syncookies = 1
>>> net.ipv4.tcp_tw_reuse = 1
>>> net.ipv4.tcp_tw_recycle = 0
>>> net.ipv4.tcp_fin_timeout = 30
>>> net.ipv4.tcp_keepalive_time = 1200
>>> net.ipv4.ip_local_port_range = 10000 65000
>>> net.ipv4.tcp_max_syn_backlog = 4096
>>> net.ipv4.tcp_max_tw_buckets = 5000
>>> net.ipv4.tcp_rmem = 4096 87380 67108864
>>> net.ipv4.tcp_wmem = 4096 65536 67108864
>>> net.ipv4.tcp_mtu_probing = 1
>>> vm.swappiness=0
>>> vm.zone_reclaim_mode = 0
>>> vm.dirty_writeback_centisecs = 500
>>> vm.dirty_expire_centisecs = 500
>>> ===============================================
>>> #/etc/security/limits.conf
>>> * soft nofile 65535
>>> * hard nofile 65535
>>>
>>>
>>> # End of file
>>> * soft nofile 65535
>>> * hard nofile 65535
>>> * soft nofile 81920
>>> * hard nofile 81920
>>> * soft nproc 81920
>>> * hard nproc 81920
>>> * soft core 10240
>>> * hard core 10240
>>> * soft data unlimited
>>> * hard data unlimited
>>> * soft stack unlimited
>>> * hard stack unlimited
>>> * soft memory unlimited
>>> * hard memory unlimited
>>> * soft cpu unlimited
>>> * hard cpu unlimited
>>> * soft memlock unlimited
>>> * hard memlock unlimited
>>>
>>> * hard memlock unlimited
>>> * soft memlock unlimited
>>> ===============================================
>>>
>>> client code
>>> ==============================================
>>> Ignition.setClientMode(true);
>>>
>>> IgniteConfiguration cfg = new IgniteConfiguration();
>>> TcpDiscoverySpi spi = new TcpDiscoverySpi();
>>>
>>> TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
>>> finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
>>> spi.setIpFinder(finder);
>>>
>>> cfg.setDiscoverySpi(spi);
>>> cfg.setGridLogger(new Slf4jLogger());
>>> Ignite ignite = Ignition.start(cfg);
>>> IgniteCache<String, byte[]> igniteCache = ignite
>>>         .getOrCreateCache("qipu_entity_cache");
>>>
>>> // get code [read operation response time often exceeds 1 s]
>>> igniteCache.getAllAsync(keySet).get(1000);
>>>
>>> // put code
>>> // cache.putAllAsync(map).get(3000);
>>> ==============================================
>>>
>>>
>>> Attached are a node's GC log and node log.
>>>
>>> Please give some suggestions on how to reduce the read response time.
>>> Thank you.
>>>
>>
>>
>> --
>>
>> Regards
>>
>> Pavel Vinokurov
>>
>
>
>
> --
>
> Regards
>
> Pavel Vinokurov
>
--
Regards
Pavel Vinokurov
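The cluster figures and configuration quoted above allow a rough back-of-the-envelope check of how much of each node's persisted data the configured data region can hold at once. A minimal sketch in Java, assuming keys are spread evenly across the 8 nodes and reading the reported "more than 500GB" per node as exactly 500 GiB; the class and method names are hypothetical, and the result is arithmetic on the posted numbers, not a measurement:

```java
public class NodeCapacityEstimate {

    static final long MIB = 1024L * 1024;
    static final long GIB = 1024L * MIB;

    // Figures copied from the configuration and cluster description quoted in the thread.
    static final long DATA_REGION_MAX_SIZE = 100L * GIB;  // maxSize
    static final long WAL_SEGMENT_SIZE = 64L * MIB;       // walSegmentSize
    static final int WAL_SEGMENTS = 20;                   // walSegments
    static final long TOTAL_ENTRIES = 3_400_000_000L;     // 1.7 billion primary + 1.7 billion backup
    static final int NODES = 8;

    // Assumption: "more than 500GB" of disk per node is read as exactly 500 GiB.
    static final long DISK_PER_NODE = 500L * GIB;

    // Assumes entries are distributed evenly across the nodes.
    static long entriesPerNode() {
        return TOTAL_ENTRIES / NODES;
    }

    // Rough upper bound on the share of a node's persisted pages that can be resident
    // in the data region at once; actual disk usage is higher, so the real share is lower.
    static double dataRegionToDiskRatio() {
        return (double) DATA_REGION_MAX_SIZE / DISK_PER_NODE;
    }

    // Total size of the configured WAL segments: walSegments * walSegmentSize.
    static long walSegmentsBytes() {
        return WAL_SEGMENT_SIZE * WAL_SEGMENTS;
    }

    public static void main(String[] args) {
        System.out.printf("entries per node:       %d%n", entriesPerNode());
        System.out.printf("data region / disk:     %.2f%n", dataRegionToDiskRatio());
        System.out.printf("WAL segments total MiB: %d%n", walSegmentsBytes() / MIB);
    }
}
```

With these inputs the sketch gives 425000000 entries per node, a data-region-to-disk ratio of 0.20, and 1280 MiB of WAL segments. With native persistence, pages that are not resident in the data region are loaded from disk when read, so whether the slow `getAllAsync` calls coincide with page loads is something to check in the data region metrics the configuration already enables (`metricsEnabled`); the arithmetic alone does not show it.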
