Please, try to check performance with LOG_ONLY mode. 2018-05-10 12:03 GMT+03:00 NO <[email protected]>:
> Hi, > > I have tested -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true set this > parameter, but it will seriously affect the write speed, I do not know what > the impact of setting this parameter is, whether it is necessary to set > other parameters to increase the write speed? > > > ------------------ 原始邮件 ------------------ > *发件人:* "Pavel Vinokurov"<[email protected]>; > *发送时间:* 2018年5月10日(星期四) 下午4:59 > *收件人:* "user"<[email protected]>; > *主题:* Re: Read request response time is unstable, often more than > 500milliseconds, but the cluster load is small > > Hi, > > I see several exceptions in your logs. Probably it causes the slowdown. > >> java.lang.ClassCastException: org.apache.ignite.internal. > processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager cannot > be cast to org.apache.ignite.internal.processors.cache.persistence. > wal.FileWriteAheadLogManager > > Seems to you have the issue related to https://issues.apache.org/j > ira/browse/IGNITE-7865 that fixed in the 2.5 version. > As workaround you could change WALMode to LOG_ONLY or start ignite with > the jvm property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true > > Thanks, > Pavel > > > > > > 2018-05-10 5:42 GMT+03:00 NO <[email protected]>: > >> hi, >> >> Ignite version : 2.4.0 >> >> Read operations often exceed 500 milliseconds, but the cluster traffic is >> very small. I don't know why. Please help me solve this problem. Thank you >> very much. Here is some configuration information. >> >> 8 node : (48 core ,192G RAM, 4TB SSD) >> Cluster records : 1.7 billion primary keys , 1.7 billion backup keys >> Get requests per second : 100+ >> Put requests per second : 400+ >> Each node occupies more than 500GB of disk space. >> >> 2 node : >> LSB Version: :core-4.1-amd64:core-4.1-noarc >> h:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4. >> 1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing- >> 4.1-amd64:printing-4.1-noarch >> Distributor ID: CentOS >> Description: CentOS Linux release 7.2.1511 (Core) >> Release: 7.2.1511 >> Codename: Core >> >> 6 node: >> LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0- >> noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0- >> amd64:printing-4.0-noarch >> Distributor ID: CentOS >> Description: CentOS release 6.7 (Final) >> Release: 6.7 >> Codename: Final >> ========================================================================= >> The node configuration is as follows >> <?xml version="1.0" encoding="UTF-8"?> >> <beans xmlns="http://www.springframework.org/schema/beans" >> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >> xmlns:util="http://www.springframework.org/schema/util" >> xsi:schemaLocation="http://www.springframework.org/schema/beans >> http://www.springframework.org/schema/beans/spring-beans.xsd >> http://www.springframework.org/schema/util >> http://www.springframework.org/schema/util/spring-util.xsd >> "> >> <bean id="ignite.cfg" class="org.apache.ignite.confi >> guration.IgniteConfiguration"> >> <property name="failureDetectionTimeout" value="60000"/> >> >> <property name="clientFailureDetectionTimeout" >> value="60000"/> >> <property name="segmentationPolicy" value="RESTART_JVM"/> >> <property name="publicThreadPoolSize" value="64"/> >> <property name="systemThreadPoolSize" value="64"/> >> <property name="dataStreamerThreadPoolSize" value="64"/> >> <property name="rebalanceThreadPoolSize" value="4" /> >> <property name="dataStorageConfiguration"> >> <bean class="org.apache.ignite.confi >> guration.DataStorageConfiguration"> >> <property name="defaultDataRegionConfiguration"> >> <bean class="org.apache.ignite.confi >> guration.DataRegionConfiguration"> >> <property name="name" >> value="qipu_entity_cache_data_region"/> >> <property name="initialSize" value="#{10L * 1024 >> * 1024 * 1024}"/> >> <property name="maxSize" value="#{100L * 1024 * >> 1024 * 1024}"/> >> <property name="persistenceEnabled" value="true"/> >> <property name="metricsEnabled" value="true"/> >> <property name="checkpointPageBufferSize" >> value="#{1 * 1024 * 1024 * 1024}"/> >> </bean> >> </property> >> <property name="walSegmentSize" value="#{64 * 1024 * >> 1024}"/> >> <property name="pageSize" value="#{4 * 1024}"/> >> <property name="walSegments" value="#{20}"/> >> <property name="walMode" value="FSYNC"/> >> <property name="metricsEnabled" value="true"/> >> <property name="writeThrottlingEnabled" value="true"/> >> >> <property name="checkpointThreads" value="8"/> >> >> <property name="walThreadLocalBufferSize" value="#{1 * >> 1024 * 1024}"/> >> </bean> >> </property> >> >> <property name="cacheConfiguration"> >> <bean class="org.apache.ignite.confi >> guration.CacheConfiguration"> >> <property name="dataRegionName" >> value="qipu_entity_cache_data_region"/> >> <property name="name" value="qipu_entity_cache"/> >> <property name="cacheMode" value="PARTITIONED"/> >> <property name="partitionLossPolicy" value="IGNORE"/> >> <property name="atomicityMode" value="ATOMIC"/> >> <property name="backups" value="1"/> >> <property name="writeSynchronizationMode" >> value="FULL_SYNC"/> >> <property name="statisticsEnabled" value="true"/> >> <property name="rebalanceBatchSize" value="#{20 * 1024 * >> 1024}"/> >> <property name="rebalanceThrottle" value="0"/> >> >> <property name="rebalanceMode" value="ASYNC"/> >> <property name="rebalanceBatchesPrefetchCount" >> value="4"/> >> <property name="rebalanceTimeout" value="20000"/> >> >> <property name="maxConcurrentAsyncOperations" value="#{4 >> * 500}"/> >> </bean> >> </property> >> >> <property name="communicationSpi"> >> <bean class="org.apache.ignite.spi.c >> ommunication.tcp.TcpCommunicationSpi"> >> <property name="messageQueueLimit" value="20480"/> >> </bean> >> </property> >> <property name="discoverySpi"> >> <bean class="org.apache.ignite.spi.d >> iscovery.tcp.TcpDiscoverySpi"> >> <property name="forceServerMode" value="true"/> >> <property name="ipFinder"> >> <bean class="org.apache.ignite.spi.d >> iscovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"> >> <property name="addresses"> >> <list> >> <!-- In distributed environment, replace >> with actual host IP address. --> >> <value>10.13.13.39:47500..47509</value> >> <value>10.13.13.49:47500..47509</value> >> <value>10.13.13.50:47500..47509</value> >> <value>10.13.13.51:47500..47509</value> >> <value>10.13.13.59:47500..47509</value> >> <value>10.13.13.60:47500..47509</value> >> <value>10.13.13.61:47500..47509</value> >> <value>10.13.13.63:47500..47509</value> >> </list> >> </property> >> </bean> >> </property> >> </bean> >> </property> >> <property name="gridLogger"> >> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger"> >> <constructor-arg type="java.lang.String" >> value="/home/qipu/production/apache-ignite-2.4.0/config/igni >> te-log4j2.xml"/> >> </bean> >> </property> >> </bean> >> </beans> >> ============================================================ >> ===================================== >> #ignite.sh >> JVM config >> JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts >> -XX:MaxMetaspaceSize=512m" >> JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch" >> JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC" >> JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC" >> JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC" >> JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError " >> JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work" >> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails" >> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps" >> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps" >> JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation" >> JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10" >> JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M" >> JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log" >> JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy" >> JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100" >> ============================================================ >> ========================================= >> node config >> #/etc/sysctl.conf >> fs.file-max = 512000 >> net.core.rmem_max = 67108864 >> net.core.wmem_max = 67108864 >> net.core.rmem_default = 65536 >> net.core.wmem_default = 65536 >> net.core.netdev_max_backlog = 4096 >> net.core.somaxconn = 4096 >> net.ipv4.tcp_syncookies = 1 >> net.ipv4.tcp_tw_reuse = 1 >> net.ipv4.tcp_tw_recycle = 0 >> net.ipv4.tcp_fin_timeout = 30 >> net.ipv4.tcp_keepalive_time = 1200 >> net.ipv4.ip_local_port_range = 10000 65000 >> net.ipv4.tcp_max_syn_backlog = 4096 >> net.ipv4.tcp_max_tw_buckets = 5000 >> net.ipv4.tcp_rmem = 4096 87380 67108864 >> net.ipv4.tcp_wmem = 4096 65536 67108864 >> net.ipv4.tcp_mtu_probing = 1 >> vm.swappiness=0 >> vm.zone_reclaim_mode = 0 >> vm.dirty_writeback_centisecs = 500 >> vm.dirty_expire_centisecs = 500 >> =============================================== >> #/etc/security/limits.conf >> * soft nofile 65535 >> * hard nofile 65535 >> >> >> # End of file >> * soft nofile 65535 >> * hard nofile 65535 >> * soft nofile 81920 >> * hard nofile 81920 >> * soft nproc 81920 >> * hard nproc 81920 >> * soft core 10240 >> * hard core 10240 >> * soft data unlimited >> * hard data unlimited >> * soft stack unlimited >> * hard stack unlimited >> * soft memory unlimited >> * hard memory unlimited >> * soft cpu unlimited >> * hard cpu unlimited >> * soft memlock unlimited >> * hard memlock unlimited >> >> * hard memlock unlimited >> * soft memlock unlimited >> =============================================== >> >> client code >> ============================================== >> Ignition.setClientMode(true); >> >> IgniteConfiguration cfg = new IgniteConfiguration(); >> TcpDiscoverySpi spi = new TcpDiscoverySpi(); >> >> TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder(); >> finder.setAddresses(Arrays.asList(env.getProperty("ignite. >> server").split(","))); >> spi.setIpFinder(finder); >> >> cfg.setDiscoverySpi(spi); >> cfg.setGridLogger(new Slf4jLogger()); >> Ignite ignite = Ignition.start(cfg); >> IgniteCache<String, byte[]> igniteCache = ignite >> .getOrCreateCache("qipu_entity_cache"); >> >> // get code 【Read operation response time often exceeds 1s】 >> igniteCache.getAllAsync(keySet).get(1000); >> >> // put code >> // cache.putAllAsync(map).get(3000); >> ============================================== >> >> >> Attachment is a node's gc log and node log >> >> Please give some suggestions on how to reduce the read operation response >> time. Thank you. >> >> >> >> > > > -- > > Regards > > Pavel Vinokurov > -- Regards Pavel Vinokurov
