Hello! It seems that, as time goes on, disk I/O can't keep up with your write rate.
The recommendation here is probably to increase the checkpoint frequency value (measured in ms), so that checkpoints happen less often. For example, set it to 600000 (10 minutes). The downside is that after a crash the node will take more time to come back online.

Regards,
-- 
Ilya Kasnacheev

Fri, 22 Feb 2019 at 17:37, Antonio Conforti <[email protected]>:

> Hi support,
>
> I'm running a performance test writing 4000 entries per second to a cache that is:
> 1. TRANSACTIONAL
> 2. partitioned
> 3. with backup 1 (and affinity with exclude neighbors enabled)
> 4. write synchronization mode FULL_ASYNC
> 5. indexed on key and value (and enabled for SQL queries)
>
> Writes are performed by a client node using a data streamer with a
> StreamVisitor and autoFlushFrequency set to 1 sec.
>
> We have configured:
> 1. failureDetectionTimeout to 120000 msec
> 2. Data region (only 1):
>    a. Persistence enabled
>    b. max size 8 GB
>    c. checkpointPageBufferSize 2 GB
> 3. WAL mode LOG_ONLY
> 4. disabled WAL archiving (WAL path and WAL archive path set to the same value)
> 5. Pages Writes Throttling enabled
>
> After some hours, having submitted about 20 million entries without problems,
> the client node starts to show delays: the queue from which the Ignite client
> node reads messages starts to grow.
>
> Checking the logs of the server and client nodes, there isn't any error message,
> but high FSYNC values are observed in the WAL statistics.
>
> Could you help me understand why, in spite of a constant rate and a constant
> CPU consumption of about 30%, the server seems to slow down in terms of
> performance only after a certain number of entries?
>
> Maybe there is some parameter to tune that I missed?
>
> Below is the configuration used for the simulation:
>
> Total server nodes 8, distributed as follows:
> HOST1 with 4 server nodes and 1 client node on HDD disk
> HOST2 with 4 nodes on HDD disk
>
> Both hosts are machines with 16 cores, 256 GB of memory and HDD disk.
>
> The DataStorageConfiguration for each server node is as follows:
>
> <property name="dataStorageConfiguration">
>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>         <property name="writeThrottlingEnabled" value="true" />
>         <property name="defaultDataRegionConfiguration">
>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                 <property name="persistenceEnabled" value="true" />
>                 <property name="maxSize" value="#{8L * 1024 * 1024 * 1024}"/>
>                 <property name="checkpointPageBufferSize" value="#{2048L * 1024 * 1024}" />
>             </bean>
>         </property>
>
>         <property name="walMode" value="LOG_ONLY" />
>         <property name="walPath" value="wal/path" />
>         <property name="walArchivePath" value="wal/path" />
>     </bean>
> </property>
>
> JVM options used to start each server node:
>
> -server -Xms4g -Xmx8g -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>
> Here are the WAL statistics from the log of node 1:
>
> At simulation start:
>
> 2019-02-22 10:19:44.195 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=c115ead9-643d-45e5-be41-cd7ae5caac14, pages=11891, markPos=FileWALPointer [idx=1, fileOff=36517886, len=79426], walSegmentsCleared=0, walSegmentsCovered=[0], markDuration=34ms, pagesWrite=87ms, fsync=1931ms, total=2052ms]
> 2019-02-22 10:22:44.742 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=d40c9096-2e10-46ca-ae8e-2a39e242b768, pages=66732, markPos=FileWALPointer [idx=7, fileOff=63806638, len=79426], walSegmentsCleared=7, walSegmentsCovered=[1 - 6], markDuration=98ms, pagesWrite=407ms, fsync=2085ms, total=2590ms]
> 2019-02-22 10:25:44.900 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=5124f12f-6ed8-4ed3-9644-3c58957600ed, pages=70253, markPos=FileWALPointer [idx=14, fileOff=47159207, len=79426], walSegmentsCleared=6, walSegmentsCovered=[7 - 13], markDuration=98ms, pagesWrite=402ms, fsync=2241ms, total=2741ms]
> 2019-02-22 10:28:47.866 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=9094d36c-cd79-4d30-9b90-e48de78fa3e6, pages=72524, markPos=FileWALPointer [idx=21, fileOff=39728290, len=79426], walSegmentsCleared=8, walSegmentsCovered=[14 - 20], markDuration=83ms, pagesWrite=365ms, fsync=5255ms, total=5703ms]
> 2019-02-22 10:31:53.635 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=7132d53a-e2a6-4ac8-b1a8-2621cc39c82b, pages=77471, markPos=FileWALPointer [idx=28, fileOff=64681287, len=79426], walSegmentsCleared=7, walSegmentsCovered=[21 - 27], markDuration=494ms, pagesWrite=748ms, fsync=10136ms, total=11472ms]
>
> At end of simulation:
>
> 2019-02-22 11:52:36.339 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=dc8369b9-b100-4dd0-bfaf-9a5b13620072, pages=129942, markPos=FileWALPointer [idx=309, fileOff=19048810, len=79426], walSegmentsCleared=11, walSegmentsCovered=[298 - 308], markDuration=77ms, pagesWrite=797ms, fsync=171049ms, total=171923ms]
> 2019-02-22 11:56:24.001 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=1fb1ced8-a257-4c6f-be97-a1867ba6692e, pages=133096, markPos=FileWALPointer [idx=320, fileOff=13420410, len=79426], walSegmentsCleared=11, walSegmentsCovered=[309 - 319], markDuration=1707ms, pagesWrite=1537ms, fsync=216332ms, total=219576ms]
> 2019-02-22 12:00:23.052 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=0d48f6d5-7601-4839-b422-55b228978da5, pages=150587, markPos=FileWALPointer [idx=332, fileOff=47800048, len=79426], walSegmentsCleared=12, walSegmentsCovered=[320 - 331], markDuration=2275ms, pagesWrite=752ms, fsync=236023ms, total=239051ms]
> 2019-02-22 12:04:05.562 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=b23526c4-5121-48a9-9902-522a5ffb3a28, pages=155805, markPos=FileWALPointer [idx=345, fileOff=40020477, len=79426], walSegmentsCleared=13, walSegmentsCovered=[332 - 344], markDuration=525ms, pagesWrite=1324ms, fsync=220654ms, total=222504ms]
> 2019-02-22 12:07:54.005 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=1bb2a0c0-7a89-47f2-af9f-90e90d44c14b, pages=149055, markPos=FileWALPointer [idx=357, fileOff=51666923, len=79426], walSegmentsCleared=12, walSegmentsCovered=[345 - 356], markDuration=995ms, pagesWrite=1559ms, fsync=225888ms, total=228442ms]
> 2019-02-22 12:11:49.962 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=19f14a04-842f-409e-9c6b-afb59193e419, pages=153022, markPos=FileWALPointer [idx=370, fileOff=16234647, len=79426], walSegmentsCleared=13, walSegmentsCovered=[357 - 369], markDuration=1773ms, pagesWrite=1044ms, fsync=233139ms, total=235957ms]
> 2019-02-22 12:15:59.332 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=c1316fb7-1ecc-4358-bf90-a87772969c03, pages=159668, markPos=FileWALPointer [idx=383, fileOff=21979375, len=79426], walSegmentsCleared=13, walSegmentsCovered=[370 - 382], markDuration=1249ms, pagesWrite=1693ms, fsync=246428ms, total=249370ms]
> 2019-02-22 12:20:05.814 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=a9e394f2-0011-4f89-8b5a-a6ed77774103, pages=156891, markPos=FileWALPointer [idx=396, fileOff=3956799, len=79426], walSegmentsCleared=13, walSegmentsCovered=[383 - 395], markDuration=1030ms, pagesWrite=1275ms, fsync=244176ms, total=246482ms]
> 2019-02-22 12:24:40.217 INFO 5271 --- [oint-thread-#67] i.i.p.c.p.GridCacheDatabaseSharedManager : Checkpoint finished [cpId=0bc71cc5-97c4-4f7d-95c2-430f379eeeb0, pages=148039, markPos=FileWALPointer [idx=407, fileOff=57767331, len=79426], walSegmentsCleared=11, walSegmentsCovered=[396 - 406], markDuration=323ms, pagesWrite=1620ms, fsync=272460ms, total=274403ms]
>
> Thanks in advance.
>
> Antonio
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
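
For reference, the checkpoint frequency suggestion at the top of this reply could look roughly like the fragment below when applied to the quoted DataStorageConfiguration. This is only a sketch: `checkpointFrequency` is the DataStorageConfiguration property (in milliseconds), and 600000 is the example value from the reply, not a tuned figure. The quoted log shows checkpoints about 3 minutes apart, which is consistent with the default of 180000 ms, so it is assumed here that the property was not set before.

```xml
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Assumption: example value from the reply (10 minutes), to be adjusted after testing -->
        <property name="checkpointFrequency" value="600000" />

        <!-- All other properties stay as in the quoted configuration -->
        <property name="writeThrottlingEnabled" value="true" />
        <property name="walMode" value="LOG_ONLY" />
    </bean>
</property>
```

After the change, the fsync and total values in the "Checkpoint finished" log lines are the numbers to watch: if fsync time still grows toward the checkpoint interval, the disks are likely still the bottleneck.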
