On 2015-04-23 5:56 PM, Howard Chu wrote:
> Geoff Swan wrote:
>>
>> On 2015-04-23 4:43 PM, Howard Chu wrote:
>>> Geoff Swan wrote:
>>>>
>>>> On 2015-04-23 4:07 PM, Ulrich Windl wrote:
>>>>>>>> Geoff Swan <[email protected]> wrote on 22.04.2015 at 23:18
>>>>>>>> in message
>>>>> <[email protected]>:
>>>>>
>>>>> [...]
>>>>>> Free stats look fine to me. No swap is being used, or has been used
>>>>>> yet on this system.
>>>>>>
>>>>>>              total       used       free     shared    buffers     cached
>>>>>> Mem:     132005400   21928192  110077208      10612     363064   19230072
>>>>>> -/+ buffers/cache:    2335056  129670344
>>>>>> Swap:      8388604          0    8388604
>>>>>>
>>>>>> The effect I am seeing is that despite a reasonably fast disc system,
>>>>>> the kernel writing of dirty pages is painfully slow.
>>>>> Disk I/O stats are also available via sar. May we see them?
>>>>> Finally, there's blktrace, where you can follow the timing and
>>>>> positions of each individual block being written, but that's not
>>>>> quite easy to run and analyze (unless I missed the easy way).
>>>>>
>>>>> I suspect scattered writes are bringing your I/O rate down.
>>>>>
>>>>> Regards,
>>>>> Ulrich
>>>>>
>>>> sysstat (and sar) is not installed on this system, so I can't give you
>>>> that output.
>>>
>>> As I said in my previous email: use iostat and/or vmstat to monitor
>>> paging activity and system wait time. You can also install atop and
>>> get all of the relevant stats on one screen.
>>>
>>> http://www.atoptool.nl/
>>>
>>> I find it indispensable these days.
>>>
>> BTW, just using nmon whilst running a test script that writes 20K small
>> objects shows a write speed of around 3-5 MB/s to the disc (without
>> dbnosync, so as to see the immediate effects). This is on a SAS disc
>> with a 140MB/s capability, checked using regular tools such as dd
>> (writing 1000 files of 384K each). I'm failing to understand why the
>> write operations from the kernel page cache are so drastically slower.
>> With dbnosync enabled, lmdb writes to the pages fast (as expected),
>> however the pdflush that follows writes at the same slow speed, causing
>> delays for further processes.
>
> In normal (safe) operation, every transaction commit performs 2 fsyncs.
> Your 140MB/s throughput spec isn't relevant here; your disk's IOPS rate
> is what matters. You can use NOMETASYNC to do only 1 fsync per commit.
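For reference, the NOMETASYNC suggestion corresponds to LMDB's MDB_NOMETASYNC environment flag, which still flushes data pages at commit but defers the metapage fsync. Below is a minimal sketch of opening an environment with it; the path, mapsize, and error handling are illustrative, not taken from this thread.

    #include <lmdb.h>
    #include <stdio.h>

    /* Sketch: open an LMDB environment with MDB_NOMETASYNC so each
     * commit performs one fsync (data) instead of two (data + meta).
     * "./testdb" and the 1GB mapsize are placeholder values. */
    int main(void)
    {
        MDB_env *env;
        int rc;

        if ((rc = mdb_env_create(&env)) != 0)
            return rc;
        mdb_env_set_mapsize(env, 1UL << 30);    /* 1GB map, arbitrary */
        rc = mdb_env_open(env, "./testdb", MDB_NOMETASYNC, 0664);
        if (rc != 0) {
            fprintf(stderr, "mdb_env_open: %s\n", mdb_strerror(rc));
            return rc;
        }
        /* ... write transactions committed here pay one fsync each ... */
        mdb_env_close(env);
        return 0;
    }

The flag can also be toggled on an already-open environment with mdb_env_set_flags(env, MDB_NOMETASYNC, 1).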
OK. I ran a reduced version of the test script (20 processes, each
performing 40 read/write operations) in normal (safe) mode on a test
server that has 32GB RAM and is otherwise identical to the server with
128GB. A quick test using vmstat at 1s intervals gave the following
output whilst it was running:

procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd     free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
20  0    0 32011144 167764 330416    0    0     1    15   40   56  0  0  99  1  0
 0  0    0 31914848 167764 330424    0    0     0  1560 2594 2130  2  1  97  0  0
 0  0    0 31914336 167764 330424    0    0     0  1708  754 1277  0  0 100  0  0
 0  0    0 31914508 167772 330420    0    0     0  2028  779 1300  0  0  99  1  0
 0  0    0 31913956 167772 330428    0    0     0  1996  936 1354  0  0 100  0  0
 0  0    0 31914272 167772 330428    0    0     0  1588  785 1286  0  0 100  0  0
 0  0    0 31914356 167780 330424    0    0     0  1604  832 1388  0  0  99  1  0
 0  0    0 31914572 167780 330428    0    0     0  1240 1258 1910  0  0 100  0  0
 0  0    0 31914596 167780 330428    0    0     0  1776  905 1626  0  0 100  0  0
 0  0    0 31914812 167780 330428    0    0     0  1340 1117 1731  0  0 100  0  0
 0  0    0 31914796 167788 330420    0    0     0  1504 1154 1883  0  0 100  0  0
 0  0    0 31914936 167788 330428    0    0     0  1476  835 1390  0  0 100  0  0
 0  0    0 31914844 167796 330432    0    0     0  1140 1203 1891  0  0 100  0  0
 0  0    0 31914744 167796 330432    0    0     0  1600 1034 1667  0  0 100  0  0
 0  0    0 31914976 167796 330432    0    0     0  1488 1149 1715  0  0 100  0  0
 2  0    0 31915052 167796 330432    0    0     0  1248 1249 1930  0  0 100  0  0
 0  0    0 31914316 167796 330432    0    0     0  1736 1076 1625  0  0  99  0  0
 0  0    0 31914680 167804 330424    0    0     0  1176 1103 1751  0  0 100  0  0
 0  0    0 31915108 167804 330432    0    0     0  1476 1063 1776  0  0 100  0  0
 0  0    0 31914796 167804 330432    0    0     0  1660  967 1594  0  0 100  0  0
 0  0    0 31914820 167812 330424    0    0     0  1336 1358 2124  0  0 100  0  0
 0  0    0 31914904 167812 330432    0    0     0  1608  857 1464  0  0 100  0  0
 0  0    0 31914972 167828 330420    0    0     0  1316 1061 1741  0  0  99  1  0
 0  0    0 31915140 167828 330428    0    0     0  1320 1103 1751  0  0 100  0  0
 0  0    0 31915756 167828 330436    0    0     0  1604 1049 1544  0  0  99  0  0
 0  0    0 31915708 167828 330436    0    0     0  1244 1358 2052  0  0 100  0  0
 0  0    0 31916008 167828 330436    0    0     0  1620  912 1595  0  0 100  0  0
 0  0    0 31915964 167836 330428    0    0     0  1492 1105 1715  0  0 100  0  0
 0  0    0 31915916 167836 330436    0    0     0  1376 1164 1878  0  0 100  0  0
 0  0    0 31915924 167836 330436    0    0     0  1484  848 1390  0  0 100  0  0
 0  0    0 31916176 167844 330428    0    0     0  1368 1447 2123  0  0  99  0  0
 0  0    0 31916864 167844 330444    0    0     0  1444 1355 1832  0  0  99  1  0

The script took about 60s to complete, which is a lot longer than
expected. It appears to be almost entirely I/O-bound, at a fairly slow
rate (1500 blocks a second is about 6MB/s). I'll build atop or iostat to
get more detail.
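Since the fsyncs are paid per commit rather than per object, the standard way to cut the synchronous I/O for a workload like this is to group many small puts into one write transaction. A rough sketch against the LMDB C API follows; the function name, key format, and 384-byte payload are invented for illustration.

    #include <lmdb.h>
    #include <stdio.h>

    /* Sketch: write n small objects under a single transaction, so the
     * fsyncs at mdb_txn_commit() are paid once per batch instead of
     * once per object. env and dbi are assumed already opened. */
    static int put_batch(MDB_env *env, MDB_dbi dbi, int n)
    {
        static char payload[384];   /* dummy 384-byte object, zero-filled */
        MDB_txn *txn;
        MDB_val key, val;
        char kbuf[32];
        int i, rc;

        rc = mdb_txn_begin(env, NULL, 0, &txn);
        if (rc != 0)
            return rc;
        for (i = 0; i < n; i++) {
            key.mv_size = snprintf(kbuf, sizeof kbuf, "obj%08d", i);
            key.mv_data = kbuf;
            val.mv_size = sizeof payload;
            val.mv_data = payload;
            rc = mdb_put(txn, dbi, &key, &val, 0);
            if (rc != 0) {
                mdb_txn_abort(txn);
                return rc;
            }
        }
        return mdb_txn_commit(txn); /* fsyncs happen here, once per batch */
    }

If each of the test's 800 operations was previously its own commit, batching one transaction per process would drop the commit count from 800 to 20, and (at two fsyncs per commit) the fsync count from 1600 to 40.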

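The point that the disk's IOPS rate, not its 140MB/s streaming figure, is the limit can also be checked directly with a small fsync loop, independent of LMDB. A sketch; the file name and iteration count are arbitrary:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    /* Rough probe: how many 4KB synchronous writes per second can this
     * disk sustain? That rate, not streaming MB/s, bounds the commit
     * rate of a database that fsyncs on every transaction. */
    int main(void)
    {
        char buf[4096];
        const int iters = 500;
        struct timespec t0, t1;
        double secs;
        int fd, i;

        memset(buf, 0xAB, sizeof buf);
        fd = open("fsync_probe.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++) {
            if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
                perror("pwrite");
                return 1;
            }
            if (fsync(fd) != 0) {
                perror("fsync");
                return 1;
            }
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d fsyncs in %.2fs = %.0f fsyncs/s\n",
               iters, secs, iters / secs);
        close(fd);
        return 0;
    }

If each of the 800 test operations was its own commit, that is roughly 1600 fsyncs over the 60s run, i.e. about 27 fsyncs/s; comparing that figure against the probe's measured rate shows how much of the elapsed time is accounted for by synchronous writes rather than something else.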