On Thu, 2018-07-26 at 12:07 +1200, Ian Collins wrote:
> On Thu, Jul 26, 2018 at 2:58 AM, Len Weincier <[email protected]> wrote:
> > Hi
> >
> > We have a very strange situation trying to upgrade to a newer SmartOS
> > image where the disk I/O is *very* slow.
> >
> > I have been working through the released images and the last one
> > that works 100% is 20180329T002644Z.
> >
> > From 20180412T003259Z onwards (the releases with the new ZFS
> > features like spacemaps etc.) the hosts become unusable in terms of
> > disk I/O.
> >
> > In our testing with the lab machine with only 128G of RAM we see no
> > pathologies.
> >
> > Hosts are running all SSDs (RAIDZ2) and 2x Intel Gold 6150 processors
> > on an SMC X11DPH-T board. The lab machine with 128GB of RAM has
> > exactly the same processors, board, and SSD-only setup - except for
> > the amount of RAM.
> >
> > On a production machine with 768G of RAM and the newer image, for
> > example, zfs create -V 10G zones/test takes 2 minutes, while at the
> > same time iostat shows the disks as relatively idle (%b = 10).
> >
> > For example, inside an Ubuntu KVM with Postgres we are seeing 40%
> > wait time for any disk I/O with only 2 VMs running and the underlying
> > disks essentially idle.
> >
> > Is there anything we can look at to get to the bottom of this? It is
> > pretty critical and affecting our customers.
> >
>
> Hello Len,
>
> I just tried your volume create test on a machine with Gold 6154 CPUs
> and 768G of RAM running 20180629T124501Z and it was quick:
>
> # time zfs create -V 10G zones/test
>
> real    0m0.155s
> user    0m0.003s
> sys     0m0.005s
>
> Is there anything unusual in your pool configuration? I have a
> stripe of 5 SAS drive mirrors and a couple of Toshiba PX05S SAS SSD
> logs.
>
> Cheers,
> Ian.
Hi Ian

The pools are 8 SSDs in a raidz2 pool with an NVMe slog.

The issue seems to be related to the amount of memory and shows up when we
start to put load on the machine, i.e. we create 2 x 64G VMs with 500G disks
and start running pgbench inside them, then we create a bunch (100+) of
smaller VMs to simulate production workloads. See (2) for example output.

Initially the disk I/O is quick and looks great and normal. Once the system
starts to get loaded, though, it becomes very slow in terms of I/O. This does
not happen on the smaller 128G lab machines.

It looks like there was a commit to do the slab selection in parallel (1) on
large-memory machines that I assume might be related.

Regards
Len

(1) https://github.com/joyent/illumos-joyent/commit/f78cdc34af236a6199dd9e21376f4a46348c0d56

(2) Output from pgbench - the tps numbers drop to zero:

progress: 396.0 s, 13692.9 tps, lat 7.306 ms stddev 4.350
progress: 397.0 s, 7627.5 tps, lat 12.356 ms stddev 14.295
progress: 398.0 s, 2211.0 tps, lat 44.647 ms stddev 32.982
progress: 399.0 s, 436.0 tps, lat 228.927 ms stddev 55.447
progress: 400.0 s, 363.0 tps, lat 257.176 ms stddev 81.445
progress: 401.0 s, 237.0 tps, lat 394.185 ms stddev 20.987
progress: 402.0 s, 2068.5 tps, lat 56.925 ms stddev 109.824
progress: 403.0 s, 190.0 tps, lat 473.026 ms stddev 158.184
progress: 404.0 s, 110.0 tps, lat 676.603 ms stddev 11.804
progress: 405.0 s, 100.0 tps, lat 735.862 ms stddev 9.088
progress: 406.0 s, 100.0 tps, lat 842.551 ms stddev 142.035
progress: 407.0 s, 100.0 tps, lat 1417.929 ms stddev 158.737
progress: 408.0 s, 90.0 tps, lat 1015.713 ms stddev 0.980
progress: 409.0 s, 10.0 tps, lat 1217.544 ms stddev 0.500
progress: 410.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 411.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 412.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 413.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 414.0 s, 90.0 tps, lat 5929.350 ms stddev 1.076
progress: 415.0 s, 10.0 tps, lat 6232.817 ms stddev 0.525
progress: 416.0 s, 90.0 tps, lat 1720.543 ms stddev 1.224
progress: 417.0 s, 10.0 tps, lat 1812.332 ms stddev 0.663
progress: 418.0 s, 200.1 tps, lat 1154.220 ms stddev 640.452
progress: 419.0 s, 190.0 tps, lat 570.999 ms stddev 13.684
progress: 420.0 s, 110.0 tps, lat 618.339 ms stddev 7.270
progress: 421.0 s, 190.1 tps, lat 672.076 ms stddev 20.367
progress: 422.0 s, 100.0 tps, lat 900.233 ms stddev 62.150
progress: 423.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 424.0 s, 10.0 tps, lat 2113.682 ms stddev 0.384
progress: 425.0 s, 90.1 tps, lat 3223.316 ms stddev 0.727
progress: 426.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 427.0 s, 10.0 tps, lat 3480.481 ms stddev 0.357
progress: 428.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 429.0 s, 90.1 tps, lat 3785.302 ms stddev 1.069
progress: 430.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 431.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 432.0 s, 5671.4 tps, lat 62.444 ms stddev 387.511
progress: 433.0 s, 15212.9 tps, lat 6.617 ms stddev 4.339
progress: 434.0 s, 15435.2 tps, lat 6.460 ms stddev 4.291
progress: 435.0 s, 14487.3 tps, lat 6.917 ms stddev 3.869
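
For context, per-second progress lines of the form shown in (2) are what
pgbench prints when run with its -P option. A minimal sketch of the kind of
invocation that would produce similar output follows; the database name, scale
factor, client count, thread count, and duration are illustrative assumptions,
not details taken from the original post:

    # Hypothetical reproduction inside one of the Ubuntu KVM guests;
    # all parameters below are assumptions chosen for illustration.
    createdb bench                           # create the target database
    pgbench -i -s 1000 bench                 # initialise the pgbench tables
    pgbench -c 64 -j 8 -T 600 -P 1 bench     # run, printing per-second progress lines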
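
If the parallel metaslab selection change referenced in (1) is involved, one
way to narrow it down would be to measure how long ZFS block allocations take
in the kernel while a stall is in progress. A rough DTrace sketch, assuming
the fbt provider can instrument metaslab_alloc() on this platform (run as root
in the global zone):

    # Sketch: distribution of time spent in metaslab_alloc() over ~30 seconds.
    dtrace -n '
    fbt::metaslab_alloc:entry { self->ts = timestamp; }
    fbt::metaslab_alloc:return /self->ts/ {
        @["metaslab_alloc latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }
    tick-30s { exit(0); }'

Comparing the same measurement on the 128G lab machine against a loaded 768G
host should show whether allocation time inside ZFS, rather than the disks
themselves, accounts for the stalls (iostat already suggests the disks are
mostly idle).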
