On Thu, 2018-07-26 at 12:07 +1200, Ian Collins wrote:
> On Thu, Jul 26, 2018 at 2:58 AM, Len Weincier <[email protected]>
> wrote:
> > Hi 
> > 
> > We have a very strange situation trying to upgrade to a newer SmartOS
> > image where the disk I/O is *very* slow.
> > 
> > I have been working through the released images and the last one
> > that works 100% is 20180329T002644Z
> > 
> > From 20180412T003259Z onwards (the releases with the new ZFS
> > features such as the spacemap changes), the hosts become unusable in
> > terms of disk I/O.
> > 
> > In our testing on the lab machine with only 128G of RAM we see no
> > pathologies.
> > 
> > Hosts are running all SSDs (RAIDZ2) and dual Intel Gold 6150
> > processors on an SMC X11DPH-T board.
> > The lab machine with 128GB RAM has exactly the same processors,
> > board, and SSD-only setup - except for the amount of RAM.
> > 
> > On a production machine with 768G of RAM and the newer image, for
> > example, "zfs create -V 10G zones/test" takes 2 minutes while at the
> > same time iostat shows the disks as relatively idle (%b = 10).
> > 
> > For example, inside an Ubuntu KVM guest running Postgres we are
> > seeing 40% wait time for any disk I/O, and there are only 2 VMs
> > running with the underlying disks essentially idle.
> > 
> > Is there anything we can look at to get to the bottom of this? It is
> > pretty critical and is affecting our customers.
> > 
> > 
> 
> Hello Len,
> 
> I just tried your volume create test on a machine with Gold 6154 CPUs
> and 768G of RAM running 20180629T124501Z and it was quick:
> 
> # time zfs create -V 10G zones/test
> 
> real    0m0.155s
> user    0m0.003s
> sys     0m0.005s
> 
> Is there anything unusual in your pool configuration?  I have a
> stripe of 5 SAS drive mirrors and a couple of Toshiba PX05S SAS SSD
> logs.
> 
> Cheers,
> Ian.

Hi Ian

The pools are 8 SSDs in a raidz2 configuration with an NVMe slog.
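
For reference, the layout is roughly equivalent to this (the device
names are placeholders, not the actual disks in these hosts):

  # zpool create zones raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 \
        c0t4d0 c0t5d0 c0t6d0 c0t7d0 \
        log c1t0d0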

The issue seems to be related to the amount of memory and shows up when
we start to create load on the machine, i.e. we create two 64G VMs with
500G disks and start running pgbench inside them, then we create a
bunch (100+) of smaller VMs to simulate production workloads. See (2)
for example output.
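
(For context, the workload inside the big VMs is plain pgbench with
per-second progress reporting, roughly along these lines; the scale
factor, client and thread counts here are illustrative rather than the
exact values, and "bench" is just a placeholder database name:)

  # pgbench -i -s 1000 bench
  # pgbench -c 64 -j 8 -P 1 -T 600 bench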

Initially the disk I/O is quick - it looks great and normal. Once the
system starts to get loaded, though, it gets very slow in terms of I/O.
This does not happen on the smaller 128G lab machines. It looks like
there was a commit to do the slab selection in parallel (1) on
large-memory machines, which I assume might be related.
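
One way to check whether the time is going into metaslab selection
rather than the disks themselves - a rough sketch, assuming the fbt
provider can attach to metaslab_alloc on this build:

  # dtrace -n 'fbt::metaslab_alloc:entry { self->ts = timestamp; }
      fbt::metaslab_alloc:return /self->ts/ {
          @["metaslab_alloc (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
      }'

If that latency histogram shifts toward milliseconds while iostat still
shows the disks near idle, the allocation path rather than the devices
would be the suspect.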

Regards
Len

(1)  
https://github.com/joyent/illumos-joyent/commit/f78cdc34af236a6199dd9e21376f4a46348c0d56

(2) output from pgbench - tps numbers drop to zero 

progress: 396.0 s, 13692.9 tps, lat 7.306 ms stddev 4.350
progress: 397.0 s, 7627.5 tps, lat 12.356 ms stddev 14.295
progress: 398.0 s, 2211.0 tps, lat 44.647 ms stddev 32.982
progress: 399.0 s, 436.0 tps, lat 228.927 ms stddev 55.447
progress: 400.0 s, 363.0 tps, lat 257.176 ms stddev 81.445
progress: 401.0 s, 237.0 tps, lat 394.185 ms stddev 20.987
progress: 402.0 s, 2068.5 tps, lat 56.925 ms stddev 109.824
progress: 403.0 s, 190.0 tps, lat 473.026 ms stddev 158.184
progress: 404.0 s, 110.0 tps, lat 676.603 ms stddev 11.804
progress: 405.0 s, 100.0 tps, lat 735.862 ms stddev 9.088
progress: 406.0 s, 100.0 tps, lat 842.551 ms stddev 142.035
progress: 407.0 s, 100.0 tps, lat 1417.929 ms stddev 158.737
progress: 408.0 s, 90.0 tps, lat 1015.713 ms stddev 0.980
progress: 409.0 s, 10.0 tps, lat 1217.544 ms stddev 0.500
progress: 410.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 411.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 412.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 413.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 414.0 s, 90.0 tps, lat 5929.350 ms stddev 1.076
progress: 415.0 s, 10.0 tps, lat 6232.817 ms stddev 0.525
progress: 416.0 s, 90.0 tps, lat 1720.543 ms stddev 1.224
progress: 417.0 s, 10.0 tps, lat 1812.332 ms stddev 0.663
progress: 418.0 s, 200.1 tps, lat 1154.220 ms stddev 640.452
progress: 419.0 s, 190.0 tps, lat 570.999 ms stddev 13.684
progress: 420.0 s, 110.0 tps, lat 618.339 ms stddev 7.270
progress: 421.0 s, 190.1 tps, lat 672.076 ms stddev 20.367
progress: 422.0 s, 100.0 tps, lat 900.233 ms stddev 62.150
progress: 423.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 424.0 s, 10.0 tps, lat 2113.682 ms stddev 0.384
progress: 425.0 s, 90.1 tps, lat 3223.316 ms stddev 0.727
progress: 426.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 427.0 s, 10.0 tps, lat 3480.481 ms stddev 0.357
progress: 428.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 429.0 s, 90.1 tps, lat 3785.302 ms stddev 1.069
progress: 430.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 431.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 432.0 s, 5671.4 tps, lat 62.444 ms stddev 387.511
progress: 433.0 s, 15212.9 tps, lat 6.617 ms stddev 4.339
progress: 434.0 s, 15435.2 tps, lat 6.460 ms stddev 4.291
progress: 435.0 s, 14487.3 tps, lat 6.917 ms stddev 3.869


