> On 26 Jul 2018, at 20:43, Mike Gerdts <mike.ger...@joyent.com> wrote:
>
> On Thu, Jul 26, 2018 at 5:02 AM, Len Weincier <l...@cloudafrica.net> wrote:
> > On Wed, 2018-07-25 at 16:58 +0200, Len Weincier wrote:
> > > Hi
> > >
> > > We have a very strange situation trying to upgrade to a newer smartos
> > > image where the disk I/O is *very* slow.
> > >
> > > I have been working through the released images and the last one that
> > > works 100% is 20180329T002644Z
> > >
> > > From 20180412T003259Z onwards, the release with the new zfs features
> > > like spacemaps etc, the hosts become unusable in terms of disk i/o
> > >
> > > In our testing with the lab machine with only 128G ram we see no
> > > pathologies.
> > >
> > > Hosts are running ALL SSDs (RAIDZ2), and Intel Gold 6150 x2 processors
> > > on an SMC X11DPH-T board.
> > > The lab machine with 128GB RAM has exactly the same processors, board,
> > > and SSD-only setup - except for RAM.
> > >
> > > On a production machine with 768G ram and the newer image, for example,
> > > zfs create -V 10G zones/test takes 2 minutes while at the same time
> > > iostat is showing the disks as relatively idle (%b = 10)
> > >
> > > For example, inside an ubuntu kvm with postgres we are seeing 40% wait
> > > time for any disk i/o and there are only 2 vm's running, underlying
> > > disks essentially idle.
> > >
> > > Is there anything we can look at to get to the bottom of this, as it is
> > > pretty critical and affecting our customers
> >
> > Hi
> >
> > I have managed to grab a bunch of stack traces from a dtrace script on
> > the fbt:zfs::entry events and generated a flamegraph while the system was
> > behaving badly
> >
> > https://static.prod.cloudafrica.net/out.svg
> >
> > This shows a bunch of activity in the metaslab allocation, if I read it
> > correctly?
> >
> > Any ideas or anything I can look at please let me know.
> >
> > I have confirmed that this only occurs when the system is under i/o load.
>
> I've created a platform image based on 20180329T002644Z with this change
> that you mentioned removed.
>
> commit f78cdc34af236a6199dd9e21376f4a46348c0d56
> Author: Paul Dagnelie <p...@delphix.com>
> Date:   Mon Feb 12 12:56:06 2018 -0800
>
>     9112 Improve allocation performance on high-end systems
>     Reviewed by: Matthew Ahrens <mahr...@delphix.com>
>     Reviewed by: George Wilson <george.wil...@delphix.com>
>     Reviewed by: Serapheim Dimitropoulos <serapheim.dimi...@delphix.com>
>     Reviewed by: Alexander Motin <m...@freebsd.org>
>     Approved by: Gordon Ross <g...@nexenta.com>
>
> My testing has involved booting the iso under vmware and verifying that it
> could import an existing single disk pool and run the VMs on it.
>
> Can you give this PI a try? As a reminder, my testing has been quite
> superficial. I hope it won't eat your data, but can offer no guarantees.
>
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.tgz
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.iso
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.usb.bz2
>
> Regards,
> Mike
Hi Mike

Do you mean it's the latest image from master with that commit removed?
AFAICS that commit came just after 20180329?

I am away until Monday and can give it a test then.

Thanks
Len