> On 26 Jul 2018, at 20:43, Mike Gerdts <mike.ger...@joyent.com> wrote:
> 
> On Thu, Jul 26, 2018 at 5:02 AM, Len Weincier <l...@cloudafrica.net> wrote:
> On Wed, 2018-07-25 at 16:58 +0200, Len Weincier wrote:
>> Hi
>> 
>> We have a very strange situation trying to upgrade to a newer SmartOS image 
>> where the disk I/O is *very* slow.
>> 
>> I have been working through the released images and the last one that works 
>> 100% is 20180329T002644Z
>> 
>> From 20180412T003259Z onwards (the release with the new ZFS features like 
>> spacemaps etc.), the hosts become unusable in terms of disk I/O.
>> 
>> In our testing with the lab machine with only 128G ram we see no pathologies.
>> 
>> Hosts are running all SSDs (RAIDZ2) and Intel Gold 6150 x2 processors on an 
>> SMC X11DPH-T board. 
>> The lab machine with 128GB RAM has exactly the same processors, board, and 
>> SSD-only setup; only the RAM differs.
>> 
>> On a production machine with 768G RAM and the newer image, for example, 
>> zfs create -V 10G zones/test takes 2 minutes while at the same time iostat 
>> shows the disks as relatively idle (%b = 10).
>> 
>> For example, inside an Ubuntu KVM with Postgres we are seeing 40% wait time 
>> for any disk I/O with only 2 VMs running, while the underlying disks are 
>> essentially idle.
>> 
>> Is there anything we can look at to get to the bottom of this? It is pretty 
>> critical and is affecting our customers.
> 
> 
> Hi
> 
> I have managed to grab a bunch of stack traces from a dtrace script on the 
> fbt:zfs::entry events and generated a flamegraph while the system was 
> behaving badly:
> 
> https://static.prod.cloudafrica.net/out.svg
> 
> This shows a bunch of activity in the metaslab allocation, if I read it 
> correctly?
> 
> Any ideas or anything I can look at please let me know.
> 
> I have confirmed that this only occurs when the system is under I/O load.
> 
> 
> I've created a platform image based on 20180329T002644Z with this change that 
> you mentioned removed.
> 
> commit f78cdc34af236a6199dd9e21376f4a46348c0d56
> Author: Paul Dagnelie <p...@delphix.com>
> Date:   Mon Feb 12 12:56:06 2018 -0800
> 
>     9112 Improve allocation performance on high-end systems
>     Reviewed by: Matthew Ahrens <mahr...@delphix.com>
>     Reviewed by: George Wilson <george.wil...@delphix.com>
>     Reviewed by: Serapheim Dimitropoulos <serapheim.dimi...@delphix.com>
>     Reviewed by: Alexander Motin <m...@freebsd.org>
>     Approved by: Gordon Ross <g...@nexenta.com>
> 
> 
> My testing has involved booting the iso under vmware and verifying that it 
> could import an existing single disk pool and run the VMs on it.
> 
> Can you give this PI a try?  As a reminder, my testing has been quite 
> superficial.  I hope it won't eat your data, but can offer no guarantees.
> 
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.tgz
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.iso
> https://us-east.manta.joyent.com/mgerdts/public/pi/len/platform-20180726T160921Z.usb.bz2
> 
> Regards,
> Mike

Hi Mike 

Do you mean it's the latest image from master with that commit removed? AFAICS 
that commit came just after 20180329?

I am away until Monday and can give it a test then.

Thanks
Len



