Dan Kenigsberg has submitted this change and it was merged.
Change subject: nfsSD: Remove unneeded and expensive mount check
......................................................................
nfsSD: Remove unneeded and expensive mount check
NFS Storage domain used to check if the domain is mounted in the selftest
method. This check dominates the profiles on a system with many NFS storage
domains. This check happens during the monitoring flow, before we run dd to
read a file from the remote server. If the domain is not mounted, dd will fail
to read the file. This patch remove the unneeded and expensive check.
I tested this patch on RHEL 6.5 system connected to 45 NFS storage domains. The
test included starting the test host as spm, waiting 4 minutes, then creating a
template from a vm with one system disk and one empty disk. After 10 minutes,
puting both hosts to maintenance. During this test, vdsm cpu usage has huge
spikes (5-300%), and while the storage operation is running, it consume about
200% constantly. Even worse, when trying to put the host to maintenance, one
host responded only after about 7 tries - it looks that vdsm does not behave
well under such load.
In the profile, we can see that the mount checks (isMounted) dominates
the profile:
Sun May 25 17:48:03 2014 nfs-selftest-spm.prof
10478858 function calls (10548072 primitive calls) in 619.277 CPU
seconds
Ordered by: cumulative time
List reduced from 1985 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
790/1958 0.000 0.000 495.314 0.253
domainMonitor.py:186(DomainMonitorThread._monitorDomain)
4308/4452 182.416 0.042 451.408 0.101 threading.py:481(Thread.run)
38/45 0.000 0.000 437.224 9.716
domainMonitor.py:163(DomainMonitorThread._monitorLoop)
36/49 0.000 0.000 389.649 7.952 utils.py:750(wrapper)
2077/2135 1.971 0.001 384.503 0.180
mount.py:148(getMountFromTarget)
2012/2100 0.120 0.000 383.153 0.182 mount.py:137(isMounted)
105751/109301 0.000 0.000 380.141 0.003
mount.py:121(_iterMountRecords)
1836/1958 0.264 0.000 333.065 0.170
nfsSD.py:105(NfsStorageDomain.selftest)
466934/466985 70.404 0.000 330.915 0.001 mount.py:42(_parseFstabLine)
106768/107120 20.775 0.000 286.090 0.003
mount.py:78(_resolveLoopDevice)
365495/366408 12.537 0.000 260.821 0.001 mount.py:64(_iterateMtab)
10669/12758 0.556 0.000 199.831 0.016
sdc.py:51(DomainProxy.getRealDomain)
11452/12758 0.282 0.000 169.311 0.013
sdc.py:101(StorageDomainCache._realProduce)
241/242 0.000 0.000 158.204 0.654
sdc.py:93(StorageDomainCache.produce)
941546/941554 66.137 0.000 128.888 0.000 posixpath.py:311(normpath)
11924/12515 0.406 0.000 116.818 0.009
sdc.py:48(DomainProxy.__getattr__)
136 0.056 0.000 112.311 0.826
sdc.py:133(StorageDomainCache._findDomain)
136 0.042 0.000 110.590 0.813 nfsSD.py:131(findDomain)
136 0.248 0.002 101.375 0.745 nfsSD.py:115(findDomainPath)
4279715/4279780 25.588 0.000 91.276 0.000 mount.py:70(_parseFstabPath)
With this patch, cpu usage still has spikes, but typically at (3-40%). During
the create template operation, there were about 10-20 secodns where vdsm
consumed about 180% cpu.
Sun May 25 18:20:52 2014 nfs-selftest-spm-2.prof
5106574 function calls (5188395 primitive calls) in 302.323 CPU
seconds
Ordered by: cumulative time
List reduced from 2002 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
4339/4465 59.113 0.014 236.219 0.053 threading.py:481(Thread.run)
1133/2550 0.342 0.000 222.124 0.087
domainMonitor.py:186(DomainMonitorThread._monitorDomain)
41/45 0.000 0.000 213.524 4.745
domainMonitor.py:163(DomainMonitorThread._monitorLoop)
41/49 0.000 0.000 204.217 4.168 utils.py:750(wrapper)
14210/16433 0.000 0.000 133.085 0.008
sdc.py:51(DomainProxy.getRealDomain)
15206/16433 1.243 0.000 127.866 0.008
sdc.py:101(StorageDomainCache._realProduce)
129/136 0.062 0.000 122.497 0.901
sdc.py:133(StorageDomainCache._findDomain)
135/136 0.000 0.000 114.658 0.843 nfsSD.py:121(findDomain)
135/136 0.204 0.002 103.037 0.758 nfsSD.py:105(findDomainPath)
14778/16187 0.530 0.000 99.758 0.006
sdc.py:48(DomainProxy.__getattr__)
2550 0.214 0.000 88.393 0.035
fileSD.py:258(NfsStorageDomain.getReadDelay)
2550 1.529 0.001 88.155 0.035 misc.py:198(readspeed)
2662 0.823 0.000 85.840 0.032 utils.py:646(execCmd)
141/142 0.029 0.000 73.539 0.518 mount.py:137(isMounted)
195 0.353 0.002 73.002 0.374
mount.py:148(getMountFromTarget)
13977/14823 0.000 0.000 72.712 0.005
mount.py:121(_iterMountRecords)
244/245 0.000 0.000 65.844 0.269
sdc.py:93(StorageDomainCache.produce)
54958/55112 12.500 0.000 64.269 0.001 mount.py:42(_parseFstabLine)
14291/14537 3.456 0.000 54.202 0.004
mount.py:78(_resolveLoopDevice)
10801 3.400 0.000 51.056 0.005
remoteFileHandler.py:286(RemoteFileHandlerPool.callCrabRPCFunction)
Comparing both profile we can see that:
- cpu seconds dropped from 619 to 302 (-51%)
- Number of functions calls dropped from 10478858 to 5106574 (-51%)
- Time spent in isMounted dropped from 383 to 73 seconds (-80%)
We should investigate later why isMounted is called so many times, and consider
optimizing this code. We should also investigate the create template flow, to
understand why it cause a such a big spike in cpu usage.
The system behavior experienced while testing this patch, is similar to what
bug 1081962 describes.
Change-Id: Ie8c617c3e9b9e24b5404a197427551408f659549
Bug-Url: https://bugzilla.redhat.com/1099856
Relates-to: https://bugzilla.redhat.com/1081962
Signed-off-by: Nir Soffer <[email protected]>
Reviewed-on: http://gerrit.ovirt.org/28089
Reviewed-by: Dan Kenigsberg <[email protected]>
Reviewed-by: Michal Skrivanek <[email protected]>
Reviewed-by: Federico Simoncelli <[email protected]>
---
M vdsm/storage/nfsSD.py
1 file changed, 0 insertions(+), 10 deletions(-)
Approvals:
Nir Soffer: Verified
Federico Simoncelli: Looks good to me, approved
Dan Kenigsberg: Looks good to me, but someone else must approve
Michal Skrivanek: Looks good to me, but someone else must approve
--
To view, visit http://gerrit.ovirt.org/28089
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie8c617c3e9b9e24b5404a197427551408f659549
Gerrit-PatchSet: 3
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
Gerrit-Reviewer: Federico Simoncelli <[email protected]>
Gerrit-Reviewer: Michal Skrivanek <[email protected]>
Gerrit-Reviewer: Nir Soffer <[email protected]>
Gerrit-Reviewer: Saggi Mizrahi <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Reviewer: oVirt Jenkins CI Server
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches