Dan Kenigsberg has submitted this change and it was merged.

Change subject: nfsSD: Remove unneeded and expensive mount check
......................................................................


nfsSD: Remove unneeded and expensive mount check

The NFS storage domain used to check whether the domain is mounted in the
selftest method. On a system with many NFS storage domains, this check
dominates the profile. The check happens during the monitoring flow, just
before we run dd to read a file from the remote server; if the domain is not
mounted, dd will fail to read the file anyway. This patch removes the unneeded
and expensive check.

I tested this patch on a RHEL 6.5 system connected to 45 NFS storage domains.
The test included starting the test host as SPM, waiting 4 minutes, then
creating a template from a VM with one system disk and one empty disk. After
10 minutes, I put both hosts into maintenance. During this test, vdsm CPU
usage had huge spikes (5-300%), and while the storage operation was running,
it consumed about 200% constantly. Even worse, when trying to put the host
into maintenance, one host responded only after about 7 tries - it looks like
vdsm does not behave well under such load.

In the profile, we can see that the mount checks (isMounted) dominate:

    Sun May 25 17:48:03 2014    nfs-selftest-spm.prof

             10478858 function calls (10548072 primitive calls) in 619.277 CPU seconds

       Ordered by: cumulative time
       List reduced from 1985 to 20 due to restriction <20>

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       790/1958    0.000    0.000  495.314    0.253 domainMonitor.py:186(DomainMonitorThread._monitorDomain)
      4308/4452  182.416    0.042  451.408    0.101 threading.py:481(Thread.run)
          38/45    0.000    0.000  437.224    9.716 domainMonitor.py:163(DomainMonitorThread._monitorLoop)
          36/49    0.000    0.000  389.649    7.952 utils.py:750(wrapper)
      2077/2135    1.971    0.001  384.503    0.180 mount.py:148(getMountFromTarget)
      2012/2100    0.120    0.000  383.153    0.182 mount.py:137(isMounted)
  105751/109301    0.000    0.000  380.141    0.003 mount.py:121(_iterMountRecords)
      1836/1958    0.264    0.000  333.065    0.170 nfsSD.py:105(NfsStorageDomain.selftest)
  466934/466985   70.404    0.000  330.915    0.001 mount.py:42(_parseFstabLine)
  106768/107120   20.775    0.000  286.090    0.003 mount.py:78(_resolveLoopDevice)
  365495/366408   12.537    0.000  260.821    0.001 mount.py:64(_iterateMtab)
    10669/12758    0.556    0.000  199.831    0.016 sdc.py:51(DomainProxy.getRealDomain)
    11452/12758    0.282    0.000  169.311    0.013 sdc.py:101(StorageDomainCache._realProduce)
        241/242    0.000    0.000  158.204    0.654 sdc.py:93(StorageDomainCache.produce)
  941546/941554   66.137    0.000  128.888    0.000 posixpath.py:311(normpath)
    11924/12515    0.406    0.000  116.818    0.009 sdc.py:48(DomainProxy.__getattr__)
            136    0.056    0.000  112.311    0.826 sdc.py:133(StorageDomainCache._findDomain)
            136    0.042    0.000  110.590    0.813 nfsSD.py:131(findDomain)
            136    0.248    0.002  101.375    0.745 nfsSD.py:115(findDomainPath)
4279715/4279780   25.588    0.000   91.276    0.000 mount.py:70(_parseFstabPath)

With this patch, CPU usage still has spikes, but typically in the 3-40% range.
During the create-template operation, there were about 10-20 seconds where
vdsm consumed about 180% CPU.

    Sun May 25 18:20:52 2014    nfs-selftest-spm-2.prof

             5106574 function calls (5188395 primitive calls) in 302.323 CPU seconds

       Ordered by: cumulative time
       List reduced from 2002 to 20 due to restriction <20>

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4339/4465   59.113    0.014  236.219    0.053 threading.py:481(Thread.run)
      1133/2550    0.342    0.000  222.124    0.087 domainMonitor.py:186(DomainMonitorThread._monitorDomain)
          41/45    0.000    0.000  213.524    4.745 domainMonitor.py:163(DomainMonitorThread._monitorLoop)
          41/49    0.000    0.000  204.217    4.168 utils.py:750(wrapper)
    14210/16433    0.000    0.000  133.085    0.008 sdc.py:51(DomainProxy.getRealDomain)
    15206/16433    1.243    0.000  127.866    0.008 sdc.py:101(StorageDomainCache._realProduce)
        129/136    0.062    0.000  122.497    0.901 sdc.py:133(StorageDomainCache._findDomain)
        135/136    0.000    0.000  114.658    0.843 nfsSD.py:121(findDomain)
        135/136    0.204    0.002  103.037    0.758 nfsSD.py:105(findDomainPath)
    14778/16187    0.530    0.000   99.758    0.006 sdc.py:48(DomainProxy.__getattr__)
           2550    0.214    0.000   88.393    0.035 fileSD.py:258(NfsStorageDomain.getReadDelay)
           2550    1.529    0.001   88.155    0.035 misc.py:198(readspeed)
           2662    0.823    0.000   85.840    0.032 utils.py:646(execCmd)
        141/142    0.029    0.000   73.539    0.518 mount.py:137(isMounted)
            195    0.353    0.002   73.002    0.374 mount.py:148(getMountFromTarget)
    13977/14823    0.000    0.000   72.712    0.005 mount.py:121(_iterMountRecords)
        244/245    0.000    0.000   65.844    0.269 sdc.py:93(StorageDomainCache.produce)
    54958/55112   12.500    0.000   64.269    0.001 mount.py:42(_parseFstabLine)
    14291/14537    3.456    0.000   54.202    0.004 mount.py:78(_resolveLoopDevice)
          10801    3.400    0.000   51.056    0.005 remoteFileHandler.py:286(RemoteFileHandlerPool.callCrabRPCFunction)

Comparing both profiles, we can see that:

- CPU seconds dropped from 619 to 302 (-51%)
- The number of function calls dropped from 10478858 to 5106574 (-51%)
- Time spent in isMounted dropped from 383 to 73 seconds (-80%)
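
These reductions can be reproduced from the totals in the two profiles above
(a quick check, not part of the patch; figures truncated to whole percents,
matching the rounding used here):

```python
def drop_percent(before, after):
    # Percentage reduction, truncated to a whole percent.
    return int(100 * (before - after) / before)

cpu_drop = drop_percent(619.277, 302.323)        # CPU seconds: 51
calls_drop = drop_percent(10478858, 5106574)     # function calls: 51
is_mounted_drop = drop_percent(383.153, 73.539)  # isMounted cumtime: 80
```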

We should investigate later why isMounted is called so many times, and
consider optimizing this code. We should also investigate the create-template
flow to understand why it causes such a big spike in CPU usage.

The system behavior experienced while testing this patch is similar to what
bug 1081962 describes.

Change-Id: Ie8c617c3e9b9e24b5404a197427551408f659549
Bug-Url: https://bugzilla.redhat.com/1099856
Relates-to: https://bugzilla.redhat.com/1081962
Signed-off-by: Nir Soffer <[email protected]>
Reviewed-on: http://gerrit.ovirt.org/28089
Reviewed-by: Dan Kenigsberg <[email protected]>
Reviewed-by: Michal Skrivanek <[email protected]>
Reviewed-by: Federico Simoncelli <[email protected]>
---
M vdsm/storage/nfsSD.py
1 file changed, 0 insertions(+), 10 deletions(-)

Approvals:
  Nir Soffer: Verified
  Federico Simoncelli: Looks good to me, approved
  Dan Kenigsberg: Looks good to me, but someone else must approve
  Michal Skrivanek: Looks good to me, but someone else must approve



-- 
To view, visit http://gerrit.ovirt.org/28089
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie8c617c3e9b9e24b5404a197427551408f659549
Gerrit-PatchSet: 3
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <[email protected]>
Gerrit-Reviewer: Allon Mureinik <[email protected]>
Gerrit-Reviewer: Dan Kenigsberg <[email protected]>
Gerrit-Reviewer: Federico Simoncelli <[email protected]>
Gerrit-Reviewer: Michal Skrivanek <[email protected]>
Gerrit-Reviewer: Nir Soffer <[email protected]>
Gerrit-Reviewer: Saggi Mizrahi <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Reviewer: oVirt Jenkins CI Server