Hi Thomas,

FWIW I ran your example code below on my VM host (CentOS 7.3, Intel(R) Xeon(R) CPU E3-1245 v6 @ 3.70GHz) and in a Linux VM (Debian 9).

n = 6000    Host = 3.8 secs    VM = 145 secs    ~40 times slower
n = 1000    Host = 0.03 secs    VM = 0.6 secs    ~20 times slower

So based on these timings, your 50% penalty in the VM sounds really good :-).
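One way to put such timings on a common footing is effective GFLOPS: a dense n x n matrix product costs about 2*n^3 floating-point operations, so dividing by the measured wall time gives a throughput number you can compare across machines. A quick sketch using the n = 6000 timings above:

```python
def effective_gflops(n, seconds):
    # A dense (n x n) @ (n x n) matmul costs ~2*n**3 floating-point operations.
    return 2 * n**3 / seconds / 1e9

# n = 6000 timings from above: 3.8 s on the host, 145 s in the VM
print(effective_gflops(6000, 3.8))   # ~114 GFLOPS: multithreaded MKL/OpenBLAS territory
print(effective_gflops(6000, 145))   # ~3 GFLOPS: single-core, unoptimized-BLAS territory
```

The two numbers differ by far more than any plausible virtualization overhead, which is what points at a BLAS/threading difference rather than the hypervisor itself.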

Now, the example maxes out all available cores on the host, but sticks to a single core in the VM. I don't know the reason for that, but perhaps differing build options for numpy on CentOS vs. Debian? That explains roughly a factor of 8 for me (the CPU has 4 cores, 8 threads). Still, after correcting for active core count, the VM ends up taking 2-5 times as long as the host.
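To check whether the guest's numpy is even trying to use multiple threads, one can look at the core count the guest OS reports and the usual BLAS thread-cap environment variables (a sketch; which variable matters depends on whether numpy was built against MKL, OpenBLAS, or a plain reference BLAS):

```python
import os

# Logical cores visible to this OS (inside a VM: virtual cores only)
print("logical cores:", os.cpu_count())

# Common knobs that cap BLAS threading; unset usually means "use all cores"
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    print(var, "=", os.environ.get(var, "<unset>"))
```

numpy.show_config() additionally reports which BLAS the installation was actually linked against, which is the first thing to compare between the CentOS and Debian builds.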

For other workloads I generally don't see such a dramatic difference; more like 10-30% slower performance in VMs compared to native. Seems like you have hit a particular VM weak spot with your workload.

If container deployment is an alternative to a VM, perhaps that would improve matters? Of course, that won't help you if you need to deploy on Windows.

Cheers
-- Jan

On 2020-01-24 09:30, Thomas Strunz wrote:
Hi Maciek,

yeah, I thought that this could be the issue as well, but according to the tools (grep flags /proc/cpuinfo | uniq on Linux, or Coreinfo on Windows) the VMs also support SSE4.2 (and lower) and AVX.
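The same check can be scripted; a small sketch that parses /proc/cpuinfo (Linux-only; it returns an empty set elsewhere, so run it in both host and guest to diff the flags):

```python
def cpu_flags(path="/proc/cpuinfo"):
    # Collect the CPU feature flags Linux reports; empty set if unavailable.
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
print({f: (f in flags) for f in ("sse4_2", "avx", "popcnt")})
```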

In fact I seem to have to look further, as I noticed that in general Python performance (and possibly more; not tested) is much slower on the VMs. See the code below, which is actually a way to see the performance impact of vector extensions and especially of Intel MKL.

import numpy as np
import time

n = 20000
A = np.random.randn(n,n).astype('float64')
B = np.random.randn(n,n).astype('float64')

start_time = time.time()
nrm = np.linalg.norm(A@B)
print(" took {} seconds ".format(time.time() - start_time))
print(" norm = ",nrm)


The last code fragment runs about 50% slower on the Windows VM compared to my laptop, accounting for clock and core count differences. It's confusing to me, as the performance difference is so consistent and apparent, but I would assume that if this were normal, people would have noticed a long time ago? Yet I can't find anything about it. Or does everyone run their code natively?

Best Regards,

Thomas

------------------------------------------------------------------------
*From:* Maciek Wójcikowski <mac...@wojcikowski.pl>
*Sent:* Thursday, 23 January 2020 11:04
*To:* Thomas Strunz <beginn...@hotmail.de>
*Cc:* Greg Landrum <greg.land...@gmail.com>; rdkit-discuss@lists.sourceforge.net
*Subject:* Re: [Rdkit-discuss] Observations about RDKit performance: PatternFingerprinter, Windows, Linux and Virtual machines
Thomas,

Could you double-check whether your VM exposes the same instruction set as your host? Hardware popcounts, which are used to accelerate fingerprint operations, can have a profound impact on performance. SSE4.2 is probably the extension the RDKit uses (at least that is what's stated in the code).

For KVM https://www.linux-kvm.org/page/Tuning_KVM (there are linux commands to check what is available on guest, so might be helpful for you too). It also seems that in VMWare world this might be tricky, as it is considered to be a stability hazard: https://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.vcenterhost.doc_50%2FGUID-8B226625-4923-410C-B7AF-51BCD2806A3B.html
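For context: popcount just counts the set bits in a word, and fingerprint comparisons do this for every word of the bit vector, which is why a single hardware POPCNT instruction beats a software loop. An illustrative sketch (not RDKit's actual code) of the software fallback:

```python
def popcount_sw(x: int) -> int:
    # Software popcount via Kernighan's trick: each iteration clears
    # the lowest set bit, so the loop runs once per set bit.
    n = 0
    while x:
        x &= x - 1
        n += 1
    return n

print(popcount_sw(0b10110101))       # 5 set bits
print(bin(0b10110101).count("1"))    # same result, Python's own way
```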

Best,
Maciek

----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl


On Thu, 23 Jan 2020 at 08:15, Thomas Strunz <beginn...@hotmail.de> wrote:

    Hi Greg,

    reopening this old question. I can see that there are potential
    differences between RDKit versions, and especially between Linux
    and Windows, but let's leave that aside for now.

    After further "playing around", however, I really have the
    impression there is a real issue with running RDKit (or Python?)
    in a virtualized operating system. Since most production software,
    and most cloud workloads, run in a virtualized operating system,
    I think this should be a fairly relevant topic worth
    investigating. As you showed yourself, the AWS system was also
    fairly slow.

    For the following observations I'm keeping the same dataset as
    before, which is from your blog post
    (/Regress/Scripts/fingerprint_screenout.py). Basically it's that
    code, slightly adapted:

    import gzip
    from rdkit import Chem, DataStructs

    mols = []
    with gzip.open(data_dir + 'chembl21_25K.pairs.txt.gz', 'rb') as inf:
        for line in inf:
            line = line.decode().strip().split()
            smi1 = line[1]
            smi2 = line[3]
            m1 = Chem.MolFromSmiles(smi1)
            m2 = Chem.MolFromSmiles(smi2)
            mols.append(m1)
            mols.append(m2)

    frags = [Chem.MolFromSmiles(x.split()[0])
             for x in open(data_dir + 'zinc.frags.500.q.smi', 'r')]

    mfps = [Chem.PatternFingerprint(m, 512) for m in mols]
    fragsfps = [Chem.PatternFingerprint(m, 512) for m in frags]

    %%timeit -n1 -r1
    for i, fragfp in enumerate(fragsfps):
        hits = 0
        for j, mfp in enumerate(mfps):
            if DataStructs.AllProbeBitsMatch(fragfp, mfp):
                if mols[j].HasSubstructMatch(frags[i]):
                    hits = hits + 1


    I want to focus on the last cell, namely the
    "AllProbeBitsMatch" method:

    %%timeit
    DataStructs.AllProbeBitsMatch(fragsfps[10], mfps[10])

    Results:

    Windows 10 native, i7-8850H:  567 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    Lubuntu 16.04 virtualized, i7-8850H:  1.81 µs ± 56.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) // the high variation is consistent
    Windows Server 2012 R2 virtualized, Xeon E5-2620 v4:  1.18 µs ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    So virtualization seems to cut the performance of this specific
    method in half, which is also what I see when running the full
    substructure search code: it takes double the time on the
    virtualized machines. (The Windows server actually runs on ESX,
    i.e. a type 1 hypervisor, while the Lubuntu VM is type 2 (VMware
    Workstation), but both seem to suffer the same.)
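    As a side note, the same statistics can be reproduced outside
    Jupyter with the stdlib timeit module (a sketch; the bitwise-AND
    stand-in below is hypothetical and would be replaced by the
    DataStructs.AllProbeBitsMatch call):

```python
import statistics
import timeit

def bench(stmt, setup="pass", runs=7, loops=1_000_000):
    # Mimic %timeit's report: total time per run divided by the loop
    # count, then mean and std. dev. across the runs.
    totals = timeit.repeat(stmt, setup=setup, repeat=runs, number=loops)
    per_loop = [t / loops for t in totals]
    return statistics.mean(per_loop), statistics.stdev(per_loop)

# Hypothetical stand-in for AllProbeBitsMatch(fragsfps[10], mfps[10])
mean, std = bench("x & y", setup="x, y = 0b1011, 0b1110")
print(f"{mean * 1e9:.1f} ns ± {std * 1e9:.1f} ns per loop "
      f"(mean ± std. dev. of 7 runs, 1000000 loops each)")
```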

    We can try the same thing with:

    %%timeit
    mols[10].HasSubstructMatch(frags[10])

    The difference here is smaller, but the VMs still take >50% more
    time.

    So there seems to be a consistent large performance impact in VMs.

    Of course the VM will be a bit slower, but not by that much? What
    am I missing? Any other experiences?

    Best Regards,

    Thomas
    ------------------------------------------------------------------------
    *From:* Greg Landrum <greg.land...@gmail.com>
    *Sent:* Monday, 16 December 2019 17:10
    *To:* Thomas Strunz <beginn...@hotmail.de>
    *Cc:* rdkit-discuss@lists.sourceforge.net
    *Subject:* Re: [Rdkit-discuss] Observations about RDKit
    performance: PatternFingerprinter, Windows, Linux and Virtual
    machines
    Hi Thomas,

    First, it is important to compare equivalent major versions to
    each other, particularly in this case. On my Linux box, generating
    the pattern fingerprints takes 24.2 seconds with v2019.03.x and
    15.9 seconds with v2019.09.x (that's due to the improvements in
    the substructure matcher that the blog post you link to
    discusses).

    Comparing the same versions to each other:

    Performance on windows vs linux
    Windows performance with the RDKit has always lagged behind linux
    performance. There's something in the code (or in the way we use
    the compiler) that leads to big differences on some benchmarks.
    The most straightforward way I can demonstrate this is with
    results from my windows 10 laptop.
    Here's the output when running the fingerprint_screenout.py
    benchmark using the windows build:
    | 2019.09.1 | 13.6 | 0.3 | 38.1 | 0.8 | 25.5 | 25.9 | 84.1 |
    and here's the output from a linux build running on the Windows
    Linux Subsystem:
    | 2019.09.2 | 10.7 | 0.2 | 19.3 | 0.4 | 19.4 | 19.2 | 53.2 |
    You can see the differences are not small.
    I haven't invested massive time into it, but I haven't been able
    to figure out what causes this.

    Performance on (linux) VMs
    I can't think of any particular reason why there should be huge
    differences, and it's really difficult to compare apples to apples
    here.
    Since I have the numbers, here's one comparison:

    Here's a run on my linux workstation:
    | 2019.09.2 | 7.6 | 0.3 | 15.9 | 0.4 | 21.4 | 20.4 | 55.7 |
    and here's the same thing on an AWS t3.xlarge instance:
    | 2019.09.2 | 9.6 | 0.2 | 20.3 | 0.4 | 38.4 | 38.2 | 94.7 |
    The VM is significantly slower, but t3.xlarge isn't an instance
    type that's intended to be used for compute-intensive jobs (I
    don't have one of those active and configured at the moment).

    Does that help at all?
    -greg


    On Mon, Dec 16, 2019 at 8:27 AM Thomas Strunz
    <beginn...@hotmail.de> wrote:

        Hi All,

        I was looking at a blog post from greg:

        
        https://rdkit.blogspot.com/2019/07/a-couple-of-substructure-search-topics.html

        about fingerprint screenout. The part that got me confused was
        the timings in his blog post, because the run times in my case
        were a lot slower.

        Greg's numbers:

        [07:21:19] INFO: mols from smiles
        [07:21:27] INFO: Results1:  7.77 seconds, 50000 mols
        [07:21:27] INFO: queries from smiles
        [07:21:27] INFO: Results2:  0.16 seconds
        *[07:21:27] INFO: generating pattern fingerprints for mols
        [07:21:43] INFO: Results3: 16.11 seconds*
        [07:21:43] INFO: generating pattern fingerprints for queries
        [07:21:43] INFO: Results4:  0.34 seconds
        [07:21:43] INFO: testing frags queries
        [07:22:03] INFO: Results5:  19.90 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
        [07:22:03] INFO: testing leads queries
        [07:22:23] INFO: Results6:  19.77 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
        [07:22:23] INFO: testing pieces queries
        [07:23:19] INFO: Results7:  55.37 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.

        | 2019.09.1dev1 | 7.8 | 0.2 | 16.1 | 0.3 | 19.9 | 19.8 | 55.4 |




        *Machine 1:*
        Virtual machine, Windows Server 2012 R2 with an intel xeon (4
        virtual cores)

        Since the test is single-threaded, it makes some sense that it
        isn't fast here, but it's not just a bit slower; depending on
        the test it's almost 3 times slower.

        [09:03:19] INFO: mols from smiles
        [09:03:38] INFO: Results1:  19.44 seconds, 50000 mols
        [09:03:38] INFO: queries from smiles
        [09:03:38] INFO: Results2:  0.36 seconds
        *[09:03:38] INFO: generating pattern fingerprints for mols*
        *[09:04:54] INFO: Results3:  75.99 seconds*
        [09:04:54] INFO: generating pattern fingerprints for queries
        [09:04:56] INFO: Results4:  1.55 seconds
        [09:04:56] INFO: testing frags queries
        [09:05:34] INFO: Results5:  37.59 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
        [09:05:34] INFO: testing leads queries
        [09:06:11] INFO: Results6:  37.34 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
        [09:06:11] INFO: testing pieces queries
        [09:08:39] INFO: Results7:  147.79 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
        | 2019.03.3 | 19.4 | 0.4 | 76.0 | 1.5 | 37.6 | 37.3 | 147.8 |

        I thought maybe this was another issue with Windows being
        slow, so I tested on a Linux VM on my laptop.

        *Machine 2:*
        Virtual machine, Lubuntu 16.04 on a laptop i7-8850H 6-core

        [09:23:31] INFO: mols from smiles
        [09:23:54] INFO: Results1:  23.71 seconds, 50000 mols
        [09:23:54] INFO: queries from smiles
        [09:23:55] INFO: Results2:  0.48 seconds
        *[09:23:55] INFO: generating pattern fingerprints for mols*
        *[09:24:53] INFO: Results3:  58.31 seconds*
        [09:24:53] INFO: generating pattern fingerprints for queries
        [09:24:54] INFO: Results4:  1.19 seconds
        [09:24:54] INFO: testing frags queries
        [09:25:41] INFO: Results5:  46.22 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
        [09:25:41] INFO: testing leads queries
        [09:26:26] INFO: Results6:  45.84 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
        [09:26:26] INFO: testing pieces queries
        [09:28:33] INFO: Results7:  126.78 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
        | 2019.03.3 | 23.7 | 0.5 | 58.3 | 1.2 | 46.2 | 45.8 | 126.8 |

        Pretty weird: sometimes slower, sometimes faster than the
        Windows VM, but still a lot slower than Greg's numbers (I
        repeated with rdkit 2019.09.2 and got comparable results).

        So I also tested on the above laptop directly:

        *Machine 3:*
        physical install, Windows 10 on a laptop i7-8850H 6-core (same
        machine as 2)

        [09:51:43] INFO: mols from smiles
        [09:51:54] INFO: Results1:  10.59 seconds, 50000 mols
        [09:51:54] INFO: queries from smiles
        [09:51:54] INFO: Results2:  0.20 seconds
        *[09:51:54] INFO: generating pattern fingerprints for mols*
        *[09:52:24] INFO: Results3:  29.50 seconds*
        [09:52:24] INFO: generating pattern fingerprints for queries
        [09:52:24] INFO: Results4:  0.61 seconds
        [09:52:24] INFO: testing frags queries
        [09:52:44] INFO: Results5:  19.71 seconds. 6753 tested (0.0003 of total), 3989 found,  0.59 accuracy. 0 errors.
        [09:52:44] INFO: testing leads queries
        [09:53:04] INFO: Results6:  19.48 seconds. 1586 tested (0.0001 of total), 1067 found,  0.67 accuracy. 0 errors.
        [09:53:04] INFO: testing pieces queries
        [09:54:05] INFO: Results7:  61.94 seconds. 3333202 tested (0.0810 of total), 1925628 found,  0.58 accuracy. 0 errors.
        | 2019.09.1 | 10.6 | 0.2 | 29.5 | 0.6 | 19.7 | 19.5 | 61.9 |

        This is much closer to Greg's results, except for the
        fingerprinting, which takes almost double the time. Also
        notice that the fingerprinting on the Linux VM is much faster
        than on the Windows VM, also relative to the other results.

        *Conclusions:*

         1. From what I see, it seems that the pattern fingerprinter
            runs a lot slower on Windows. Is this a known issue?
         2. In virtual machines, RDKit's performance simply tanks. A
            certain penalty is to be expected, but not this much. Or
            what am I missing? Machine 1 runs on central
            infrastructure, so I would assume virtualization is
            configured correctly. For the local VM, VT-x is enabled.
            Yet it is much slower compared to the physical machine
            (plus, AFAIK, RDKit runs faster on Linux than on Windows).

        Especially the virtual machine aspect is kind of troubling,
        because I would assume many real-world applications are
        deployed as VMs and hence might suffer from this too.
        I don't have a well-defined question; I'm more interested in
        other users' experience, especially regarding virtualization.

        Best Regards,

        Thomas



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
