Another cPickle patch round: http://paste.fedoraproject.org/86653/

The only change is:
- vdsmd was NOT restarted between runs (just because it was faster to test)
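For context, a minimal sketch of the kind of switch being measured, assuming the common Python 2 idiom of importing `cPickle` (the C implementation with the same interface as `pickle`) and falling back to the stdlib module. This is not the content of the actual patch (http://gerrit.ovirt.org/#/c/25860/); the `roundtrip` helper and the `SAMPLE` payload are hypothetical. On CPython 3 `cPickle` does not exist and `pickle` already uses its C accelerator, so the fallback branch is taken there.

```python
import pickle as py_pickle  # pure Python on Python 2; C-accelerated on CPython 3
import timeit

try:
    import cPickle as fast_pickle  # Python 2: C implementation, same interface
except ImportError:
    fast_pickle = py_pickle  # Python 3: no cPickle module, use the stdlib one


def roundtrip(module, obj):
    """Serialize and deserialize obj with a pickle-compatible module (hypothetical helper)."""
    return module.loads(module.dumps(obj, protocol=2))


# Hypothetical payload, loosely shaped like a VM configuration dictionary.
SAMPLE = {
    "vmId": "00000000-0000-0000-0000-000000000000",
    "memSize": 256,
    "devices": [{"type": "disk", "index": i} for i in range(16)],
}

if __name__ == "__main__":
    for name, module in (("pickle", py_pickle), ("fast", fast_pickle)):
        elapsed = timeit.timeit(lambda: roundtrip(module, SAMPLE), number=10000)
        print("%-7s %.3f s for 10000 round trips" % (name, elapsed))
```

Because both modules expose the same `dumps`/`loads` interface, the switch is a one-line import change, which is why it is considered quick and low-risk.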
The timings and profile are now much closer to what I was expecting.
The benefits of switching to cPickle are indeed small, but the change is
quick, easy and safe, so I'm still all for it.

----- Original Message -----
> From: "Francesco Romani" <from...@redhat.com>
> To: "vdsm-devel" <vdsm-devel@lists.fedorahosted.org>
> Sent: Wednesday, March 19, 2014 9:40:07 AM
> Subject: Re: [vdsm] VDSM profiling results, round 1
>
> Pastebins have a short expiration time.
> The following should be there forever (!)
>
> baseline         : http://paste.fedoraproject.org/86610/
> namedtuple patch : http://paste.fedoraproject.org/86611/
> cpickle patch    : http://paste.fedoraproject.org/86613/
>
> Bests,
>
> ----- Original Message -----
> > From: "Francesco Romani" <from...@redhat.com>
> > To: "vdsm-devel" <vdsm-devel@lists.fedorahosted.org>
> > Sent: Wednesday, March 19, 2014 9:33:51 AM
> > Subject: [vdsm] VDSM profiling results, round 1
> >
> > (sending again WITHOUT the attachments)
> >
> > Hi everyone,
> >
> > I'd like to share the first round of profiling results for VDSM and my next
> > steps.
> >
> > Summary:
> > - experimented with a couple of profiling approaches and found a good one
> > - benchmarked http://gerrit.ovirt.org/#/c/25678/ : it is beneficial and was
> >   merged
> > - found a few low-hanging fruits which seem quite safe to merge and
> >   beneficial to *all* flows
> > - started engagement with infra (see other thread) to have common and
> >   polished performance tools
> > - the test roadmap is shaping up; the wiki/ML will be updated in the coming
> >   days
> >
> > Please read through for a more detailed discussion. Every comment is
> > welcome.
> >
> > Disclaimer:
> > long mail, lots of content; please point out if something is missing, not
> > clear enough, or deserves more discussion.
> >
> > +++
> >
> > == First round results ==
> >
> > The first round of profiling was a follow-up of what I showed during the
> > VDSM gathering.
> > The results file contains a full profile ordered by descending time.
> > In a nutshell: parallel start of 32 tiny VMs using the engine REST API and
> > a single hypervisor host.
> >
> > The VMs are tiny just because I want to stuff as many VMs as I can into my
> > mini-dell (16 GB RAM, 4 core + HT CPUs).
> >
> > It is worth pointing out a few differences with respect to the *profile*
> > (NOT the graphs) I showed during the gathering:
> >
> > - profile data is now collected using the profile decorator (see
> >   http://www.ovirt.org/Profiling_Vdsm) just around Vm._startUnderlyingVm.
> >   The gathering profile was obtained using the yappi application-wide
> >   profiler (see https://code.google.com/p/yappi/) and 40 VMs.
> >   * why yappi?
> >     I thought an application-wide profiler gathers more information and
> >     lets us have a better picture. I actually still think so, but I faced
> >     some yappi misbehaviour which I want to fix later; so far a
> >     function-level profile is easier to collect (just grab the data dumped
> >     to file).
> >   * why 40 VMs?
> >     I started with 64 but exhausted my storage backing store :)
> >     I will add more storage space in the next days; for the moment I
> >     stepped back to 32.
> >
> > It is worth noting that while on one hand the numbers change a bit (if you
> > remember the old profile data and the scary 80 secs wasted on namedtuple),
> > on the other hand the suspects are the same and their relative positions
> > are roughly the same.
> > So I believe our initial findings (namedtuple patch) and the plan are still
> > valid.
> >
> > == How it was done ==
> >
> > I am still focusing just on the "monday morning" scenario (mass start of
> > many VMs at the same time).
> > Each run consisted of a parallel start of 32 VMs as described in the result
> > data.
> > VDSM was restarted between one run and the next.
> > The engine was *NOT* restarted between runs.
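The function-level approach described here profiles a single method (Vm._startUnderlyingVm) rather than the whole process. The decorator VDSM actually provides is documented at http://www.ovirt.org/Profiling_Vdsm; what follows is only a minimal stand-in built on the standard library's cProfile. The `profile_to` decorator, the dump-file naming scheme (one file per call, so runs can be aggregated afterwards) and the `start_underlying_vm` stand-in are all hypothetical.

```python
import cProfile
import functools
import itertools
import os
import tempfile

_counter = itertools.count()


def profile_to(directory):
    """Hypothetical decorator: profile each call of the wrapped function
    with cProfile and dump the raw stats to its own file in `directory`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall enables the profiler, calls func, then disables it.
                return profiler.runcall(func, *args, **kwargs)
            finally:
                # One dump per call, e.g. "start_underlying_vm-0.prof".
                path = os.path.join(
                    directory, "%s-%d.prof" % (func.__name__, next(_counter)))
                profiler.dump_stats(path)
        return wrapper
    return decorator


DUMP_DIR = tempfile.mkdtemp(prefix="vdsm-profile-")


@profile_to(DUMP_DIR)
def start_underlying_vm(vm_id):
    """Hypothetical stand-in for the real VM start path: some busy work."""
    return sum(i * i for i in range(10000)) + len(vm_id)
```

Calling `start_underlying_vm("vm1")` returns normally and leaves one `.prof` file in `DUMP_DIR`; since only the wrapped call is profiled, the dumps stay small and are trivial to collect, which matches the trade-off against yappi mentioned above.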
> > Individual profiles were gathered after all the runs, and the profile was
> > extracted from their aggregation.
> >
> > Profile dumps are available to everyone; just drop me a note and I'll put
> > the tarball somewhere.
> >
> > Please find attached the profile data in txt format. For easier
> > consumption, they are also available on pastebin:
> >
> > baseline       : http://paste.fedoraproject.org/86318/
> > namedtuple fix : http://paste.fedoraproject.org/86378/
> > pickle fix     : http://paste.fedoraproject.org/86600/ (see below)
> >
> > == Hotspots ==
> >
> > The baseline profile data highlights five major areas and hotspots:
> >
> > 1. internal concurrency (possible patch: http://gerrit.ovirt.org/#/c/25857/
> >    - see below)
> > 2. libvirt
> > 3. XML processing (initial patch: http://gerrit.ovirt.org/#/c/17694/)
> > 4. namedtuple (patch: http://gerrit.ovirt.org/#/c/25678/ - fixed, merged)
> > 5. pickling (patch: http://gerrit.ovirt.org/#/c/25860/ - see below)
> >
> > #4 is beneficial in the iSCSI path and it was already merged.
> > #1 shows some potential, but it needs to be carefully evaluated to avoid
> > performance regressions in different scenarios (e.g. bigger machines than
> > mine :))
> > #2 is basically outside of our control, but we need to keep an eye on it.
> > #3 and #5 are beneficial for all flows and scenarios and are safe to merge.
> > #5 is almost a no-brainer IMO.
> >
> > == Note about the third profile ==
> >
> > When profiling the cPickle patch (http://paste.fedoraproject.org/86600/),
> > the tests actually turned out *slower* than in the second profile, which
> > had just the namedtuple patch.
> >
> > The hotspots seem to be around concurrency and libvirt:
> >
> > location                                    profile2(s)  profile3(s)   diff(s)
> > pthread.py:129(wait)                           1230.640     1377.992   +147.28  (BAD)
> > virDomainCreateXML                              155.171      175.681    +20.51  (BAD)
> > 'select.epoll' objects                           52.523       53.635    +1.112  (negligible)
> > expatbuilder.py:743(start_element_handler)       28.172       33.975    +5.803  (BAD?)
> > virDomainGetXMLDesc                              23.947       23.217     -0.73  (negligible)
> >
> > I'm OK with some variance (it is expected), but this is also a warning sign
> > to be extra careful in tuning the concurrency patch (bullet point #1
> > above). We should definitely evaluate more scenarios before merging it.
> >
> > If we set those diffs aside, we see the cPickle patch has the (small)
> > benefits we expect, and I think it is 100% safe to merge. I already did
> > some minimal extra verification just in case.
> >
> > == Next steps ==
> >
> > For the near term (the coming days/next weeks):
> > * benchmark the remaining easy fixes which are beneficial for all flows
> >   and quite safe to merge (XML processing being first), and work to have
> >   them merged
> > * polish scripts and benchmarking code, and start submitting them to infra
> >   for review
> > * continue the investigation of our (in)famous BoundedSemaphore
> >   (http://gerrit.ovirt.org/#/c/25857/) to see if dropping it causes
> >   regressions or other bad effects
> > * find other test scenarios
> >
> > I have also noted all the suggestions received so far, and I am planning
> > more test cases just for this scenario.
> >
> > For example:
> > 1. just start N QEMUs to obtain our lower bound (we cannot get faster than
> >    this)
> > 2. run with different storage (NFS)
> > 3. run with no storage
> > 4. run with Guest OS installed on disks
> >
> > And of course we need more scenarios.
> > Let me just repeat myself: these are just the first steps of a long
> > journey.
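Both the aggregation step (one dump per run, merged into a single profile ordered by descending time) and the per-location comparison in the third-profile table can be reproduced with the standard library's pstats module. This sketch is not the author's tooling. It assumes one directory of cProfile dumps per variant, and the `load_aggregate`, `tottime_by_location` and `diff_tottime` helpers are hypothetical. It reads the `Stats.stats` mapping, which pstats uses internally for per-function totals but does not formally document, and the 2-second "negligible" threshold is an arbitrary assumption.

```python
import glob
import os
import pstats


def load_aggregate(directory):
    """Merge every *.prof dump in `directory` into one pstats.Stats object."""
    paths = sorted(glob.glob(os.path.join(directory, "*.prof")))
    if not paths:
        raise ValueError("no profile dumps in %s" % directory)
    stats = pstats.Stats(paths[0])
    for path in paths[1:]:
        stats.add(path)  # accumulates call counts and times across runs
    stats.sort_stats("time")  # descending internal time
    return stats


def tottime_by_location(stats):
    """Map 'file:line(function)' to total internal time in seconds."""
    result = {}
    for key, value in stats.stats.items():
        filename, lineno, funcname = key
        tottime = value[2]  # value is (cc, nc, tottime, cumtime, callers)
        location = "%s:%d(%s)" % (os.path.basename(filename), lineno, funcname)
        result[location] = result.get(location, 0.0) + tottime
    return result


def diff_tottime(before, after, threshold=2.0):
    """Return (location, before_s, after_s, diff_s, verdict) rows,
    largest absolute change first."""
    rows = []
    for location in set(before) | set(after):
        b, a = before.get(location, 0.0), after.get(location, 0.0)
        diff = a - b
        if abs(diff) < threshold:
            verdict = "negligible"
        else:
            verdict = "BAD" if diff > 0 else "GOOD"
        rows.append((location, b, a, diff, verdict))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows
```

For example, with dumps for two variants in hypothetical directories `profiles/namedtuple` and `profiles/cpickle`, `load_aggregate("profiles/cpickle").print_stats(30)` prints the merged top-30 functions, and `diff_tottime(tottime_by_location(a), tottime_by_location(b))` yields rows shaped like the profile2/profile3 table.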
> >
> > --
> > Francesco Romani
> > RedHat Engineering Virtualization R & D
> > Phone: 8261328
> > IRC: fromani
> > _______________________________________________
> > vdsm-devel mailing list
> > vdsm-devel@lists.fedorahosted.org
> > https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel