> pthread.py:129(wait)                          1230.640       1377.992       +147.28 (BAD)
The thread pool would simply get stuck in wait()
when there are no tasks, since Queues use Conditions
internally.

This might explain why the average wait time is
so long.
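A minimal stdlib-only sketch (not vdsm's pthread.py, whose internals may differ) of the effect described above: a worker blocked on Queue.get() spends its whole idle time inside the Queue's internal Condition.wait(), which a cumulative profile then attributes to wait() even though no work is being done.

```python
# Sketch: an idle consumer blocked on Queue.get() sits in Condition.wait()
# the entire time, so profiles show huge cumulative time in wait().
import queue
import threading
import time

tasks = queue.Queue()
waited = []

def worker():
    start = time.monotonic()
    task = tasks.get()       # blocks on the Queue's internal Condition
    waited.append(time.monotonic() - start)
    task()

t = threading.Thread(target=worker)
t.start()
time.sleep(0.2)              # worker is parked in Condition.wait() all along
tasks.put(lambda: None)      # putting a task notifies the Condition
t.join()
print("time spent blocked: %.2fs" % waited[0])
```

The profiler cannot distinguish this idle blocking from time spent doing real work, which is why a long wait() total is not by itself proof of a bottleneck.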

----- Original Message -----
> From: "Francesco Romani" <from...@redhat.com>
> To: "vdsm-devel" <vdsm-devel@lists.fedorahosted.org>
> Sent: Wednesday, March 19, 2014 10:33:51 AM
> Subject: [vdsm] VDSM profiling results, round 1
> 
> (sending again WITHOUT the attachments)
> 
> Hi everyone
> 
> I'd like to share the first round of profiling results for VDSM and my next
> steps.
> 
> Summary:
> - experimented a couple of profiling approaches and found a good one
> - benchmarked http://gerrit.ovirt.org/#/c/25678/ : it is beneficial, was
> merged
> - found a few low-hanging fruits which seem quite safe to merge and
> beneficial to *all* flows
> - started engagement with infra (see other thread) to have common and
> polished performance
>   tools
> - test roadmap is shaping up, wiki/ML will be updated in the coming days
> 
> Please read through for a more detailed discussion. Every comment is welcome.
> 
> Disclaimer:
> long mail, lots of content; please point out if something is missing, not
> clear enough,
> or deserves more discussion.
> 
> +++
> 
> == First round results ==
> 
> The first round of profiling was a follow-up to what I showed during the VDSM
> gathering.
> The results file contains a full profile ordered by descending time.
> In a nutshell: parallel start of 32 tiny VMs using engine REST API and a
> single hypervisor host.
> 
> VMs are tiny just because I want to stuff as many VMs as I can into my mini-Dell
> (16 GB RAM, 4-core + HT CPU)
> 
> It is worth pointing out a few differences with respect to the *profile* (NOT
> the graphs)
> I showed during the gathering:
> 
> - profile data is now collected using the profile decorator (see
> http://www.ovirt.org/Profiling_Vdsm)
>   just around Vm._startUnderlyingVm. The gathering profile was obtained using
>   the yappi application-wide
>   profiler (see https://code.google.com/p/yappi/) and 40 VMs.
>   * why yappi?
>     I thought an application-wide profiler gathers more information and lets
>     us get a better picture.
>     I actually still think so, but I ran into some yappi misbehaviour which I
>     want to fix later;
>     a function-level profile is, so far, easier to collect (just grab the data
>     dumped to file).
>   * why 40 VMs?
>     I started with 64 but exhausted my storage backing store :)
>     I will add more storage space in the coming days; for the moment I stepped
>     back to 32.
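The per-method approach above can be sketched as follows. This is a hypothetical reconstruction in the spirit of the decorator described on the wiki page, not the actual vdsm implementation (the name `start_underlying_vm` and the dump path are stand-ins):

```python
# Hypothetical sketch of a per-function profile decorator: wrap one entry
# point (e.g. Vm._startUnderlyingVm) with cProfile and dump stats to a file.
import cProfile
import functools
import os
import tempfile

def profile(dump_path):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            prof = cProfile.Profile()
            try:
                return prof.runcall(func, *args, **kwargs)
            finally:
                # Overwrites on each call; a real tool would write one
                # distinct dump per call for later aggregation.
                prof.dump_stats(dump_path)
        return wrapper
    return decorator

dump = os.path.join(tempfile.gettempdir(), "startUnderlyingVm.prof")

@profile(dump)
def start_underlying_vm():        # stand-in for the real VM-start path
    return sum(range(1000))

print(start_underlying_vm())
```

Unlike an application-wide profiler, this only sees the decorated call tree, which is exactly why the data is easier to collect and interpret.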
> 
> It is worth noting that while on one hand the numbers change a bit (if you
> remember the old profile data
> and the scary 80 secs wasted on namedtuple), on the other hand the suspects
> are the same and their
> relative positions are roughly the same.
> So I believe our initial findings (namedtuple patch) and the plan are still
> valid.
> 
> == how it was done ==
> 
> I am still focusing just on the "monday morning" scenario (mass start of many
> VMs at the same time).
> Each run consisted of a parallel start of 32 VMs as described in the result
> data.
> VDSM was restarted between runs;
> engine was *NOT* restarted between runs.
> Individual profiles were gathered after all the runs, and the profile was
> extracted from their aggregation.
> 
> Profile dumps are available to everyone; just drop me a note and I'll put the
> tarball somewhere.
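The aggregation step can be sketched with stdlib pstats (assuming the per-run dumps are cProfile stats files; the filenames here are illustrative): `Stats.add()` merges the runs, and the merged profile is then sorted by descending cumulative time, as in the pastebin output.

```python
# Sketch: merge several per-run cProfile dumps into one aggregated profile.
import cProfile
import os
import pstats
import tempfile

tmp = tempfile.mkdtemp()
dumps = []
for i in range(3):                      # stand-ins for the per-run dumps
    prof = cProfile.Profile()
    prof.runcall(sum, range(10000))     # stand-in workload
    path = os.path.join(tmp, "run%d.prof" % i)
    prof.dump_stats(path)
    dumps.append(path)

stats = pstats.Stats(dumps[0])
for path in dumps[1:]:
    stats.add(path)                     # merge the remaining runs in place
stats.sort_stats("cumulative").print_stats(5)   # top 5 by cumulative time
```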
> 
> Please find the profile data attached in txt format. For easier consumption,
> it is also
> available on pastebin:
> 
> baseline      : http://paste.fedoraproject.org/86318/
> namedtuple fix: http://paste.fedoraproject.org/86378/
> pickle fix    : http://paste.fedoraproject.org/86600/ (see below)
> 
> == hotspots ==
> 
> the baseline profile data highlights five major areas and hotspots:
> 
> 1. internal concurrency (possible patch: http://gerrit.ovirt.org/#/c/25857/ -
> see below)
> 2. libvirt
> 3. XML processing (initial patch: http://gerrit.ovirt.org/#/c/17694/)
> 4. namedtuple (patch: http://gerrit.ovirt.org/#/c/25678/ - fixed, merged)
> 5. pickling (patch: http://gerrit.ovirt.org/#/c/25860/ - see below)
> 
> #4 is beneficial in the iSCSI path and was already merged.
> #1 shows some potential, but it needs to be carefully evaluated to avoid
> performance regressions
> in different scenarios (e.g. machines bigger than mine :)).
> #2 is basically outside of our control but needs to be watched.
> #3 and #5 are beneficial for all flows and scenarios and are safe to merge;
> #5 is almost a no-brainer IMO.
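As an illustration of why namedtuple (#4) can dominate a profile (this is not the actual patch, and `IscsiSession` is an invented name): building the namedtuple *class* is far more expensive than instantiating it, so defining it inside a hot path instead of once at module level is costly.

```python
# Micro-benchmark sketch: namedtuple class creation vs. instantiation.
import timeit
from collections import namedtuple

def define_each_call():
    # Class rebuilt on every call -- this is the expensive part.
    Session = namedtuple("Session", "id target portal")
    return Session(1, "tgt", "portal")

Session = namedtuple("Session", "id target portal")  # built once

def define_once():
    return Session(1, "tgt", "portal")               # cheap instantiation

slow = timeit.timeit(define_each_call, number=2000)
fast = timeit.timeit(define_once, number=2000)
print("per-call class creation: %.3fs, module-level class: %.3fs" % (slow, fast))
```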
> 
> == Note about the third profile ==
> 
> When profiling the cPickle patch (http://paste.fedoraproject.org/86600/),
> the tests actually turned out *slower* with respect to the second profile,
> which had just the namedtuple
> patch.
> 
> The hotspots seem to be around concurrency and libvirt:
> 
> location                                      profile2(s)    profile3(s)    diff(s)
> pthread.py:129(wait)                          1230.640       1377.992       +147.28 (BAD)
> virDomainCreateXML                            155.171        175.681        +20.51  (BAD)
> 'select.epoll' objects                        52.523         53.635         +1.112  (negligible)
> expatbuilder.py:743(start_element_handler)    28.172         33.975         +5.803  (BAD?)
> virDomainGetXMLDesc                           23.947         23.217         -0.73   (negligible)
> 
> I'm OK with some variance (it is expected), but this is also a warning sign
> to be extra careful
> when tuning the concurrency patch (bullet point #1 above). We should
> definitely evaluate more scenarios
> before merging it.
> 
> If we set those diffs aside, we see the cPickle patch has the (small)
> benefits we expect,
> and I think it is 100% safe to merge. I already did some minimal
> extra verification just in case.
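The cPickle change (#5) is presumably the classic Python 2 idiom sketched below (this is not the actual diff): prefer the C implementation when available. On Python 3 the fallback branch runs, and the C accelerator (`_pickle`) is used automatically.

```python
# Sketch of the usual "prefer cPickle" idiom from Python 2 code bases.
try:
    import cPickle as pickle   # Python 2: C implementation, much faster
except ImportError:
    import pickle              # Python 3: already backed by _pickle (C)

# Illustrative payload, roughly the shape of a VM status dictionary.
data = {"vmId": "1234", "status": "Up", "devices": list(range(100))}
blob = pickle.dumps(data)
assert pickle.loads(blob) == data   # round-trips unchanged
```

Since the two modules expose the same API, the swap is behavior-preserving, which is what makes the patch such a low-risk win.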
> 
> == Next steps ==
> 
> For the near term (the coming days/next weeks)
> * benchmark the remaining easy fixes which are beneficial for all flows
> and quite safe to merge (XML processing being first), and work to have them
> merged
> * polish scripts and benchmarking code, start to submit to infra for review
> * continue the investigation of our (in)famous BoundedSemaphore
> (http://gerrit.ovirt.org/#/c/25857/)
>   to see if dropping it causes regressions or other bad effects
> * find other test scenarios
> 
> I have also noted all the suggestions received so far, and I am planning more
> test cases just for this scenario.
> 
> For example:
> 1. just start N QEMUs to obtain our lower bound (we cannot get faster than
> this)
> 2. run with different storage (NFS)
> 3. run with no storage
> 4. run with Guest OS installed on disks
> 
> And of course we need more scenarios.
> Let me just repeat myself: these are just the first steps of a long journey.
> 
> 
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 