Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Fri, May 24, 2013 at 11:37:04AM -0400, Peter Feiner wrote:
> On Wed, May 22, 2013 at 7:31 PM, Peter Feiner wrote:
> > Since some security driver operations are costly, I think it's worthwhile to reduce the scope of the security manager lock or increase the granularity by introducing more locks. After a cursory look, the security manager lock seems to have a much broader scope than necessary. The system / library calls underlying the security drivers are all thread safe (e.g., defining apparmor security profiles or chowning disk files), so a global lock isn't strictly necessary. Moreover, since most virSecurity calls are made whilst a virDomainObj lock is held and the security calls are generally domain specific, *most* of the security calls are probably thread safe in the absence of the global security manager lock. Obviously some work will have to be done to see where the security lock actually matters and some finer-grained locks will have to be introduced to handle these situations.
>
> To verify that this is worthwhile, I disabled the apparmor driver entirely. My 20 VM creation test ran about 10s faster (down from 35s to 25s).
>
> After giving this approach a little more thought, I think an incremental series of patches is a good way to go. The responsibility of locking could be pushed down into the security drivers. At first, all of the drivers would lock where their managers locked. Then each driver could be updated to do more fine-grained locking. I'm going to work on a patch to push the locking down into the drivers, then I'm going to work on a patch for better locking in the apparmor driver.

Yep, that sounds like a sane approach to me. Previously the security drivers had no locking at all, since they were relying on the global lock at the QEMU driver level. When I introduced the lock into the security manager module, I was pessimistic and used coarse locking.
As you say, we can clearly relax this somewhat, if we have the locking in the individual security drivers.

> > I also think it's worthwhile to eliminate locking from the virDomainObjList lookups and traversals. Since virDomainObjLists are accessed in a bunch of places, I think it's a good defensive idea to decouple the performance of these accesses from virDomainObj locks, which are held during potentially long-running operations like domain creation. An easy way to divorce virDomainObjListSearchName from the virDomainObj lock would be to keep a copy of the domain names in the virDomainObjList and protect that list with the virDomainObjList lock.
>
> After removing the security driver contention, this was still a substantial bottleneck: virConnectDefineXML could still take a few seconds. I removed the contention by keeping a copy of the domain definition's name in the domain object. Since the name is immutable and the domain object is protected by the list lock, the list traversal can read the name without taking any additional locks. This patch reduced virConnectDefineXML to tens of milliseconds.

Yep, I had a patch to add a second hash table to the domain object list, hashing based on name, but I lost the code when a disk died. I didn't find it made any difference, but agree we should just do it anyway, since it'll almost certainly be a problem in some scenarios.

Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
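[Editor's note: the push-down discussed above, where each security driver first locks exactly where its manager used to and later narrows the scope, could be sketched roughly as below. The type and function names here are hypothetical illustrations, not libvirt's actual virSecurityDriver API.]

```c
#include <pthread.h>

/* Hypothetical sketch: each security driver owns its own mutex,
 * replacing the single coarse lock the security manager held around
 * every driver call. */
typedef struct {
    const char *name;
    pthread_mutex_t lock;                 /* per-driver, not global */
    int (*set_label)(const char *domain); /* e.g. load an apparmor profile */
} SecurityDriver;

/* Step one of the incremental plan: take the driver's own lock in the
 * same places the manager used to take the global one. A driver whose
 * underlying system calls are thread safe (chown, profile definition)
 * can later narrow or drop this lock entirely. */
static int security_driver_set_label(SecurityDriver *drv, const char *domain)
{
    pthread_mutex_lock(&drv->lock);
    int rc = drv->set_label(domain);
    pthread_mutex_unlock(&drv->lock);
    return rc;
}
```

With this shape, two domains using two different drivers no longer contend on a shared manager lock; only calls into the same driver serialize.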
Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Wed, May 22, 2013 at 7:31 PM, Peter Feiner wrote:
> Since some security driver operations are costly, I think it's worthwhile to reduce the scope of the security manager lock or increase the granularity by introducing more locks. After a cursory look, the security manager lock seems to have a much broader scope than necessary. The system / library calls underlying the security drivers are all thread safe (e.g., defining apparmor security profiles or chowning disk files), so a global lock isn't strictly necessary. Moreover, since most virSecurity calls are made whilst a virDomainObj lock is held and the security calls are generally domain specific, *most* of the security calls are probably thread safe in the absence of the global security manager lock. Obviously some work will have to be done to see where the security lock actually matters and some finer-grained locks will have to be introduced to handle these situations.

To verify that this is worthwhile, I disabled the apparmor driver entirely. My 20 VM creation test ran about 10s faster (down from 35s to 25s).

After giving this approach a little more thought, I think an incremental series of patches is a good way to go. The responsibility of locking could be pushed down into the security drivers. At first, all of the drivers would lock where their managers locked. Then each driver could be updated to do more fine-grained locking. I'm going to work on a patch to push the locking down into the drivers, then I'm going to work on a patch for better locking in the apparmor driver.

> I also think it's worthwhile to eliminate locking from the virDomainObjList lookups and traversals. Since virDomainObjLists are accessed in a bunch of places, I think it's a good defensive idea to decouple the performance of these accesses from virDomainObj locks, which are held during potentially long-running operations like domain creation.
> An easy way to divorce virDomainObjListSearchName from the virDomainObj lock would be to keep a copy of the domain names in the virDomainObjList and protect that list with the virDomainObjList lock.

After removing the security driver contention, this was still a substantial bottleneck: virConnectDefineXML could still take a few seconds. I removed the contention by keeping a copy of the domain definition's name in the domain object. Since the name is immutable and the domain object is protected by the list lock, the list traversal can read the name without taking any additional locks. This patch reduced virConnectDefineXML to tens of milliseconds.
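[Editor's note: the name-copy idea can be sketched as follows. The types below are hypothetical stand-ins, not libvirt's real virDomainObjList; the point is that a lookup by immutable name takes only the list lock, never the per-domain locks that may be held for seconds during domain creation.]

```c
#include <pthread.h>
#include <string.h>

struct domain;                      /* opaque per-domain object */

/* Hypothetical list: stores each domain's immutable name next to the
 * object pointer, so a name search never touches per-domain locks. */
typedef struct {
    pthread_mutex_t lock;           /* protects count and entries[] */
    size_t count;
    struct { const char *name; struct domain *dom; } entries[64];
} DomainList;

static struct domain *domain_list_find_name(DomainList *list, const char *name)
{
    struct domain *found = NULL;
    pthread_mutex_lock(&list->lock);
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->entries[i].name, name) == 0) {
            found = list->entries[i].dom;
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}
```

Since the name is set once at insertion and never changes, the traversal is safe even while the returned domain object itself is locked by a long-running create.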
Re: [libvirt] Ongoing work on lock contention in qemu driver?
>> > > One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.

...

> Oh someone has already written such a systemtap script
>
> http://sourceware.org/systemtap/examples/process/mutex-contention.stp
>
> I think that is preferable to trying to embed special code in libvirt for this task.
>
> Daniel

Cool! The systemtap approach was very fruitful. BTW, at the time of writing, the example script has a bug. See http://sourceware.org/ml/systemtap/2013-q2/msg00169.html for the fix.

So the root cause of my bottleneck is the virSecurityManager lock. From this root cause a few other bottlenecks emerge. The interesting parts of the mutex-contention.stp report are pasted at the end of this email. Here's the summary & my analysis:

When a domain is created (domainCreateWithFlags), the domain object's lock is held. During the domain creation, various virSecurity functions are called, which all grab the security manager's lock. Since the security manager's lock is global, some fraction of domainCreateWithFlags is serialized by this lock. Since some virSecurity functions can take a long time, such as virSecurityManagerGenLabel for the apparmor security driver, which takes around 1s, the serialization that the security manager lock induces in domainCreateWithFlags is substantial.
Since the domain's object lock is held all of this time, virDomainObjListSearchName blocks, thereby serializing virConnectDefineXML via virDomainObjListAdd, as you suggested earlier. Moreover, since the virDomainObjList lock is held while blocking in virDomainObjListSearchName, there's measurable contention whilst looking up domains during domainCreateWithFlags.

Since some security driver operations are costly, I think it's worthwhile to reduce the scope of the security manager lock or increase the granularity by introducing more locks. After a cursory look, the security manager lock seems to have a much broader scope than necessary. The system / library calls underlying the security drivers are all thread safe (e.g., defining apparmor security profiles or chowning disk files), so a global lock isn't strictly necessary. Moreover, since most virSecurity calls are made whilst a virDomainObj lock is held and the security calls are generally domain specific, *most* of the security calls are probably thread safe in the absence of the global security manager lock. Obviously some work will have to be done to see where the security lock actually matters and some finer-grained locks will have to be introduced to handle these situations.

I also think it's worthwhile to eliminate locking from the virDomainObjList lookups and traversals. Since virDomainObjLists are accessed in a bunch of places, I think it's a good defensive idea to decouple the performance of these accesses from virDomainObj locks, which are held during potentially long-running operations like domain creation. An easy way to divorce virDomainObjListSearchName from the virDomainObj lock would be to keep a copy of the domain names in the virDomainObjList and protect that list with the virDomainObjList lock.

What do you think?
Peter

== stack contended 4 times, 261325 avg usec, 576521 max usec, 1045301 total usec, at
 __lll_lock_wait+0x1c [libpthread-2.15.so]
 _L_lock_858+0xf [libpthread-2.15.so]
 __pthread_mutex_lock+0x3a [libpthread-2.15.so]
 virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
 qemuDomainGetXMLDesc+0x48 [libvirt_driver_qemu.so]
 virDomainGetXMLDesc+0xf5 [libvirt.so.0.1000.4]
 remoteDispatchDomainGetXMLDescHelper+0xb6 [libvirtd]
 virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
 virNetServerProcessMsg+0x2a [libvirt.so.0.1000.4]
 virNetServerHandleJob+0x73 [libvirt.so.0.1000.4]
 virThreadPoolWorker+0x10e

== stack contended 12 times, 128053 avg usec, 992567 max usec, 1536640 total usec, at
 __lll_lock_wait+0x1c [libpthread-2.15.so]
 _L_lock_858+0xf [libpthread-2.15.so]
 __pthread_mutex_lock+0x3a [libpthread-2.15.so]
 virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
 qemuDomainStartWithFlags+0x5a [libvirt_driver_qemu.so]
 virDomainCreateWithFlags+0xf5 [libvirt.so.0.1000.4]
 remoteDispatchDomainCreateWithFlagsHelper+0xbe [libvirtd]
 virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
 v
Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Thu, May 16, 2013 at 06:18:57PM +0100, Daniel P. Berrange wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
> > > How many CPU cores are you testing on ? That's a good improvement, but I'd expect the improvement to be greater as # of core is larger.
> >
> > I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out software bottlenecks, I'm intentionally running fewer tasks (20 parallel creations) than the number of logical cores (24). The memory, disk and network are also well over provisioned.
> >
> > > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we limit a single connection to only 5 RPC calls. Beyond that calls queue up, even if libvirtd is otherwise idle. OpenStack uses a single connection for everything so will hit this. I suspect this would be why virConnectGetLibVersion would appear to be slow. That API does absolutely nothing of any consequence, so the only reason I'd expect that to be slow is if you're hitting a libvirtd RPC limit causing the API to be queued up.
> >
> > I hadn't tuned libvirtd.conf at all. I have just increased max_{clients,workers,requests,client_requests} to 50 and repeated my experiment. As you expected, virConnectGetLibVersion is now very fast. Unfortunately, the median VM creation time didn't change.
> >
> > > I'm not actively doing anything in this area. Mostly because I've got no clear data on where any remaining bottlenecks are.
> >
> > Unless there are other parameters to tweak, I believe I'm still hitting a bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt calls are:
> >
> > virConnectDefineXML*: 13ms vs 4.5s
> > virDomainCreateWithFlags*: 1.8s vs 20s
> >
> > * I had said that virConnectDefineXML wasn't serialized in my first email.
> > I based that observation on a single trace I looked at :-) In the average case, virConnectDefineXML is affected by a bottleneck.
>
> virConnectDefineXML would at least hit the possible bottleneck on the virDomainObjListAddLocked method. In fact that's pretty much the only contended lock I'd expect it to hit. Nothing else that it runs has any serious locking involved.
>
> > Note that when I took these measurements, I also monitored CPU & disk utilization. During the 20 VM test, both CPU & disk were well below 100% for 97% of the test (i.e., 60s test duration, measured utilization with atop using a 2 second interval, CPU was pegged for 2s).
> >
> > > One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.
> >
> > Testing your hypothesis would be straightforward. I'll add some instrumentation to measure the time spent waiting for the locks and repeat my 20 VM experiment. Or, if there's some systematic lock profiling in place, then I can turn that on and report the results.
>
> There's no lock profiling support built-in to libvirt. I'm not sure of the best way to introduce such support without it impacting the very thing we're trying to test. Suggestions welcome
>
> Perhaps a systemtap script would do a reasonable job at it though. eg record any stack traces associated with long futex_wait() system calls or something like that.
Oh someone has already written such a systemtap script:

http://sourceware.org/systemtap/examples/process/mutex-contention.stp

I think that is preferable to trying to embed special code in libvirt for this task.

Daniel
Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Thu, May 16, 2013 at 1:18 PM, Daniel P. Berrange wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
>> > How many CPU cores are you testing on ? That's a good improvement, but I'd expect the improvement to be greater as # of core is larger.
>>
>> I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out software bottlenecks, I'm intentionally running fewer tasks (20 parallel creations) than the number of logical cores (24). The memory, disk and network are also well over provisioned.
>>
>> > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we limit a single connection to only 5 RPC calls. Beyond that calls queue up, even if libvirtd is otherwise idle. OpenStack uses a single connection for everything so will hit this. I suspect this would be why virConnectGetLibVersion would appear to be slow. That API does absolutely nothing of any consequence, so the only reason I'd expect that to be slow is if you're hitting a libvirtd RPC limit causing the API to be queued up.
>>
>> I hadn't tuned libvirtd.conf at all. I have just increased max_{clients,workers,requests,client_requests} to 50 and repeated my experiment. As you expected, virConnectGetLibVersion is now very fast. Unfortunately, the median VM creation time didn't change.
>>
>> > I'm not actively doing anything in this area. Mostly because I've got no clear data on where any remaining bottlenecks are.
>>
>> Unless there are other parameters to tweak, I believe I'm still hitting a bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt calls are:
>>
>> virConnectDefineXML*: 13ms vs 4.5s
>> virDomainCreateWithFlags*: 1.8s vs 20s
>>
>> * I had said that virConnectDefineXML wasn't serialized in my first email. I based that observation on a single trace I looked at :-) In the average case, virConnectDefineXML is affected by a bottleneck.
> virConnectDefineXML would at least hit the possible bottleneck on the virDomainObjListAddLocked method. In fact that's pretty much the only contended lock I'd expect it to hit. Nothing else that it runs has any serious locking involved.

Okay cool, I'll measure this. I'll also try to figure out what virDomainCreateWithFlags is waiting on.

>> Note that when I took these measurements, I also monitored CPU & disk utilization. During the 20 VM test, both CPU & disk were well below 100% for 97% of the test (i.e., 60s test duration, measured utilization with atop using a 2 second interval, CPU was pegged for 2s).
>>
>> > One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.
>>
>> Testing your hypothesis would be straightforward. I'll add some instrumentation to measure the time spent waiting for the locks and repeat my 20 VM experiment. Or, if there's some systematic lock profiling in place, then I can turn that on and report the results.
>
> There's no lock profiling support built-in to libvirt. I'm not sure of the best way to introduce such support without it impacting the very thing we're trying to test. Suggestions welcome

A straightforward way to keep lock statistics with low overhead and without affecting concurrency would be to use thread local storage (TLS). At the end of a run, or periodically, the stats could be aggregated and reported. Since the stats don't have to be precise, it's OK to do the aggregation racily. Simple statistics to keep are:

* For each lock L, the time spent waiting.
* For each lock L and callsite C, the time spent waiting.

It would probably be sufficient to identify L as the lock's parent class name. If per-instance stats are necessary, then we could add the address of the object to the identity of L. So pseudo code would look something like this:

    struct lock_stats {
        map of (lock_class) to unsigned long: wait_time;
        map of (lock_class, stack_trace) to unsigned long: callsite_wait_time;
    };

    __thread struct lock_stats *lock_stats;

    thread_local_storage_init() {
        lock_stats = new lock_stats;
    }

    /* return microseconds elapsed since some arbitrary start time */
    unsigned long timestamp(void) {
        struct timespec timespec;
        clock_gettime(CLOCK_MONOTONIC, &timespec);
        return timespec.tv_sec * 1000000UL + timespec.tv_nsec / 1000;
    }

    void virObjectLock(void *anyobj) {
        unsigned long start, elapsed;
        virObjectLockablePtr obj = anyobj;

        start = timestamp();
        virMutexLock(&obj->lock);
        elapsed = timestamp() - start;

        lock_stats->wait_time[obj->klass->name] += elapsed;
        lock_stats->callsite_wait_time[(obj->klass->name, stack_trace())] += elapsed;
    }
Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
> > How many CPU cores are you testing on ? That's a good improvement, but I'd expect the improvement to be greater as # of core is larger.
>
> I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out software bottlenecks, I'm intentionally running fewer tasks (20 parallel creations) than the number of logical cores (24). The memory, disk and network are also well over provisioned.
>
> > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we limit a single connection to only 5 RPC calls. Beyond that calls queue up, even if libvirtd is otherwise idle. OpenStack uses a single connection for everything so will hit this. I suspect this would be why virConnectGetLibVersion would appear to be slow. That API does absolutely nothing of any consequence, so the only reason I'd expect that to be slow is if you're hitting a libvirtd RPC limit causing the API to be queued up.
>
> I hadn't tuned libvirtd.conf at all. I have just increased max_{clients,workers,requests,client_requests} to 50 and repeated my experiment. As you expected, virConnectGetLibVersion is now very fast. Unfortunately, the median VM creation time didn't change.
>
> > I'm not actively doing anything in this area. Mostly because I've got no clear data on where any remaining bottlenecks are.
>
> Unless there are other parameters to tweak, I believe I'm still hitting a bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt calls are:
>
> virConnectDefineXML*: 13ms vs 4.5s
> virDomainCreateWithFlags*: 1.8s vs 20s
>
> * I had said that virConnectDefineXML wasn't serialized in my first email. I based that observation on a single trace I looked at :-) In the average case, virConnectDefineXML is affected by a bottleneck.

virConnectDefineXML would at least hit the possible bottleneck on the virDomainObjListAddLocked method.
In fact that's pretty much the only contended lock I'd expect it to hit. Nothing else that it runs has any serious locking involved.

> Note that when I took these measurements, I also monitored CPU & disk utilization. During the 20 VM test, both CPU & disk were well below 100% for 97% of the test (i.e., 60s test duration, measured utilization with atop using a 2 second interval, CPU was pegged for 2s).
>
> > One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.
>
> Testing your hypothesis would be straightforward. I'll add some instrumentation to measure the time spent waiting for the locks and repeat my 20 VM experiment. Or, if there's some systematic lock profiling in place, then I can turn that on and report the results.

There's no lock profiling support built-in to libvirt. I'm not sure of the best way to introduce such support without it impacting the very thing we're trying to test. Suggestions welcome

Perhaps a systemtap script would do a reasonable job at it though. eg record any stack traces associated with long futex_wait() system calls or something like that.

Regards,
Daniel
Re: [libvirt] Ongoing work on lock contention in qemu driver?
> How many CPU cores are you testing on ? That's a good improvement, but I'd expect the improvement to be greater as # of core is larger.

I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out software bottlenecks, I'm intentionally running fewer tasks (20 parallel creations) than the number of logical cores (24). The memory, disk and network are also well over provisioned.

> Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we limit a single connection to only 5 RPC calls. Beyond that calls queue up, even if libvirtd is otherwise idle. OpenStack uses a single connection for everything so will hit this. I suspect this would be why virConnectGetLibVersion would appear to be slow. That API does absolutely nothing of any consequence, so the only reason I'd expect that to be slow is if you're hitting a libvirtd RPC limit causing the API to be queued up.

I hadn't tuned libvirtd.conf at all. I have just increased max_{clients,workers,requests,client_requests} to 50 and repeated my experiment. As you expected, virConnectGetLibVersion is now very fast. Unfortunately, the median VM creation time didn't change.

> I'm not actively doing anything in this area. Mostly because I've got no clear data on where any remaining bottlenecks are.

Unless there are other parameters to tweak, I believe I'm still hitting a bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt calls are:

virConnectDefineXML*: 13ms vs 4.5s
virDomainCreateWithFlags*: 1.8s vs 20s

* I had said that virConnectDefineXML wasn't serialized in my first email. I based that observation on a single trace I looked at :-) In the average case, virConnectDefineXML is affected by a bottleneck.

Note that when I took these measurements, I also monitored CPU & disk utilization.
During the 20 VM test, both CPU & disk were well below 100% for 97% of the test (i.e., 60s test duration, measured utilization with atop using a 2 second interval, CPU was pegged for 2s).

> One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.

Testing your hypothesis would be straightforward. I'll add some instrumentation to measure the time spent waiting for the locks and repeat my 20 VM experiment. Or, if there's some systematic lock profiling in place, then I can turn that on and report the results.

Thanks,
Peter
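[Editor's note: the libvirtd.conf tuning described in this exchange corresponds to settings along these lines; the value 50 is simply what the experiment used, not a general recommendation.]

```
# /etc/libvirt/libvirtd.conf -- limits raised from the defaults
max_clients = 50
max_workers = 50
max_requests = 50
max_client_requests = 50
```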
Re: [libvirt] Ongoing work on lock contention in qemu driver?
On Thu, May 16, 2013 at 12:09:39PM -0400, Peter Feiner wrote:
> Hello Daniel,
>
> I've been working on improving scalability in OpenStack on libvirt+kvm for the last couple of months. I'm particularly interested in reducing the time it takes to create VMs when many VMs are requested in parallel.
>
> One apparent bottleneck during virtual machine creation is libvirt. As more VMs are created in parallel, some libvirt calls (i.e., virConnectGetLibVersion and virDomainCreateWithFlags) take longer without a commensurate increase in hardware utilization.
>
> Thanks to your patches in libvirt-1.0.3, the situation has improved. Some libvirt calls OpenStack makes during VM creation (i.e., virConnectDefineXML) have no measurable slowdown when many VMs are created in parallel. In turn, parallel VM creation in OpenStack is significantly faster with libvirt-1.0.3. On my standard benchmark (create 20 VMs in parallel, wait until the VM is ACTIVE, which is essentially after virDomainCreateWithFlags returns), libvirt-1.0.3 reduces the median creation time from 90s to 60s when compared to libvirt-0.9.8.

How many CPU cores are you testing on ? That's a good improvement, but I'd expect the improvement to be greater as # of core is larger.

Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we limit a single connection to only 5 RPC calls. Beyond that calls queue up, even if libvirtd is otherwise idle. OpenStack uses a single connection for everything so will hit this. I suspect this would be why virConnectGetLibVersion would appear to be slow. That API does absolutely nothing of any consequence, so the only reason I'd expect that to be slow is if you're hitting a libvirtd RPC limit causing the API to be queued up.

> I'd like to know if your concurrency work in the qemu driver is ongoing. If it isn't, I'd like to pick the work up myself and work on further improvements. Any advice or insight would be appreciated.
I'm not actively doing anything in this area. Mostly because I've got no clear data on where any remaining bottlenecks are.

One theory I had was that the virDomainObjListSearchName method could be a bottleneck, because that acquires a lock on every single VM. This is invoked when starting a VM, when we call virDomainObjListAddLocked. I tried removing this locking though & didn't see any performance benefit, so never pursued this further. Before trying things like this again, I think we'd need to find a way to actually identify where the true bottlenecks are, rather than guesswork.

Daniel
[libvirt] Ongoing work on lock contention in qemu driver?
Hello Daniel,

I've been working on improving scalability in OpenStack on libvirt+kvm for the last couple of months. I'm particularly interested in reducing the time it takes to create VMs when many VMs are requested in parallel.

One apparent bottleneck during virtual machine creation is libvirt. As more VMs are created in parallel, some libvirt calls (i.e., virConnectGetLibVersion and virDomainCreateWithFlags) take longer without a commensurate increase in hardware utilization.

Thanks to your patches in libvirt-1.0.3, the situation has improved. Some libvirt calls OpenStack makes during VM creation (i.e., virConnectDefineXML) have no measurable slowdown when many VMs are created in parallel. In turn, parallel VM creation in OpenStack is significantly faster with libvirt-1.0.3. On my standard benchmark (create 20 VMs in parallel, wait until the VM is ACTIVE, which is essentially after virDomainCreateWithFlags returns), libvirt-1.0.3 reduces the median creation time from 90s to 60s when compared to libvirt-0.9.8.

I'd like to know if your concurrency work in the qemu driver is ongoing. If it isn't, I'd like to pick the work up myself and work on further improvements. Any advice or insight would be appreciated.

Thanks!
Peter Feiner