Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-06-03 Thread Daniel P. Berrange
On Fri, May 24, 2013 at 11:37:04AM -0400, Peter Feiner wrote:
> On Wed, May 22, 2013 at 7:31 PM, Peter Feiner  wrote:
> > Since some security driver operations are costly, I think it's
> > worthwhile to reduce the scope of the security manager lock or
> > increase the granularity by introducing more locks. After a cursory
> > look, the security manager lock seems to have a much broader scope
> > than necessary. The system / library calls underlying the security
> > drivers are all thread safe (e.g., defining apparmor security profiles
> > or chowning disk files), so a global lock isn't strictly necessary.
> > Moreover, since most virSecurity calls are made whilst a virDomainObj
> > lock is held and the security calls are generally domain specific,
> > *most* of the security calls are probably thread safe in the absence
> > of the global security manager lock. Obviously some work will have to
> > be done to see where the security lock actually matters and some
> > finer-grained locks will have to be introduced to handle these
> > situations.
> 
> To verify that this is worthwhile, I disabled the apparmor driver
> entirely. My 20 VM creation test ran about 10s faster (down from 35s
> to 25s).
> 
> After giving this approach a little more thought, I think an
> incremental series of patches is a good way to go. The responsibility
> of locking could be pushed down into the security drivers. At first,
> all of the drivers would lock where their managers locked. Then each
> driver could be updated to do more fine-grained locking. I'm going to
> work on a patch to push the locking down into the drivers, then I'm
> going to work on a patch for better locking in the apparmor driver.

Yep, that sounds like a sane approach to me. Previously the security
drivers had no locking at all, since they were relying on the global
lock at the QEMU driver level. When I introduced the lock into the
security manager module, I was pessimistic and used coarse locking.
As you say, we can clearly relax this somewhat, if we have the locking
in the individual security drivers.

> > I also think it's worthwhile to eliminate locking from the
> > virDomainObjList lookups and traversals. Since virDomainObjLists are
> > accessed in a bunch of places, I think it's a good defensive idea to
> > decouple the performance of these accesses from virDomainObj locks,
> > which are held during potentially long-running operations like domain
> > creation. An easy way to divorce virDomainObjListSearchName from the
> > virDomainObj lock would be to keep a copy of the domain names in the
> > virDomainObjList and protect that list with the virDomainObjList lock.
> 
> After removing the security driver contention, this was still a
> substantial bottleneck: virConnectDefineXML could still take a few
> seconds. I removed the contention by keeping a copy of the domain
> definition's name in the domain object. Since the name is immutable
> and the domain object is protected by the list lock, the list
> traversal can read the name without taking any additional locks. This
> patch reduced virConnectDefineXML to tens of milliseconds.

Yep, I had a patch to add a second hash table to the domain object
list, hashing based on name, but I lost the code when a disk died.
I didn't find it made any difference, but agree we should just do it
anyway, since it'll almost certainly be a problem in some scenarios.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-24 Thread Peter Feiner
On Wed, May 22, 2013 at 7:31 PM, Peter Feiner  wrote:
> Since some security driver operations are costly, I think it's
> worthwhile to reduce the scope of the security manager lock or
> increase the granularity by introducing more locks. After a cursory
> look, the security manager lock seems to have a much broader scope
> than necessary. The system / library calls underlying the security
> drivers are all thread safe (e.g., defining apparmor security profiles
> or chowning disk files), so a global lock isn't strictly necessary.
> Moreover, since most virSecurity calls are made whilst a virDomainObj
> lock is held and the security calls are generally domain specific,
> *most* of the security calls are probably thread safe in the absence
> of the global security manager lock. Obviously some work will have to
> be done to see where the security lock actually matters and some
> finer-grained locks will have to be introduced to handle these
> situations.

To verify that this is worthwhile, I disabled the apparmor driver
entirely. My 20 VM creation test ran about 10s faster (down from 35s
to 25s).

After giving this approach a little more thought, I think an
incremental series of patches is a good way to go. The responsibility
of locking could be pushed down into the security drivers. At first,
all of the drivers would lock where their managers locked. Then each
driver could be updated to do more fine-grained locking. I'm going to
work on a patch to push the locking down into the drivers, then I'm
going to work on a patch for better locking in the apparmor driver.
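
As a rough illustration of what "pushing the locking down" could look like
(the struct and function names below are hypothetical placeholders, not
libvirt's actual virSecurityDriver code):

/* Hypothetical sketch only: a per-driver mutex instead of the single
 * global security-manager lock. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;       /* per-driver lock, not process-global */
    /* ... driver-specific state (profiles, label counters, ...) ... */
} exampleSecurityDriver;

/* Step 1: lock exactly where the manager used to lock, so behaviour is
 * unchanged but two different drivers no longer serialize against each
 * other. Step 2 would shrink each driver's critical sections. */
static int
exampleSecurityGenLabel(exampleSecurityDriver *drv, const char *domname)
{
    pthread_mutex_lock(&drv->lock);
    /* ... generate/define the (thread-safe) per-domain profile here ... */
    (void)domname;
    pthread_mutex_unlock(&drv->lock);
    return 0;
}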

> I also think it's worthwhile to eliminate locking from the
> virDomainObjList lookups and traversals. Since virDomainObjLists are
> accessed in a bunch of places, I think it's a good defensive idea to
> decouple the performance of these accesses from virDomainObj locks,
> which are held during potentially long-running operations like domain
> creation. An easy way to divorce virDomainObjListSearchName from the
> virDomainObj lock would be to keep a copy of the domain names in the
> virDomainObjList and protect that list with the virDomainObjList lock.

After removing the security driver contention, this was still a
substantial bottleneck: virConnectDefineXML could still take a few
seconds. I removed the contention by keeping a copy of the domain
definition's name in the domain object. Since the name is immutable
and the domain object is protected by the list lock, the list
traversal can read the name without taking any additional locks. This
patch reduced virConnectDefineXML to tens of milliseconds.
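
As a rough illustration of that scheme (hypothetical names, not the actual
patch): once the domain object carries an immutable copy of its name, a
by-name search only needs the list lock and never blocks on a per-domain
lock that may be held for seconds during creation.

/* Hypothetical sketch, not libvirt code. */
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct example_dom {
    char *name;                 /* immutable copy, readable under list lock */
    pthread_mutex_t lock;       /* per-domain lock (long-running operations) */
};

struct example_dom_list {
    pthread_mutex_t lock;       /* protects doms[] and the name copies */
    struct example_dom **doms;
    size_t ndoms;
};

static struct example_dom *
example_list_find_by_name(struct example_dom_list *list, const char *name)
{
    struct example_dom *found = NULL;

    pthread_mutex_lock(&list->lock);
    for (size_t i = 0; i < list->ndoms; i++) {
        if (strcmp(list->doms[i]->name, name) == 0) {  /* no per-domain lock */
            found = list->doms[i];
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}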



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-22 Thread Peter Feiner
>> > > One theory I had was that the virDomainObjListSearchName method could
>> > > be a bottleneck, because that acquires a lock on every single VM. This
>> > > is invoked when starting a VM, when we call virDomainObjListAddLocked.
>> > > I tried removing this locking though & didn't see any performance
>> > > benefit, so never pursued this further.  Before trying things like
>> > > this again, I think we'd need to find a way to actually identify where
>> > > the true bottlenecks are, rather than guesswork.
...
> Oh someone has already written such a systemtap script
>
> http://sourceware.org/systemtap/examples/process/mutex-contention.stp
>
> I think that is preferable to trying to embed special code in
> libvirt for this task.
>
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Cool! The systemtap approach was very fruitful. BTW, at the time of
writing, the example script has a bug. See
http://sourceware.org/ml/systemtap/2013-q2/msg00169.html for the fix.

So the root cause of my bottleneck is the virSecurityManager lock.
From this root cause a few other bottlenecks emerge. The interesting
parts of the mutex-contention.stp report are pasted at the end of this
email. Here's the summary & my analysis:

When a domain is created (domainCreateWithFlags), the domain object's
lock is held. During the domain creation, various virSecurity
functions are called, which all grab the security manager's lock.
Since the security manager's lock is global, some fraction of
domainCreateWithFlags is serialized by this lock. Since some
virSecurity functions can take a long time, such as
virSecurityManagerGenLabel for the apparmor security driver, which
takes around 1s, the serialization that the security manager lock
induces in domainCreateWithFlags is substantial. Since the domain's
object lock is held all of this time, virDomainObjListSearchName
blocks, thereby serializing virConnectDefineXML via
virDomainObjListAdd, as you suggested earlier. Moreover, since the
virDomainObjList lock is held while blocking in
virDomainObjListSearchName, there's measurable contention whilst
looking up domains during domainCreateWithFlags.

Since some security driver operations are costly, I think it's
worthwhile to reduce the scope of the security manager lock or
increase the granularity by introducing more locks. After a cursory
look, the security manager lock seems to have a much broader scope
than necessary. The system / library calls underlying the security
drivers are all thread safe (e.g., defining apparmor security profiles
or chowning disk files), so a global lock isn't strictly necessary.
Moreover, since most virSecurity calls are made whilst a virDomainObj
lock is held and the security calls are generally domain specific,
*most* of the security calls are probably thread safe in the absence
of the global security manager lock. Obviously some work will have to
be done to see where the security lock actually matters and some
finer-grained locks will have to be introduced to handle these
situations.

I also think it's worthwhile to eliminate locking from the
virDomainObjList lookups and traversals. Since virDomainObjLists are
accessed in a bunch of places, I think it's a good defensive idea to
decouple the performance of these accesses from virDomainObj locks,
which are held during potentially long-running operations like domain
creation. An easy way to divorce virDomainObjListSearchName from the
virDomainObj lock would be to keep a copy of the domain names in the
virDomainObjList and protect that list with the virDomainObjList lock.

What do you think?

Peter

==
stack contended 4 times, 261325 avg usec, 576521 max usec, 1045301
total usec, at
__lll_lock_wait+0x1c [libpthread-2.15.so]
_L_lock_858+0xf [libpthread-2.15.so]
__pthread_mutex_lock+0x3a [libpthread-2.15.so]
virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
qemuDomainGetXMLDesc+0x48 [libvirt_driver_qemu.so]
virDomainGetXMLDesc+0xf5 [libvirt.so.0.1000.4]
remoteDispatchDomainGetXMLDescHelper+0xb6 [libvirtd]
virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
virNetServerProcessMsg+0x2a [libvirt.so.0.1000.4]
virNetServerHandleJob+0x73 [libvirt.so.0.1000.4]
virThreadPoolWorker+0x10e
==
stack contended 12 times, 128053 avg usec, 992567 max usec, 1536640
total usec, at
__lll_lock_wait+0x1c [libpthread-2.15.so]
_L_lock_858+0xf [libpthread-2.15.so]
__pthread_mutex_lock+0x3a [libpthread-2.15.so]
virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
qemuDomainStartWithFlags+0x5a [libvirt_driver_qemu.so]
virDomainCreateWithFlags+0xf5 [libvirt.so.0.1000.4]
remoteDispatchDomainCreateWithFlagsHelper+0xbe [libvirtd]
virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
v

Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 06:18:57PM +0100, Daniel P. Berrange wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
> > > How many CPU cores are you testing on ?  That's a good improvement,
> > > but I'd expect the improvement to be greater as the # of cores is larger.
> > 
> > I'm testing on 12 cores x 2 HT per core. As I'm working on teasing out
> > software bottlenecks, I'm intentionally running fewer tasks (20 parallel
> > creations) than the number of logical cores (24). The memory, disk and
> > network are also well over provisioned.
> > 
> > > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
> > > limit a single connection to only 5 RPC calls. Beyond that calls
> > > queue up, even if libvirtd is otherwise idle. OpenStack uses a
> > > single connection for everything so will hit this. I suspect this
> > > would be why  virConnectGetLibVersion would appear to be slow. That
> > > API does absolutely nothing of any consequence, so the only reason
> > > I'd expect that to be slow is if you're hitting a libvirtd RPC
> > > limit causing the API to be queued up.
> > 
> > I hadn't tuned libvirtd.conf at all. I have just increased
> > max_{clients,workers,requests,client_requests} to 50 and repeated my
> > experiment. As you expected, virConnectGetLibVersion is now very fast.
> > Unfortunately, the median VM creation time didn't change.
> > 
> > > I'm not actively doing anything in this area. Mostly because I've got no
> > > clear data on where any remaining bottlenecks are.
> > 
> > Unless there are other parameters to tweak, I believe I'm still hitting a
> > bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for
> > libvirt calls are
> > 
> > virConnectDefineXML*: 13ms vs 4.5s
> > virDomainCreateWithFlags*: 1.8s vs 20s
> > 
> > * I had said that virConnectDefineXML wasn't serialized in my first email. I
> >   based that observation on a single trace I looked at :-) In the average
> >   case, virConnectDefineXML is affected by a bottleneck.
> 
> virConnectDefineXML would at least hit the possible bottleneck on
> the virDomainObjListAddLocked method. In fact that's pretty much
> the only contended lock I'd expect it to hit. Nothing else that
> it runs has any serious locking involved.
> 
> > Note that when I took these measurements, I also monitored CPU & disk
> > utilization. During the 20 VM test, both CPU & disk were well below 100%
> > for 97% of the test (i.e., 60s test duration, measured utilization with
> > atop using a 2 second interval, CPU was pegged for 2s).
> > 
> > > One theory I had was that the virDomainObjListSearchName method could
> > > be a bottleneck, because that acquires a lock on every single VM. This
> > > is invoked when starting a VM, when we call virDomainObjListAddLocked.
> > > I tried removing this locking though & didn't see any performance
> > > benefit, so never pursued this further.  Before trying things like
> > > this again, I think we'd need to find a way to actually identify where
> > > the true bottlenecks are, rather than guesswork.
> > 
> > Testing your hypothesis would be straightforward. I'll add some
> > instrumentation to measure the time spent waiting for the locks and repeat
> > my 20 VM experiment. Or, if there's some systematic lock profiling in
> > place, then I can turn that on and report the results.
> 
> There's no lock profiling support built into libvirt. I'm not sure
> of the best way to introduce such support without it impacting the very
> thing we're trying to test.  Suggestions welcome.
> 
> Perhaps a systemtap script would do a reasonable job at it though.
> eg record any stack traces associated with long futex_wait() system
> calls or something like that.

Oh someone has already written such a systemtap script

http://sourceware.org/systemtap/examples/process/mutex-contention.stp

I think that is preferable to trying to embed special code in
libvirt for this task.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Peter Feiner
On Thu, May 16, 2013 at 1:18 PM, Daniel P. Berrange  wrote:
> On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
>> > How many CPU cores are you testing on ?  That's a good improvement,
>> > but I'd expect the improvement to be greater as the # of cores is larger.
>>
>> I'm testing on 12 cores x 2 HT per core. As I'm working on teasing out
>> software bottlenecks, I'm intentionally running fewer tasks (20 parallel
>> creations) than the number of logical cores (24). The memory, disk and
>> network are also well over provisioned.
>>
>> > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
>> > limit a single connection to only 5 RPC calls. Beyond that calls
>> > queue up, even if libvirtd is otherwise idle. OpenStack uses a
>> > single connection for everything so will hit this. I suspect this
>> > would be why  virConnectGetLibVersion would appear to be slow. That
>> > API does absolutely nothing of any consequence, so the only reason
>> > I'd expect that to be slow is if you're hitting a libvirtd RPC
>> > limit causing the API to be queued up.
>>
>> I hadn't tuned libvirtd.conf at all. I have just increased
>> max_{clients,workers,requests,client_requests} to 50 and repeated my
>> experiment. As you expected, virConnectGetLibVersion is now very fast.
>> Unfortunately, the median VM creation time didn't change.
>>
>> > I'm not actively doing anything in this area. Mostly because I've got no
>> > clear data on where any remaining bottlenecks are.
>>
>> Unless there are other parameters to tweak, I believe I'm still hitting a
>> bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
>> calls are
>>
>> virConnectDefineXML*: 13ms vs 4.5s
>> virDomainCreateWithFlags*: 1.8s vs 20s
>>
>> * I had said that virConnectDefineXML wasn't serialized in my first email. I
>>   based that observation on a single trace I looked at :-) In the average
>>   case, virConnectDefineXML is affected by a bottleneck.
>
> virConnectDefineXML would at least hit the possible bottleneck on
> the virDomainObjListAddLocked method. In fact that's pretty much
> the only contended lock I'd expect it to hit. Nothing else that
> it runs has any serious locking involved.

Okay cool, I'll measure this. I'll also try to figure out what
virDomainCreateWithFlags is waiting on.

>> Note that when I took these measurements, I also monitored CPU & disk
>> utilization. During the 20 VM test, both CPU & disk were well below 100%
>> for 97% of the test (i.e., 60s test duration, measured utilization with
>> atop using a 2 second interval, CPU was pegged for 2s).
>>
>> > One theory I had was that the virDomainObjListSearchName method could
>> > be a bottleneck, because that acquires a lock on every single VM. This
>> > is invoked when starting a VM, when we call virDomainObjListAddLocked.
>> > I tried removing this locking though & didn't see any performance
>> > benefit, so never pursued this further.  Before trying things like
>> > this again, I think we'd need to find a way to actually identify where
>> > the true bottlenecks are, rather than guesswork.
>>
>> Testing your hypothesis would be straightforward. I'll add some
>> instrumentation to measure the time spent waiting for the locks and repeat
>> my 20 VM experiment. Or, if there's some systematic lock profiling in
>> place, then I can turn that on and report the results.
>
> There's no lock profiling support built into libvirt. I'm not sure
> of the best way to introduce such support without it impacting the very
> thing we're trying to test.  Suggestions welcome.

A straightforward way to keep lock statistics with low overhead and
w/out affecting concurrency would be to use thread local storage
(TLS). At the end of a run, or periodically, the stats could be
aggregated and reported. Since the stats don't have to be precise,
it's OK to do the aggregation racily.

Simple statistics to keep are

* For each lock L, the time spent waiting.
* For each lock L and callsite C, the time spent waiting.

It would probably be sufficient to identify L as the lock's parent
class name. If per-instance stats are necessary, then we could add the
address of the object to the identity of L.

So pseudo code would look something like this:

struct lock_stats {
    map of (lock_class) to unsigned long: wait_time;
    map of (lock_class, stack_trace) to unsigned long: callsite_wait_time;
};

__thread struct lock_stats *lock_stats;

thread_local_storage_init() {
    lock_stats = new lock_stats;
}

/* return microseconds elapsed since some arbitrary start time */
unsigned long timestamp(void) {
    struct timespec timespec;
    clock_gettime(CLOCK_MONOTONIC, &timespec);
    return timespec.tv_sec * 1000000 + timespec.tv_nsec / 1000;
}

void virObjectLock(void *anyobj) {
    unsigned long start, elapsed;
    virObjectLockablePtr obj = anyobj;

    start = timestamp();
    virMutexLock(&obj->lock);
    elapsed = timestamp() - start;

    lock_stats->wait_time[obj
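
Purely to illustrate the "aggregate racily" idea from earlier in this mail
(everything below, including the names, is a hypothetical sketch with the
stats collapsed to a single counter): each thread's TLS block could be
chained into a global registration list that a reporter sums without taking
any locks.

/* Hypothetical sketch (not from the original mail). */
#include <stdio.h>

struct lock_stats_node {
    unsigned long wait_usec;                  /* simplified: one counter */
    struct lock_stats_node *next;
};

static struct lock_stats_node *all_stats;     /* registration list head */
static __thread struct lock_stats_node *my_stats;

/* Called once per thread (e.g. from thread_local_storage_init() above);
 * a real version would push onto the list with an atomic exchange. */
static void lock_stats_register(struct lock_stats_node *node)
{
    node->next = all_stats;
    all_stats = node;
    my_stats = node;
}

/* Called periodically or at exit; lock-free and imprecise by design. */
static void lock_stats_report(void)
{
    unsigned long total = 0;
    for (struct lock_stats_node *n = all_stats; n; n = n->next)
        total += n->wait_usec;
    fprintf(stderr, "total lock wait: %lu us\n", total);
}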

Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
> > How many CPU cores are you testing on ?  That's a good improvement,
> > but I'd expect the improvement to be greater as the # of cores is larger.
> 
> I'm testing on 12 cores x 2 HT per core. As I'm working on teasing out
> software bottlenecks, I'm intentionally running fewer tasks (20 parallel
> creations) than the number of logical cores (24). The memory, disk and
> network are also well over provisioned.
> 
> > Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
> > limit a single connection to only 5 RPC calls. Beyond that calls
> > queue up, even if libvirtd is otherwise idle. OpenStack uses a
> > single connection for everything so will hit this. I suspect this
> > would be why  virConnectGetLibVersion would appear to be slow. That
> > API does absolutely nothing of any consequence, so the only reason
> > I'd expect that to be slow is if you're hitting a libvirtd RPC
> > limit causing the API to be queued up.
> 
> I hadn't tuned libvirtd.conf at all. I have just increased
> max_{clients,workers,requests,client_requests} to 50 and repeated my
> experiment. As you expected, virConnectGetLibVersion is now very fast.
> Unfortunately, the median VM creation time didn't change.
> 
> > I'm not actively doing anything in this area. Mostly because I've got no
> > clear data on where any remaining bottlenecks are.
> 
> Unless there are other parameters to tweak, I believe I'm still hitting a
> bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
> calls are
> 
> virConnectDefineXML*: 13ms vs 4.5s
> virDomainCreateWithFlags*: 1.8s vs 20s
> 
> * I had said that virConnectDefineXML wasn't serialized in my first email. I
>   based that observation on a single trace I looked at :-) In the average
>   case, virConnectDefineXML is affected by a bottleneck.

virConnectDefineXML would at least hit the possible bottleneck on
the virDomainObjListAddLocked method. In fact that's pretty much
the only contended lock I'd expect it to hit. Nothing else that
it runs has any serious locking involved.

> Note that when I took these measurements, I also monitored CPU & disk
> utilization. During the 20 VM test, both CPU & disk were well below 100%
> for 97% of the test (i.e., 60s test duration, measured utilization with
> atop using a 2 second interval, CPU was pegged for 2s).
> 
> > One theory I had was that the virDomainObjListSearchName method could
> > be a bottleneck, because that acquires a lock on every single VM. This
> > is invoked when starting a VM, when we call virDomainObjListAddLocked.
> > I tried removing this locking though & didn't see any performance
> > benefit, so never pursued this further.  Before trying things like
> > this again, I think we'd need to find a way to actually identify where
> > the true bottlenecks are, rather than guesswork.
> 
> Testing your hypothesis would be straightforward. I'll add some
> instrumentation to measure the time spent waiting for the locks and repeat
> my 20 VM experiment. Or, if there's some systematic lock profiling in
> place, then I can turn that on and report the results.

There's no lock profiling support built into libvirt. I'm not sure
of the best way to introduce such support without it impacting the very
thing we're trying to test.  Suggestions welcome.

Perhaps a systemtap script would do a reasonable job at it though.
eg record any stack traces associated with long futex_wait() system
calls or something like that.
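
Roughly, such a script could look like the sketch below. This is only a
hedged illustration of the idea (the probe choice, the 100ms threshold and
the invocation are assumptions), not the mutex-contention.stp example
pointed to in the follow-up message:

# futex-wait.stp -- rough sketch: report long futex waits in one process.
# Run as something like:
#   stap --ldd -d /usr/sbin/libvirtd -x <libvirtd-pid> futex-wait.stp

global entry_ts

probe syscall.futex {
    if (pid() == target())
        entry_ts[tid()] = gettimeofday_us()
}

probe syscall.futex.return {
    t = entry_ts[tid()]
    if (t) {
        elapsed = gettimeofday_us() - t
        if (elapsed > 100000) {        # only report waits longer than 100ms
            printf("futex wait of %d us in tid %d\n", elapsed, tid())
            print_ubacktrace()
        }
        delete entry_ts[tid()]
    }
}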

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Peter Feiner
> How many CPU cores are you testing on ?  That's a good improvement,
> but I'd expect the improvement to be greater as the # of cores is larger.

I'm testing on 12 cores x 2 HT per core. As I'm working on teasing out
software bottlenecks, I'm intentionally running fewer tasks (20 parallel
creations) than the number of logical cores (24). The memory, disk and
network are also well over provisioned.

> Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
> limit a single connection to only 5 RPC calls. Beyond that calls
> queue up, even if libvirtd is otherwise idle. OpenStack uses a
> single connection for everything so will hit this. I suspect this
> would be why  virConnectGetLibVersion would appear to be slow. That
> API does absolutely nothing of any consequence, so the only reason
> I'd expect that to be slow is if you're hitting a libvirtd RPC
> limit causing the API to be queued up.

I hadn't tuned libvirtd.conf at all. I have just increased
max_{clients,workers,requests,client_requests} to 50 and repeated my
experiment. As you expected, virConnectGetLibVersion is now very fast.
Unfortunately, the median VM creation time didn't change.
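
For reference, that tuning amounts to something like the following in
/etc/libvirt/libvirtd.conf (parameter names as above; 50 is simply the
value used for this test, not a recommendation):

# /etc/libvirt/libvirtd.conf
max_clients = 50
max_workers = 50
max_requests = 50
max_client_requests = 50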

> I'm not actively doing anything in this area. Mostly because I've got no
> clear data on where any remaining bottlenecks are.

Unless there are other parameters to tweak, I believe I'm still hitting a
bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
calls are

virConnectDefineXML*: 13ms vs 4.5s
virDomainCreateWithFlags*: 1.8s vs 20s

* I had said that virConnectDefineXML wasn't serialized in my first email. I
  based that observation on a single trace I looked at :-) In the average case,
  virConnectDefineXML is affected by a bottleneck.

Note that when I took these measurements, I also monitored CPU & disk
utilization. During the 20 VM test, both CPU & disk were well below 100% for
97% of the test (i.e., 60s test duration, measured utilization with atop
using a 2 second interval, CPU was pegged for 2s).

> One theory I had was that the virDomainObjListSearchName method could
> be a bottleneck, because that acquires a lock on every single VM. This
> is invoked when starting a VM, when we call virDomainObjListAddLocked.
> I tried removing this locking though & didn't see any performance
> benefit, so never pursued this further.  Before trying things like
> this again, I think we'd need to find a way to actually identify where
> the true bottlenecks are, rather than guesswork.

Testing your hypothesis would be straightforward. I'll add some
instrumentation to measure the time spent waiting for the locks and repeat my
20 VM experiment. Or, if there's some systematic lock profiling in place,
then I can turn that on and report the results.

Thanks,
Peter



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 12:09:39PM -0400, Peter Feiner wrote:
> Hello Daniel,
> 
> I've been working on improving scalability in OpenStack on libvirt+kvm
> for the last couple of months. I'm particularly interested in reducing
> the time it takes to create VMs when many VMs are requested in
> parallel.
> 
> One apparent bottleneck during virtual machine creation is libvirt. As
> more VMs are created in parallel, some libvirt calls (i.e.,
> virConnectGetLibVersion and virDomainCreateWithFlags) take longer
> without a commensurate increase in hardware utilization.
> 
> Thanks to your patches in libvirt-1.0.3, the situation has improved.
> Some libvirt calls OpenStack makes during VM creation (i.e.,
> virConnectDefineXML) have no measurable slowdown when many VMs are
> created in parallel. In turn, parallel VM creation in OpenStack is
> significantly faster with libvirt-1.0.3. On my standard benchmark
> (create 20 VMs in parallel, wait until the VM is ACTIVE, which is
> essentially after virDomainCreateWithFlags returns), libvirt-1.0.3
> reduces the median creation time from 90s to 60s when compared to
> libvirt-0.9.8.

How many CPU cores are you testing on ?  That's a good improvement,
but I'd expect the improvement to be greater as the # of cores is larger.

Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
limit a single connection to only 5 RPC calls. Beyond that calls
queue up, even if libvirtd is otherwise idle. OpenStack uses a
single connection for everything so will hit this. I suspect this
would be why  virConnectGetLibVersion would appear to be slow. That
API does absolutely nothing of any consequence, so the only reason
I'd expect that to be slow is if you're hitting a libvirtd RPC
limit causing the API to be queued up.

> I'd like to know if your concurrency work in the qemu driver is
> ongoing. If it isn't, I'd like to pick the work up myself and work on
> further improvements. Any advice or insight would be appreciated.

I'm not actively doing anything in this area. Mostly because I've got no
clear data on where any remaining bottlenecks are. 

One theory I had was that the virDomainObjListSearchName method could
be a bottleneck, because that acquires a lock on every single VM. This
is invoked when starting a VM, when we call virDomainObjListAddLocked.
I tried removing this locking though & didn't see any performance
benefit, so never pursued this further.  Before trying things like
this again, I think we'd need to find a way to actually identify where
the true bottlenecks are, rather than guesswork.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Peter Feiner
Hello Daniel,

I've been working on improving scalability in OpenStack on libvirt+kvm
for the last couple of months. I'm particularly interested in reducing
the time it takes to create VMs when many VMs are requested in
parallel.

One apparent bottleneck during virtual machine creation is libvirt. As
more VMs are created in parallel, some libvirt calls (i.e.,
virConnectGetLibVersion and virDomainCreateWithFlags) take longer
without a commensurate increase in hardware utilization.

Thanks to your patches in libvirt-1.0.3, the situation has improved.
Some libvirt calls OpenStack makes during VM creation (i.e.,
virConnectDefineXML) have no measurable slowdown when many VMs are
created in parallel. In turn, parallel VM creation in OpenStack is
significantly faster with libvirt-1.0.3. On my standard benchmark
(create 20 VMs in parallel, wait until the VM is ACTIVE, which is
essentially after virDomainCreateWithFlags returns), libvirt-1.0.3
reduces the median creation time from 90s to 60s when compared to
libvirt-0.9.8.

I'd like to know if your concurrency work in the qemu driver is
ongoing. If it isn't, I'd like to pick the work up myself and work on
further improvements. Any advice or insight would be appreciated.

Thanks!
Peter Feiner
