Re: [libvirt] Entering freeze for libvirt-1.2.21

2015-10-30 Thread Daniel Veillard
  Hi Guido,

On Fri, Oct 30, 2015 at 10:00:41PM +0100, Guido Günther wrote:
> On Thu, Oct 29, 2015 at 03:28:51PM +0800, Daniel Veillard wrote:
> >   As pointed our on Tuesday it's time for a new release. I have tagged
> > the release candidate 1 in git and pushed signed tarball and rpms to
> > the usual place at:
> > 
> >ftp://libvirt.org/libvirt/
> > 
> >   Based on my limited testing this works just fine, but that's very limited
> > and doesn't test portability at all, so please give it a try !
> 
> I'm having trouble verifying the signature:
> 
> $ gpg --verify libvirt-1.2.21-rc1.tar.gz.pgp libvirt-1.2.21-rc1.tar.gz
> gpg: Signature made Do 29 Okt 2015 07:41:52 CET
> gpg:using DSA key 0x4606B8A5DE95BC1F
> gpg: please do a --check-trustdb
> gpg: BAD signature from "Daniel Veillard (Red Hat work email) 
> " [unknown]
> 
> while verifying e.g. 1.2.20 works as expected.

  Hum, where is libvirt-1.2.21-rc1.tar.gz.pgp coming from ? I only uploaded
libvirt-1.2.21-rc1.tar.gz.asc !

  that said indeed there is an issue with rc1 signing ...

[root@libvirt libvirt]# gpg2 --keyserver hkp://pgp.mit.edu --recv-keys 
DE95BC1Fgpg: requesting key DE95BC1F from hkp server pgp.mit.edu
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key DE95BC1F: public key "Daniel Veillard (Red Hat work email) 
" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:   imported: 1
[root@libvirt libvirt]# gpg --verify libvirt-1.2.20.tar.gz.asc 
libvirt-1.2.20.tar.gz
gpg: Signature made Fri 02 Oct 2015 01:12:08 PM CEST using DSA key ID DE95BC1F
gpg: Good signature from "Daniel Veillard (Red Hat work email) 
"
gpg: aka "Daniel Veillard "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: C744 15BA 7C9C 7F78 F02E  1DC3 4606 B8A5 DE95 BC1F
[root@libvirt libvirt]# gpg --verify libvirt-1.2.21-rc1.tar.gz.asc 
libvirt-1.2.21-rc1.tar.gz
gpg: Signature made Thu 29 Oct 2015 07:41:52 AM CET using DSA key ID DE95BC1F
gpg: BAD signature from "Daniel Veillard (Red Hat work email) 
"
[root@libvirt libvirt]#

  I verified, the libvirt-1.2.21-rc1.tar.gz.asc present on libvirt server is
the same that I have left in my working dir of the machine where I assembled
the release.
  On the other hand libvirt-1.2.21-rc1.tar.gz diverges

thinkpad2:~/libvirt -> sha256sum libvirt-1.2.21-rc1.tar.gz
3cc9f2882a145562ee41b8369a8c3d1cb0f383fe13c3e39ac923f712bf8614d0  
libvirt-1.2.21-rc1.tar.gz
thinkpad2:~/libvirt ->

and 

[root@libvirt libvirt]# sha256sum libvirt-1.2.21-rc1.tar.gz
00cce64d4eb906f294921effab7b0128dbded46da614f9d88681abdb80af0ae2  
libvirt-1.2.21-rc1.tar.gz
[root@libvirt libvirt]# 

  I remember that I interrupted the rsync when pushing the release and restarted
it this may have introduced that divergence, I reuploaded the rc1:

[root@libvirt libvirt]# sha256sum libvirt-1.2.21-rc1.tar.gz
3cc9f2882a145562ee41b8369a8c3d1cb0f383fe13c3e39ac923f712bf8614d0  
libvirt-1.2.21-rc1.tar.gz
[root@libvirt libvirt]# sha256sum libvirt-1.2.21-rc1.tar.gz.asc
9bfb1fe53c5d1457d5bc6a4f7ce4661ad925210f9ab2708bd0c523accf16f5e5  
libvirt-1.2.21-rc1.tar.gz.asc
[root@libvirt libvirt]# gpg --verify libvirt-1.2.21-rc1.tar.gz.asc 
libvirt-1.2.21-rc1.tar.gz
gpg: Signature made Thu 29 Oct 2015 07:41:52 AM CET using DSA key ID DE95BC1F
gpg: Good signature from "Daniel Veillard (Red Hat work email) 
"
gpg: aka "Daniel Veillard "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: C744 15BA 7C9C 7F78 F02E  1DC3 4606 B8A5 DE95 BC1F
[root@libvirt libvirt]# 

  and that version is fine,

   thanks for the heads-up !

Daniel

-- 
Daniel Veillard  | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Entering freeze for libvirt-1.2.21

2015-10-30 Thread Guido Günther
On Thu, Oct 29, 2015 at 03:28:51PM +0800, Daniel Veillard wrote:
>   As pointed our on Tuesday it's time for a new release. I have tagged
> the release candidate 1 in git and pushed signed tarball and rpms to
> the usual place at:
> 
>ftp://libvirt.org/libvirt/
> 
>   Based on my limited testing this works just fine, but that's very limited
> and doesn't test portability at all, so please give it a try !

I'm having trouble verifying the signature:

$ gpg --verify libvirt-1.2.21-rc1.tar.gz.pgp libvirt-1.2.21-rc1.tar.gz
gpg: Signature made Do 29 Okt 2015 07:41:52 CET
gpg:using DSA key 0x4606B8A5DE95BC1F
gpg: please do a --check-trustdb
gpg: BAD signature from "Daniel Veillard (Red Hat work email) 
" [unknown]

while verifying e.g. 1.2.20 works as expected.
Cheers,
 -- Guido

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Laine Stump

On 10/30/2015 01:44 PM, Tony Krowiak wrote:

On 10/29/2015 01:48 PM, Laine Stump wrote:

On 10/29/2015 12:49 PM, Tony Krowiak wrote:

For a guest domain defined with a large number of macvtap devices, it takes an 
exceedingly long time to boot the guest. In a test of a guest domain configured 
with 82 macvtap devices, it took over two minutes for the guest to boot. An 
strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl 
literally being invoked 3,403 times. I was able to isolate the source of the 
ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile*  function 
in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a 
counter variable, starting with zero, and appending the counter value to 
'macvtap'.


I've wondered ever since the first time I saw that code why it was 
done that way, and why there had never been any performance 
complaints. Lacking any complaints, I promptly forgot about it (until 
the next time I went past the code for some other tangentially 
related reason.)


Since you're the first to complain, you have the honor of fixing it :-)

Thank you for that honor.



With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS ioctl) 
to determine if a device with that name already exists, until a unique name is 
created. In the test case cited above, to create an interface name for the 82nd 
macvtap device, the*virNetDevExists*  function will be called for interface 
names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be 
used. So if N is the number of macvtap interfaces defined for a guest, the 
SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused 
macvtap device names. That's assuming only one guest is being started, who 
knows how many times the ioctl may have to be called in an installation running 
a large number of guests defined with macvtap devices.

I was able to reduce the amount of time for starting a guest domain defined 
with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping 
track of the interface name suffixes previously used. I defined two static bit 
maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a 
macvtap/macvlan device is created, the index of the next clear bit 
(virBitmapNextClearBit) is retrieved to create the name. If an interface with 
that name does not exist, the device is created and the bit at the index used 
to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan 
device is deleted, if the interface name has the pattern 'macvtap%d' or 
'macvlan%d', the suffix is parsed into a bit index and used to clear the 
(virBitMapClearBit) bit in the respective bitmap.


This sounds fine, as long as 1) you recreate the bitmap whenever 
libvirtd is restarted (while scanning through all the interfaces of 
every domain; there is already code being executed in exactly the 
right place - look for qemu_process.c:qemuProcessNotifyNets() and add 
appropriate code inside the loop there), and 2) you retry some number 
of times if a supposedly unused device name is actually in use (to 
account for processes other than libvirt using the same naming 
convention).



I am not sure that is the best design because there is no way to track 
interface names used to create macvtap devices outside of libvirt, for example 
using the ip command.


If you wanted to get *really* complicated, you could use netlink to 
get a list of all network devices, or even monitor netlink traffic to 
maintain your own cache, but I think that's serious overkill (until 
proven otherwise).
I agree, I think this would be overkill. I think it would require that 
we track the complete interface names as opposed to maintaining a 
bitmap of interface name suffixes.

  There may also be other issues I've not contemplated. I included a couple of 
additional ideas below and am looking for comments or other suggestions that I 
have not considered.

  * Define a global counter variable initialized to 0, that gets
incremented each time an interface name is created, to keep
track of the last used interface name suffix. At some maximum
value, the counter will be set back to 0.



There could be some merit to this, as it is simpler and likely 
faster. You would need to maintain the counter somewhere in 
persistent storage so it could be retrieved when libvirtd is 
restarted though.
I have a problem with this one, because certain scenarios could 
introduce performance issues, for example:


  * Guest1, defined with 1 macvtap device is started and the
'macvtap0' device is created
  * A plethora of guests are subsequently defined, such that there are
no gaps between interface names 'macvtap0' and 'macvtap5100'
  * Guest1 is deleted, thus removing the 'macvtap0' device
  * Additional guests are defined until the counter recycles back to 0
  * GuestN is defined with more than one macvtap device. When guestN
is started, the 'macvta

[libvirt] [PATCH 0/8] Few vfio related fixes

2015-10-30 Thread Shivaprasad G Bhat
The series fixes 3 issues which lead to host crash.
The actual fixes are in p3, p4, p7 and p8.

I'll fill in more details / descriptions in v2.
The make check fails on virpcitest now, which I am still debugging.
I anticipate some rework from review on p7 which I continue to
refine. Sending now, to get any feedback that might change things
drastically.
 
---

Shivaprasad G Bhat (8):
  Initialize the stubDriver of pci devices if bound to a valid one
  Add iommu group number info to virPCIDevice
  Refuse to reattach from vfio if the iommu group is in use by any domain
  Wait for vfio-pci device cleanups before reassinging the device to host 
driver
  Split reprobe action from the virPCIUnbindFromStub into a new function
  Pass activeDevs and inactiveDevs to virPCIDeviceUnbindFromStub and 
virPCIDeviceBindToStub
  Postpone reprobing till all the devices in iommu group are unbound from 
vfio
  Wait for iommmu device to go away before reprobing the host driver


 src/util/virhostdev.c |   18 
 src/util/virpci.c |  206 ++---
 2 files changed, 177 insertions(+), 47 deletions(-)

--
Signature

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Tony Krowiak

On 10/29/2015 01:48 PM, Laine Stump wrote:

On 10/29/2015 12:49 PM, Tony Krowiak wrote:

For a guest domain defined with a large number of macvtap devices, it takes an 
exceedingly long time to boot the guest. In a test of a guest domain configured 
with 82 macvtap devices, it took over two minutes for the guest to boot. An 
strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl 
literally being invoked 3,403 times. I was able to isolate the source of the 
ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile*  function 
in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a 
counter variable, starting with zero, and appending the counter value to 
'macvtap'.


I've wondered ever since the first time I saw that code why it was 
done that way, and why there had never been any performance 
complaints. Lacking any complaints, I promptly forgot about it (until 
the next time I went past the code for some other tangentially related 
reason.)


Since you're the first to complain, you have the honor of fixing it :-)

Thank you for that honor.



With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS ioctl) 
to determine if a device with that name already exists, until a unique name is 
created. In the test case cited above, to create an interface name for the 82nd 
macvtap device, the*virNetDevExists*  function will be called for interface 
names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be 
used. So if N is the number of macvtap interfaces defined for a guest, the 
SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused 
macvtap device names. That's assuming only one guest is being started, who 
knows how many times the ioctl may have to be called in an installation running 
a large number of guests defined with macvtap devices.

I was able to reduce the amount of time for starting a guest domain defined 
with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping 
track of the interface name suffixes previously used. I defined two static bit 
maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a 
macvtap/macvlan device is created, the index of the next clear bit 
(virBitmapNextClearBit) is retrieved to create the name. If an interface with 
that name does not exist, the device is created and the bit at the index used 
to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan 
device is deleted, if the interface name has the pattern 'macvtap%d' or 
'macvlan%d', the suffix is parsed into a bit index and used to clear the 
(virBitMapClearBit) bit in the respective bitmap.


This sounds fine, as long as 1) you recreate the bitmap whenever 
libvirtd is restarted (while scanning through all the interfaces of 
every domain; there is already code being executed in exactly the 
right place - look for qemu_process.c:qemuProcessNotifyNets() and add 
appropriate code inside the loop there), and 2) you retry some number 
of times if a supposedly unused device name is actually in use (to 
account for processes other than libvirt using the same naming 
convention).



I am not sure that is the best design because there is no way to track 
interface names used to create macvtap devices outside of libvirt, for example 
using the ip command.


If you wanted to get *really* complicated, you could use netlink to 
get a list of all network devices, or even monitor netlink traffic to 
maintain your own cache, but I think that's serious overkill (until 
proven otherwise).
I agree, I think this would be overkill. I think it would require that 
we track the complete interface names as opposed to maintaining a bitmap 
of interface name suffixes.

  There may also be other issues I've not contemplated. I included a couple of 
additional ideas below and am looking for comments or other suggestions that I 
have not considered.

  * Define a global counter variable initialized to 0, that gets
incremented each time an interface name is created, to keep track
of the last used interface name suffix. At some maximum value,
the counter will be set back to 0.



There could be some merit to this, as it is simpler and likely faster. 
You would need to maintain the counter somewhere in persistent storage 
so it could be retrieved when libvirtd is restarted though.
I have a problem with this one, because certain scenarios could 
introduce performance issues, for example:


 * Guest1, defined with 1 macvtap device is started and the 'macvtap0'
   device is created
 * A plethora of guests are subsequently defined, such that there are
   no gaps between interface names 'macvtap0' and 'macvtap5100'
 * Guest1 is deleted, thus removing the 'macvtap0' device
 * Additional guests are defined until the counter recycles back to 0
 * GuestN is defined with more than one macvtap device. When guestN is
   started, the 'macvtap0' device will get created for it right off the
   ba

Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Laine Stump

On 10/30/2015 11:56 AM, Tony Krowiak wrote:

On 10/30/2015 06:49 AM, Michal Privoznik wrote:

On 29.10.2015 18:48, Laine Stump wrote:

On 10/29/2015 12:49 PM, Tony Krowiak wrote:

For a guest domain defined with a large number of macvtap devices, it
takes an exceedingly long time to boot the guest. In a test of a guest
domain configured with 82 macvtap devices, it took over two minutes
for the guest to boot. An strace of the ioctl calls during guest start
up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times.
I was able to isolate the source of the ioctl calls to
the*virNetDevMacVLanCreateWithVPortProfile*  function
in*virnetdevmacvlan.c*. The macvtap interface name is created by
looping over a counter variable, starting with zero, and appending the
counter value to 'macvtap'.

I've wondered ever since the first time I saw that code why it was done
that way, and why there had never been any performance complaints.
Lacking any complaints, I promptly forgot about it (until the next time
I went past the code for some other tangentially related reason.)

Since you're the first to complain, you have the honor of fixing it :-)


With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS
ioctl) to determine if a device with that name already exists, until a
unique name is created. In the test case cited above, to create an
interface name for the 82nd macvtap device, the*virNetDevExists*
function will be called for interface names 'macvtap0' to 'macvtap80'
before it is determined that 'mavtap81' can be used. So if N is the
number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
ioctl will be invoked (N x N + N)/2 times to find an unused macvtap
device names. That's assuming only one guest is being started, who
knows how many times the ioctl may have to be called in an
installation running a large number of guests defined with macvtap
devices.

Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 (
contained v1.2.2~32) if two threads were starting a domain concurrently,
they even competed with each other in that specific area of the code.


That's only for the veth devices used by lxc, which is a separate thing 
from the macvtap devices (rough description: a veth is a pair of netdevs 
where the gazinta of one is the gazouta of the other and vice versa, 
while a macvtap is a chardev on one side, and is a netdev permanently 
attached "to the side" of another netdev at a point where the two can't 
see each others' traffic, but can both see all traffic going in and out 
the original device).


veth creation does suffer from the same problem though - each time you 
want to create a veth pair, libvirt will check for an unused device name 
by calling virNetDevExists() starting at vnet0, so it could benefit from 
the same code if it was made agnostic of basename (note that all 
standard tap devices would need to be marked in such a bitmap, since 
they also use the name "vnet%d").


(An aside: I'm a bit surprised to see that we are creating veths using 
"ip link add", since the same thing can be done with a netlink message 
and would remove the need to exec an external binary. Not that I haven't 
done the same thing myself when it was expedient :-)



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1/8] Initialize the stubDriver of pci devices if bound to a valid one

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 20:34 +0530, Shivaprasad bhat wrote:
> > > @@ -1556,6 +1563,8 @@ virPCIDeviceNew(unsigned int domain,
> > >  virPCIDevicePtr dev;
> > >  char *vendor = NULL;
> > >  char *product = NULL;
> > > +char *drvpath = NULL;
> > > +char *driver = NULL;
> > > 
> > >  if (VIR_ALLOC(dev) < 0)
> > >  return NULL;
> > > @@ -1603,9 +1612,16 @@ virPCIDeviceNew(unsigned int domain,
> > >  goto error;
> > >  }
> > > 
> > > +if (virPCIDeviceGetDriverPathAndName(dev, &drvpath, &driver) < 0)
> > > +goto cleanup;
> > > +
> > > +if (virPCIIsAKnownStub(driver))
> > > +dev->stubDriver = driver;
> > > +
> > >  VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
> > > 
> > >   cleanup:
> > > +VIR_FREE(drvpath);
> > >  VIR_FREE(product);
> > >  VIR_FREE(vendor);
> > >  return dev;
> > 
> > What are you doing this for? AFAICT you're using this so you
> > can, in Patch 7, do
> > 
> >   pci = virPCIDeviceNew(...);
> >   if (STREQ_NULLABLE(pci->stubDriver, "vfio-pci"))
> >   ...
> > 
> > Is that so, or is there another reason I'm missing?
> > 
> 
> Its used in P3 as well in virHostdevPCINodeDeviceReAttach().
> I want to keep that function as simple as it is now.
>  And as you pointed out, i am using it in P7 too.Hope its okay now.

I don't see how it's used in that function, as neither the
function itself nor the calls to virHostdevIsPCINodeDeviceUsed()
you've added with Patch 3 seem to touch dev->stubDriver...

Please walk me through it, I'm probably just missing it
because it's Friday :)

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] storage: Ignore block devices that fail format detection

2015-10-30 Thread John Ferlan
https://bugzilla.redhat.com/show_bug.cgi?id=1276198

Prior to commit id '98322052' failure to saferead the block device would
cause an error to be logged and the device to be skipped while attempting
to discover/create a stable target path for a new LUN (NPIV).

This was because virStorageBackendSCSIFindLUs ignored errors from
processLU and virStorageBackendSCSINewLun.

Ignoring the failure allowed a multipath device with an "active" and
"ghost" to be present on the host with the "ghost" block device being
ignored. This patch will return a -2 to the caller indicating the desire
to ignore the block device since it cannot be used directly rather than
fail the pool startup.

Additionally, it was found during some debugging that it was possible
for the virStorageBackendDetectBlockVolFormatFD to not detect a format,
which while not a probably - we probably should at least add some sort
of warning message.

Signed-off-by: John Ferlan 
---
 src/storage/storage_backend.c  | 4 
 src/storage/storage_backend_scsi.c | 7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c
index a375fe0..2de606f 100644
--- a/src/storage/storage_backend.c
+++ b/src/storage/storage_backend.c
@@ -1393,6 +1393,10 @@ 
virStorageBackendDetectBlockVolFormatFD(virStorageSourcePtr target,
 }
 }
 
+if (target->format == VIR_STORAGE_POOL_DISK_UNKNOWN)
+VIR_WARN("cannot determine the target format for '%s'",
+ target->path);
+
 return 0;
 }
 
diff --git a/src/storage/storage_backend_scsi.c 
b/src/storage/storage_backend_scsi.c
index a593a2b..d60473d 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -224,9 +224,14 @@ virStorageBackendSCSINewLun(virStoragePoolObjPtr pool,
 goto cleanup;
 }
 
+/* Failing to process the format is not fatal - we'll just skip
+ * this volume.
+ */
 if (virStorageBackendUpdateVolInfo(vol, true,
-   VIR_STORAGE_VOL_OPEN_DEFAULT) < 0)
+   VIR_STORAGE_VOL_OPEN_DEFAULT) < 0) {
+retval = -2;
 goto cleanup;
+}
 
 if (!(vol->key = virStorageBackendSCSISerial(vol->target.path)))
 goto cleanup;
-- 
2.1.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 8/8] Wait for iommmu device to go away before reprobing the host driver

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 05:04 +0530, Shivaprasad G Bhat wrote:
> Author: Shivaprasad G Bhat 

This line is redundant since the information is
already stored in git, plus you have the Signed-off-by
below. This applies to Patch 7 as well.

> 
> There could be a delay of 1 or 2 seconds before the vfio-pci driver
> is unbound and the device file /dev/vfio/ is actually
> removed. If the file exists, the host driver probing the device
> can lead to crash. So, wait and avoid the crash. Setting the
> timeout to 15 seconds for now.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |   39 +++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 425c1a7..6bf640d 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1097,6 +1097,42 @@ static bool virPCIIsAKnownStub(char *driver)
>  return ret;
>  }
>  
> +#define VFIO_UNBIND_TIMEOUT 15

There's one trailing space here, and git complains about
it when applying the patch. syntax-check complains too.

> +
> +/* It is not safe to initiate host driver probe if the vfio driver has not
> + * completely unbound the device.
> + * So, return if the unbind didn't complete in 15 seconds.
> + */
> +static int virPCIWaitForVfioUnbindCompletion(virPCIDevicePtr dev)

Return type on a separate line, please. VFIO is an
acronym, like PCI, and should always be all-caps.

> +{
> +int retry = 0;
> +int ret = -1;
> +char *path = NULL;
> +
> +if (!(path = virPCIDeviceGetIOMMUGroupDev(dev)))
> +goto cleanup;
> +
> +while (retry++ < VFIO_UNBIND_TIMEOUT) {
> +if (!virFileExists(path))
> +break;
> + sleep(1);

Do we want this to be in seconds, or would a higher
granularity be better?

> +}
> +
> +if (virFileExists(path)) {
> +virReportError(VIR_ERR_INTERNAL_ERROR,
> +   _("The VFIO unbind not completed even after %d 
> seconds for device %.4x:%.2x:%.2x.%.1x"),
> +   retry, dev->domain, dev->bus, dev->slot, 
> dev->function);

This line is very long, please split the error message into
two parts. I'd say "The" and "even" can be dropped. You should
use dev->name instead of formatting the name yourself.

> +goto cleanup;
> +}
> +
> +ret = 0;
> +cleanup :

Please remove the space between the label and the semicolon,
and add one in front of the label. I'd also add an empty
line before the label.

> +VIR_FREE(path);
> +return ret;
> +
> +}
> +
> +
>  static int virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, char *driver, 
> char *drvdir)
>  {
>  char *path = NULL;
> @@ -1203,6 +1239,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev,
>  goto cleanup;
>  }
>  
> +if (virPCIWaitForVfioUnbindCompletion(dev) < 0)
> +goto cleanup;
> +
>  while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) {
>  virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i);
>  if (dev->iommuGroup == pcidev->iommuGroup) {

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2/8] Add iommu group number info to virPCIDevice

2015-10-30 Thread Andrea Bolognani
n Fri, 2015-10-30 at 04:56 +0530, Shivaprasad G Bhat wrote:
> The iommu group number need not be fetched from the sysfs
> everytime as it remains constant. Fetch it once during
> allocation
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 5acf486..eba285a 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -75,6 +75,7 @@ struct _virPCIDevice {
>  bool  has_pm_reset;
>  bool  managed;
>  char  *stubDriver;
> +unsigned int  iommuGroup;

This can't be unsigned int, as
virPCIDeviceAddressGetIOMMUGroupNum() will return -2 if
there is no IOMMU group for the device.

>  
>  /* used by reattach function */
>  bool  unbind_from_stub;
> @@ -1565,6 +1566,8 @@ virPCIDeviceNew(unsigned int domain,
>  char *product = NULL;
>  char *drvpath = NULL;
>  char *driver = NULL;
> +virPCIDeviceAddress devAddr = { domain, bus,
> +slot, function };
>  
>  if (VIR_ALLOC(dev) < 0)
>  return NULL;
> @@ -1618,6 +1621,8 @@ virPCIDeviceNew(unsigned int domain,
>  if (virPCIIsAKnownStub(driver))
>  dev->stubDriver = driver;
>  
> +dev->iommuGroup = virPCIDeviceAddressGetIOMMUGroupNum(&devAddr);
> +
>  VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
>  
>   cleanup:

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1/8] Initialize the stubDriver of pci devices if bound to a valid one

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 04:56 +0530, Shivaprasad G Bhat wrote:
> The stubDriver name can be used to make useful decisions if readily available.
> Set it if bound to a valid one during initialisation.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |   36 ++--
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 35b1459..5acf486 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1080,6 +1080,22 @@ static const char *virPCIKnownStubs[] = {
>  NULL
>  };
>  
> +static bool virPCIIsAKnownStub(char *driver)

Return type on a separate line, please.

> +{
> +const char **stubTest;
> +bool ret = false;
> +
> +for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) {
> +if (STREQ_NULLABLE(driver, *stubTest)) {
> +ret = true;
> +VIR_DEBUG("Found stub driver %s", *stubTest);
> +break;
> +}
> +}
> +
> +return ret;
> +}
> +
>  static int
>  virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  {
> @@ -1087,8 +1103,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  char *drvdir = NULL;
>  char *path = NULL;
>  char *driver = NULL;
> -const char **stubTest;
> -bool isStub = false;
>  
>  /* If the device is currently bound to one of the "well known"
>   * stub drivers, then unbind it, otherwise ignore it.
> @@ -1105,14 +1119,7 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  goto remove_slot;
>  
>  /* If the device isn't bound to a known stub, skip the unbind. */
> -for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) {
> -if (STREQ(driver, *stubTest)) {
> -isStub = true;
> -VIR_DEBUG("Found stub driver %s", *stubTest);
> -break;
> -}
> -}
> -if (!isStub)
> +if (!virPCIIsAKnownStub(driver))
>  goto remove_slot;
>  
>  if (virPCIDeviceUnbind(dev, dev->reprobe) < 0)

I would split this patch in two: everything above is just moving
an existing check into a separate function, without introducing
any functional change.

The code below, on the other hand, is changing the behavior of
virPCIDeviceNew() and as such should go in its own patch. But
see below.

> @@ -1556,6 +1563,8 @@ virPCIDeviceNew(unsigned int domain,
>  virPCIDevicePtr dev;
>  char *vendor = NULL;
>  char *product = NULL;
> +char *drvpath = NULL;
> +char *driver = NULL;
>  
>  if (VIR_ALLOC(dev) < 0)
>  return NULL;
> @@ -1603,9 +1612,16 @@ virPCIDeviceNew(unsigned int domain,
>  goto error;
>  }
>  
> +if (virPCIDeviceGetDriverPathAndName(dev, &drvpath, &driver) < 0)
> +goto cleanup;
> +
> +if (virPCIIsAKnownStub(driver))
> +dev->stubDriver = driver;
> +
>  VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
>  
>   cleanup:
> +VIR_FREE(drvpath);
>  VIR_FREE(product);
>  VIR_FREE(vendor);
>  return dev;

What are you doing this for? AFAICT you're using this so you
can, in Patch 7, do

  pci = virPCIDeviceNew(...);
  if (STREQ_NULLABLE(pci->stubDriver, "vfio-pci"))
  ...

Is that so, or is there another reason I'm missing?

If the former, I'd rather not overload the meaning of
dev->stubDriver and call virPCIDeviceGetDriverPathAndName()
explicitly in Patch 7, so that the intentions behind the
code are abundantly clear.

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Tony Krowiak

On 10/30/2015 06:49 AM, Michal Privoznik wrote:

On 29.10.2015 18:48, Laine Stump wrote:

On 10/29/2015 12:49 PM, Tony Krowiak wrote:

For a guest domain defined with a large number of macvtap devices, it
takes an exceedingly long time to boot the guest. In a test of a guest
domain configured with 82 macvtap devices, it took over two minutes
for the guest to boot. An strace of the ioctl calls during guest start
up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times.
I was able to isolate the source of the ioctl calls to
the*virNetDevMacVLanCreateWithVPortProfile*  function
in*virnetdevmacvlan.c*. The macvtap interface name is created by
looping over a counter variable, starting with zero, and appending the
counter value to 'macvtap'.

I've wondered ever since the first time I saw that code why it was done
that way, and why there had never been any performance complaints.
Lacking any complaints, I promptly forgot about it (until the next time
I went past the code for some other tangentially related reason.)

Since you're the first to complain, you have the honor of fixing it :-)


With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS
ioctl) to determine if a device with that name already exists, until a
unique name is created. In the test case cited above, to create an
interface name for the 82nd macvtap device, the*virNetDevExists*
function will be called for interface names 'macvtap0' to 'macvtap80'
before it is determined that 'mavtap81' can be used. So if N is the
number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
ioctl will be invoked (N x N + N)/2 times to find an unused macvtap
device names. That's assuming only one guest is being started, who
knows how many times the ioctl may have to be called in an
installation running a large number of guests defined with macvtap
devices.

Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 (
contained v1.2.2~32) if two threads were starting a domain concurrently,
they even competed with each other in that specific area of the code.


I was able to reduce the amount of time for starting a guest domain
defined with 82 macvtap devices from over 2 minutes to about 14
seconds by keeping track of the interface name suffixes previously
used. I defined two static bit maps (virBitmap), one each for macvtap
and macvlan device name suffixes. When a macvtap/macvlan device is
created, the index of the next clear bit (virBitmapNextClearBit) is
retrieved to create the name. If an interface with that name does not
exist, the device is created and the bit at the index used to create
the interface name is set (virBitmapSetBit). When a macvtap/macvlan
device is deleted, if the interface name has the pattern 'macvtap%d'
or 'macvlan%d', the suffix is parsed into a bit index and used to
clear the (virBitMapClearBit) bit in the respective bitmap.

This sounds fine, as long as 1) you recreate the bitmap whenever
libvirtd is restarted (while scanning through all the interfaces of
every domain; there is already code being executed in exactly the right
place - look for qemu_process.c:qemuProcessNotifyNets() and add
appropriate code inside the loop there), and 2) you retry some number of
times if a supposedly unused device name is actually in use (to account
for processes other than libvirt using the same naming convention).

How about re-using the approach we have for virPortAllocator? We
maintain a bitmap of ports. On acquiring new port, we try to bind() to
it. If we succeeded, we set the corresponding bit in the bitmap. Of
course it may happen that a port in the host is already taken but our
bitmap does not think so. That's okay. We just leave the corresponding
bit alone => if we would set it as used, nobody will ever unset it.
Moreover, we will try the port next time, and it may be free.

Moreover, the bitmap is not saved anywhere, nor restored on daemon
restart - this could be changed though.

So what am I saying is practically the same as Laine, just extending his
thoughts and giving you an example how to proceed further :)
I appreciate the input. This is similar to the first solution I 
proposed, which I actually implemented and tested. It is described above.


Michal



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 7/8] Postpone reprobing till all the devices in iommu group are unbound from vfio

2015-10-30 Thread Andrea Bolognani
I know you're already working on a v2 of this, just a
couple of quick remarks below.

On Fri, 2015-10-30 at 05:01 +0530, Shivaprasad G Bhat wrote:
> Author: Shivaprasad G Bhat 
> 
> The host will crash if a device is bound to host driver when the device
> belonging to same iommu group is in use by any of the guests. So,
> do the host driver probe only after all the devices in the iommu group
> have unbound from the vfio.
> 
> The patch fixes
> https://bugzilla.redhat.com/show_bug.cgi?id=1272300
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |   50 +-
>  1 file changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 6c24a81..425c1a7 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1128,6 +1128,23 @@ static int 
> virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, char *driver, char
>  return result;
>  }
>  
> +static int virPCIDeviceBoundToVFIODriver(virPCIDeviceAddressPtr devAddr, 
> void *opaque ATTRIBUTE_UNUSED)

Return type on a separate line, please.

> +{
> +int ret = 0;
> +virPCIDevicePtr pci = NULL;
> +
> +if (!(pci = virPCIDeviceNew(devAddr->domain, devAddr->bus,
> +devAddr->slot, devAddr->function)))
> +goto cleanup;
> +
> +if (STREQ_NULLABLE(pci->stubDriver, "vfio-pci"))
> +ret = -1;

As mentioned in the comment for Patch 1, I think this is
fairly obscure: if I had not read throught the whole
series, I'd assume this checks whether the device is
configured to use vfio-pci as stub driver, not whether
it is currently bound to it, and it would not be
immediately clear to me how this check fits in a function
called virPCIDeviceBoundToVFIODriver().

I think it would be much cleaner to query the driver
explicitly using virPCIDeviceGetDriverPathAndName() and
remove that call from virPCIDeviceNew().

> +
> + cleanup:
> +virPCIDeviceFree(pci);
> +return ret;
> +}
> +
>  static int
>  virPCIDeviceUnbindFromStub(virPCIDevicePtr dev,
> virPCIDeviceListPtr activeDevs ATTRIBUTE_UNUSED,
> @@ -1177,11 +1194,34 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev,
>  dev->remove_slot = false;
>  
>   reprobe:
> -if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0)
> -goto cleanup;
> -/* Steal the dev from list inactiveDevs */
> -if (inactiveDevs)
> -virPCIDeviceListDel(inactiveDevs, dev);
> +if (STREQ_NULLABLE(dev->stubDriver, "vfio-pci")) {
> +size_t i = 0;
> +virPCIDeviceAddress devAddr = { dev->domain, dev->bus,
> +dev->slot, dev->function };
> +if (virPCIDeviceAddressIOMMUGroupIterate(&devAddr, 
> virPCIDeviceBoundToVFIODriver, NULL) < 0) {
> +result = 0;
> +goto cleanup;
> +}
> +
> +while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) {
> +virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i);
> +if (dev->iommuGroup == pcidev->iommuGroup) {
> +if (virPCIDeviceReprobeHostDriver(pcidev, driver, drvdir) < 
> 0)
> +   goto cleanup;
> +/* Steal the dev from list inactiveDevs */
> +virPCIDeviceListDel(inactiveDevs, pcidev);
> +continue;
> +}
> +i++;
> +}
> +} else {
> +if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0)
> +goto cleanup;
> +/* Steal the dev from list inactiveDevs */
> +
> +if (inactiveDevs)
> +virPCIDeviceListDel(inactiveDevs, dev);

Either put the empty line before the comment or just
get rid of it.

> +}
>  
>  result = 0;

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 6/8] Pass activeDevs and inactiveDevs to virPCIDeviceUnbindFromStub and virPCIDeviceBindToStub

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 05:00 +0530, Shivaprasad G Bhat wrote:
> The inactiveDevs need to be selectively altered for more than one
> device in case of vfio devices. So, pass the whole list.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |   38 +-
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 2709ddd..6c24a81 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1129,7 +1129,9 @@ static int 
> virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, char *driver, char
>  }
>  
>  static int
> -virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
> +virPCIDeviceUnbindFromStub(virPCIDevicePtr dev,
> +   virPCIDeviceListPtr activeDevs ATTRIBUTE_UNUSED,
> +   virPCIDeviceListPtr inactiveDevs)
>  {
>  int result = -1;
>  char *drvdir = NULL;
> @@ -1177,6 +1179,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>   reprobe:
>  if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0)
>  goto cleanup;
> +/* Steal the dev from list inactiveDevs */
> +if (inactiveDevs)
> +virPCIDeviceListDel(inactiveDevs, dev);
>  
>  result = 0;
>  
> @@ -1195,7 +1200,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  
>  static int
>  virPCIDeviceBindToStub(virPCIDevicePtr dev,
> -   const char *stubDriverName)
> +   const char *stubDriverName,
> +   virPCIDeviceListPtr activeDevs,
> +   virPCIDeviceListPtr inactiveDevs)
>  {
>  int result = -1;
>  bool reprobe = false;
> @@ -1317,6 +1324,14 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev,
>  goto cleanup;
>  }
>  
> +/* Add *a copy of* the dev into list inactiveDevs, if
> + * it's not already there.
> + */

Just close the comment in the previous line.

> +if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) &&
> +virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) {

Really weird alignment here...

> +result = -1;
> +}
> +
>   cleanup:
>  VIR_FREE(stubDriverPath);
>  VIR_FREE(driverLink);
> @@ -1324,7 +1339,7 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev,
>  
>  if (result < 0) {
>  VIR_FREE(newDriverName);
> -virPCIDeviceUnbindFromStub(dev);
> +virPCIDeviceUnbindFromStub(dev, activeDevs, NULL);

Why pass NULL instead of inactiveDevs here?

>  } else {
>  VIR_FREE(dev->stubDriver);
>  dev->stubDriver = newDriverName;
> @@ -1371,16 +1386,9 @@ virPCIDeviceDetach(virPCIDevicePtr dev,
>  return -1;
>  }
>  
> -if (virPCIDeviceBindToStub(dev, dev->stubDriver) < 0)
> -return -1;
> -
> -/* Add *a copy of* the dev into list inactiveDevs, if
> - * it's not already there.
> - */
> -if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) &&
> -virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) {
> +if (virPCIDeviceBindToStub(dev, dev->stubDriver,
> +   activeDevs, inactiveDevs) < 0)
>  return -1;
> -}
>  
>  return 0;
>  }
> @@ -1396,13 +1404,9 @@ virPCIDeviceReattach(virPCIDevicePtr dev,
>  return -1;
>  }
>  
> -if (virPCIDeviceUnbindFromStub(dev) < 0)
> +if (virPCIDeviceUnbindFromStub(dev, activeDevs, inactiveDevs) < 0)
>  return -1;
>  
> -/* Steal the dev from list inactiveDevs */
> -if (inactiveDevs)
> -virPCIDeviceListDel(inactiveDevs, dev);
> -
>  return 0;
>  }

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 5/8] Split reprobe action from the virPCIUnbindFromStub into a new function

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 04:59 +0530, Shivaprasad G Bhat wrote:
> The reprobe needs to be called multiple times for vfio devices for each
> device in the iommu group in future patch. So split the reprobe into a
> new function. No functional change.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virpci.c |   54 
> +++--
>  1 file changed, 32 insertions(+), 22 deletions(-)
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index eba285a..2709ddd 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1097,6 +1097,37 @@ static bool virPCIIsAKnownStub(char *driver)
>  return ret;
>  }
>  
> +static int virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, char *driver, 
> char *drvdir)
> +{
> +char *path = NULL;
> +int result = -1;
> +
> +if (!dev->reprobe)
> +return 0;
> +
> +/* Trigger a re-probe of the device is not in the stub's dynamic
> + * ID table. If the stub is available, but 'remove_id' isn't
> + * available, then re-probing would just cause the device to be
> + * re-bound to the stub.
> + */
> +if (driver && !(path = virPCIDriverFile(driver, "remove_id")))
> +goto cleanup;
> +
> +if (!driver || !virFileExists(drvdir) || virFileExists(path)) {
> +if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) {
> +virReportSystemError(errno,
> + _("Failed to trigger a re-probe for PCI 
> device '%s'"),
> + dev->name);
> +goto cleanup;
> +}
> +}
> +result = 0;
> + cleanup:
> +VIR_FREE(path);
> +dev->reprobe = false;
> +return result;
> +}
> +
>  static int
>  virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  {
> @@ -1144,28 +1175,8 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  dev->remove_slot = false;
>  
>   reprobe:
> -if (!dev->reprobe) {
> -result = 0;
> +if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0)
>  goto cleanup;
> -}

This actually introduces a flow change in the function:
before this patch, when dev->reprobe is false we set
result = 0 and jumped to cleanup, while now
virPCIDeviceReprobeHostDriver() returns 0 and the rest
of virPCIDeviceUnbindFromStub() is executed.

I would leave the check for the value of dev->reprobe
and setting it to false in cleanup out of
virPCIDeviceReprobeHostDriver().

> -
> -/* Trigger a re-probe of the device is not in the stub's dynamic
> - * ID table. If the stub is available, but 'remove_id' isn't
> - * available, then re-probing would just cause the device to be
> - * re-bound to the stub.
> - */
> -VIR_FREE(path);
> -if (driver && !(path = virPCIDriverFile(driver, "remove_id")))
> -goto cleanup;
> -
> -if (!driver || !virFileExists(drvdir) || virFileExists(path)) {
> -if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) {
> -virReportSystemError(errno,
> - _("Failed to trigger a re-probe for PCI 
> device '%s'"),
> - dev->name);
> -goto cleanup;
> -}
> -}
>  
>  result = 0;
>  
> @@ -1173,7 +1184,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>  /* do not do it again */
>  dev->unbind_from_stub = false;
>  dev->remove_slot = false;
> -dev->reprobe = false;
>  
>  VIR_FREE(drvdir);
>  VIR_FREE(path);

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 4/8] Wait for vfio-pci device cleanups before reassinging the device to host driver

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 04:58 +0530, Shivaprasad G Bhat wrote:
> Before unbind from stub driver libvirt should be sure the guest is not using
> the device anymore. The libvirt today waits for pci-stub driver alone in 
> /proc/iommu.
> The same wait is needed for vfio devices too.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virhostdev.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> index 91f28e9..6247477 100644
> --- a/src/util/virhostdev.c
> +++ b/src/util/virhostdev.c
> @@ -756,6 +756,13 @@ virHostdevReattachPCIDevice(virPCIDevicePtr dev, 
> virHostdevManagerPtr mgr)
>  usleep(100*1000);
>  retries--;
>  }
> +} else if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) {
> +int retries = 100;
> +while (virPCIDeviceWaitForCleanup(dev, "vfio-pci")
> +   && retries) {
> +usleep(100*1000);
> +retries--;
> +}
>  }
>  
>  if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,

This looks good.

Weak ACK, we need somebody who's more familiar with this area
of libvirt to go over the whole series anyway.

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/8] Refuse to reattach from vfio if the iommu group is in use by any domain

2015-10-30 Thread Andrea Bolognani
On Fri, 2015-10-30 at 04:57 +0530, Shivaprasad G Bhat wrote:
> It is incorrect to attempt the device reattach of a function,
> when some other domain is using some functions belonging to the same iommu
> group.
> 
> Signed-off-by: Shivaprasad G Bhat 
> ---
>  src/util/virhostdev.c |   11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> index de029a0..91f28e9 100644
> --- a/src/util/virhostdev.c
> +++ b/src/util/virhostdev.c
> @@ -1590,6 +1590,7 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr 
> hostdev_mgr,
>  virPCIDevicePtr pci)
>  {
>  virPCIDeviceAddressPtr devAddr = NULL;
> +bool usesVfio = STREQ_NULLABLE(virPCIDeviceGetStubDriver(pci), 
> "vfio-pci");
>  struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL,
>   false};
>  int ret = -1;
> @@ -1600,8 +1601,16 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr 
> hostdev_mgr,
>  if (!(devAddr = virPCIDeviceGetAddress(pci)))
>  goto out;
>  
> -if (virHostdevIsPCINodeDeviceUsed(devAddr, &data))
> +if (usesVfio) {
> +/* Doesn't matter which function. If any domain is actively using the
> +   iommu group, refuse to reattach */

Please indent this comment properly. The second line should
start with * as well.

> +if (virPCIDeviceAddressIOMMUGroupIterate(devAddr,
> + 
> virHostdevIsPCINodeDeviceUsed,
> + &data) < 0)
> +goto out;
> +} else if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) {
>  goto out;
> +}
>  
>  virPCIDeviceReattachInit(pci);

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1/8] Initialize the stubDriver of pci devices if bound to a valid one

2015-10-30 Thread Shivaprasad bhat
Thanks for the comments Andrea.

On Fri, Oct 30, 2015 at 8:27 PM, Andrea Bolognani  wrote:
> On Fri, 2015-10-30 at 04:56 +0530, Shivaprasad G Bhat wrote:
>> The stubDriver name can be used to make useful decisions if readily 
>> available.
>> Set it if bound to a valid one during initialisation.
>>
>> Signed-off-by: Shivaprasad G Bhat 
>> ---
>>  src/util/virpci.c |   36 ++--
>>  1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/util/virpci.c b/src/util/virpci.c
>> index 35b1459..5acf486 100644
>> --- a/src/util/virpci.c
>> +++ b/src/util/virpci.c
>> @@ -1080,6 +1080,22 @@ static const char *virPCIKnownStubs[] = {
>>  NULL
>>  };
>>
>> +static bool virPCIIsAKnownStub(char *driver)
>
> Return type on a separate line, please.
>
>> +{
>> +const char **stubTest;
>> +bool ret = false;
>> +
>> +for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) {
>> +if (STREQ_NULLABLE(driver, *stubTest)) {
>> +ret = true;
>> +VIR_DEBUG("Found stub driver %s", *stubTest);
>> +break;
>> +}
>> +}
>> +
>> +return ret;
>> +}
>> +
>>  static int
>>  virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>>  {
>> @@ -1087,8 +1103,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>>  char *drvdir = NULL;
>>  char *path = NULL;
>>  char *driver = NULL;
>> -const char **stubTest;
>> -bool isStub = false;
>>
>>  /* If the device is currently bound to one of the "well known"
>>   * stub drivers, then unbind it, otherwise ignore it.
>> @@ -1105,14 +1119,7 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
>>  goto remove_slot;
>>
>>  /* If the device isn't bound to a known stub, skip the unbind. */
>> -for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) {
>> -if (STREQ(driver, *stubTest)) {
>> -isStub = true;
>> -VIR_DEBUG("Found stub driver %s", *stubTest);
>> -break;
>> -}
>> -}
>> -if (!isStub)
>> +if (!virPCIIsAKnownStub(driver))
>>  goto remove_slot;
>>
>>  if (virPCIDeviceUnbind(dev, dev->reprobe) < 0)
>
> I would split this patch in two: everything above is just moving
> an existing check into a separate function, without introducing
> any functional change.
>

Sure.. Will do that.

> The code below, on the other hand, is changing the behavior of
> virPCIDeviceNew() and as such should go in its own patch. But
> see below.
>
>> @@ -1556,6 +1563,8 @@ virPCIDeviceNew(unsigned int domain,
>>  virPCIDevicePtr dev;
>>  char *vendor = NULL;
>>  char *product = NULL;
>> +char *drvpath = NULL;
>> +char *driver = NULL;
>>
>>  if (VIR_ALLOC(dev) < 0)
>>  return NULL;
>> @@ -1603,9 +1612,16 @@ virPCIDeviceNew(unsigned int domain,
>>  goto error;
>>  }
>>
>> +if (virPCIDeviceGetDriverPathAndName(dev, &drvpath, &driver) < 0)
>> +goto cleanup;
>> +
>> +if (virPCIIsAKnownStub(driver))
>> +dev->stubDriver = driver;
>> +
>>  VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
>>
>>   cleanup:
>> +VIR_FREE(drvpath);
>>  VIR_FREE(product);
>>  VIR_FREE(vendor);
>>  return dev;
>
> What are you doing this for? AFAICT you're using this so you
> can, in Patch 7, do
>
>   pci = virPCIDeviceNew(...);
>   if (STREQ_NULLABLE(pci->stubDriver, "vfio-pci"))
>   ...
>
> Is that so, or is there another reason I'm missing?
>

Its used in P3 as well in virHostdevPCINodeDeviceReAttach().
I want to keep that function as simple as it is now.
 And as you pointed out, i am using it in P7 too.Hope its okay now.

> If the former, I'd rather not overload the meaning of
> dev->stubDriver and call virPCIDeviceGetDriverPathAndName()
> explicitly in Patch 7, so that the intentions behind the
> code are abundantly clear.
>
> Cheers.
>
> --
> Andrea Bolognani
> Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [sandbox] Weird apparmor problems

2015-10-30 Thread Cedric Bosdonnat
On Fri, 2015-10-30 at 09:15 +0900, Daniel P. Berrange wrote:
> So, yes, it is normal for libvirt_lxc to access /dev/ptmx to create
> a new master PTY and to read/write to /dev/pts/NN associated with
> the file descriptor retrieved from /dev/ptmx.

After some more debugging and help from jjohansen, the problem happens
to be this commit:

http://libvirt.org/git/?p=libvirt.git;a=commit;h=d0d4b8ad76d3e8a859ee90701a21a3f003a22c1f

When having the not-so-silly idea to mount the host / readonly in a qemu
guest (like what virt-sandbox is doing), we are adding a "deny /** w"
rule taking precedence over all rules giving write access to files
inside that path.

Would there be a clean solution for that problem? I can already teach
virt-sandbox to add the host / mount only if there is nothing else to be
mounted as /, but that wouldn't cover all cases.

--
Cedric

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RAM backend and guest ABI (was Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb) alignment for pc-dimm

2015-10-30 Thread Igor Mammedov
On Thu, 29 Oct 2015 16:16:57 -0200
Eduardo Habkost  wrote:

> (CCing Michal and libvir-list, so libvirt team is aware of this
> restriction)
> 
> On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> > On Tue, 27 Oct 2015 14:36:35 -0200
> > Eduardo Habkost  wrote:
> > 
> > > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > 
> > > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > > Yep it's workaround but it works around QEMU's broken virtio
> > > > > > > > implementation in a simple way without need for guest side 
> > > > > > > > changes.
> > > > > > > > 
> > > > > > > > Without foreseeable virtio fix it makes memory hotplug unusable 
> > > > > > > > and even
> > > > > > > > more so if there were a virtio fix it won't fix old guests 
> > > > > > > > since you've
> > > > > > > > said that virtio fix would require changes of both QEMU and 
> > > > > > > > guest sides.
> > > > > > > 
> > > > > > > What makes it not foreseeable?
> > > > > > > Apparently only the fact that we have a work-around in place so 
> > > > > > > no one
> > > > > > > works on it.  I can code it up pretty quickly, but I'm flat out 
> > > > > > > of time
> > > > > > > for testing as I'm going on vacation soon, and hard freeze is 
> > > > > > > pretty
> > > > > > > close.
> > > > > > I can lend a hand for testing part.
> > > > > > 
> > > > > > > 
> > > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > > seems way too aggressive.
> > > > > > hotplug region is sized with 1Gb alignment reserve per DIMM so we 
> > > > > > aren't
> > > > > > actually wasting anything here.
> > > > > >
> > > > > 
> > > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > > It's too much either way.
> > > > minimum would be 512, and if backend is 1Gb-hugepage gap will be
> > > > backend's natural alignment (i.e. 1Gb).
> > > 
> > > Is backend configuration even allowed to affect the machine ABI? We need
> > > to be able to change backend configuration when migrating the VM to
> > > another host.
> > for now, one has to use the same type of backend on both sides
> > i.e. if source uses 1Gb huge pages backend then target also
> > need to use it.
> > 
> 
> The page size of the backend don't even depend on QEMU arguments, but on
> the kernel command-line or hugetlbfs mount options. So it's possible to
> have exactly the same QEMU command-line on source and destination (with
> an explicit versioned machine-type), and get a VM that can't be
> migrated? That means we are breaking our guarantees about migration and
> guest ABI.
> 
> 
> > We could change this for the next machine type to always force
> > max alignment (1Gb), then it would be possible to change
> > between backends with different alignments.
> 
> I'm not sure what's the best solution here. If always using 1GB is too
> aggressive, we could require management to ask for an explicit alignment
> as a -machine option if they know they will need a specific backend page
> size.
> 
> BTW, are you talking about the behavior introduced by
> aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
> DIMM's GPA") only, or the backend page size was already affecting GPA
> allocation before that commit?
backend alignment was there since beginning,
we always over-reserve 1GB per slot since we don't know in advance what
alignment hotplugged backend would require.



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Daniel P. Berrange
On Fri, Oct 30, 2015 at 11:49:12AM +0100, Michal Privoznik wrote:
> On 29.10.2015 18:48, Laine Stump wrote:
> > On 10/29/2015 12:49 PM, Tony Krowiak wrote:
> >> For a guest domain defined with a large number of macvtap devices, it
> >> takes an exceedingly long time to boot the guest. In a test of a guest
> >> domain configured with 82 macvtap devices, it took over two minutes
> >> for the guest to boot. An strace of the ioctl calls during guest start
> >> up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times.
> >> I was able to isolate the source of the ioctl calls to
> >> the*virNetDevMacVLanCreateWithVPortProfile*  function
> >> in*virnetdevmacvlan.c*. The macvtap interface name is created by
> >> looping over a counter variable, starting with zero, and appending the
> >> counter value to 'macvtap'.
> > 
> > I've wondered ever since the first time I saw that code why it was done
> > that way, and why there had never been any performance complaints.
> > Lacking any complaints, I promptly forgot about it (until the next time
> > I went past the code for some other tangentially related reason.)
> > 
> > Since you're the first to complain, you have the honor of fixing it :-)
> > 
> >> With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS
> >> ioctl) to determine if a device with that name already exists, until a
> >> unique name is created. In the test case cited above, to create an
> >> interface name for the 82nd macvtap device, the*virNetDevExists* 
> >> function will be called for interface names 'macvtap0' to 'macvtap80'
> >> before it is determined that 'mavtap81' can be used. So if N is the
> >> number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
> >> ioctl will be invoked (N x N + N)/2 times to find an unused macvtap
> >> device names. That's assuming only one guest is being started, who
> >> knows how many times the ioctl may have to be called in an
> >> installation running a large number of guests defined with macvtap
> >> devices.
> 
> Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 (
> contained v1.2.2~32) if two threads were starting a domain concurrently,
> they even competed with each other in that specific area of the code.
> 
> >>
> >> I was able to reduce the amount of time for starting a guest domain
> >> defined with 82 macvtap devices from over 2 minutes to about 14
> >> seconds by keeping track of the interface name suffixes previously
> >> used. I defined two static bit maps (virBitmap), one each for macvtap
> >> and macvlan device name suffixes. When a macvtap/macvlan device is
> >> created, the index of the next clear bit (virBitmapNextClearBit) is
> >> retrieved to create the name. If an interface with that name does not
> >> exist, the device is created and the bit at the index used to create
> >> the interface name is set (virBitmapSetBit). When a macvtap/macvlan
> >> device is deleted, if the interface name has the pattern 'macvtap%d'
> >> or 'macvlan%d', the suffix is parsed into a bit index and used to
> >> clear the (virBitMapClearBit) bit in the respective bitmap.
> > 
> > This sounds fine, as long as 1) you recreate the bitmap whenever
> > libvirtd is restarted (while scanning through all the interfaces of
> > every domain; there is already code being executed in exactly the right
> > place - look for qemu_process.c:qemuProcessNotifyNets() and add
> > appropriate code inside the loop there), and 2) you retry some number of
> > times if a supposedly unused device name is actually in use (to account
> > for processes other than libvirt using the same naming convention).
> 
> How about re-using the approach we have for virPortAllocator? We
> maintain a bitmap of ports. On acquiring new port, we try to bind() to
> it. If we succeeded, we set the corresponding bit in the bitmap. Of
> course it may happen that a port in the host is already taken but our
> bitmap does not think so. That's okay. We just leave the corresponding
> bit alone => if we would set it as used, nobody will ever unset it.
> Moreover, we will try the port next time, and it may be free.
> 
> Moreover, the bitmap is not saved anywhere, nor restored on daemon
> restart - this could be changed though.
> 
> So what am I saying is practically the same as Laine, just extending his
> thoughts and giving you an example how to proceed further :)

Yeah, I think maintaining a bitmap of used device indexes or names is
a fine idea for this.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 3/3] virsh.pod: update and improve a attach-interface section

2015-10-30 Thread Pavel Hrdina
On Thu, Oct 29, 2015 at 10:52:00AM -0400, John Ferlan wrote:
> 
> 
> On 10/29/2015 04:42 AM, Pavel Hrdina wrote:
> > On Tue, Oct 27, 2015 at 06:53:55PM -0400, John Ferlan wrote:
> >>
> >>
> >> On 10/27/2015 11:01 AM, Pavel Hrdina wrote:
> >>> Rewrite the attach-interface section in man page to be more readable and
> >>> add the new hostdev functionality.
> >>>
> >>> Signed-off-by: Pavel Hrdina 
> >>> ---
> >>>  tools/virsh.pod | 82 
> >>> +++--
> >>>  1 file changed, 50 insertions(+), 32 deletions(-)
> >>>
> >>
> >> Why a separate patch? It's related to the previous one and if anyone
> >> ever (ahem) backed ported the other one, they could miss this one... No
> >> strong feeling either way - just curious.
> > 
> > It's a rewrite of the attach-interface part of the man page, so I thought, 
> > that
> > would be better to do it in a separate patch.  Maybe I can at first do the
> > rewrite without adding anything about the new feature and than have the 
> > update
> > of man page together with previous patch.
> > 
> 
> Sure that works too - although I would think mostly unnecessary, but I
> know that's the norm to separate rewrite from feature/bug fix.

It's not that hard to split the patch, so I'll do it.

> 
> >>
> >>> diff --git a/tools/virsh.pod b/tools/virsh.pod
> >>> index 0212e7a..843c293 100644
> >>> --- a/tools/virsh.pod
> >>> +++ b/tools/virsh.pod
> >>> @@ -2507,51 +2507,69 @@ Likewise, I<--shareable> is an alias for I<--mode 
> >>> shareable>.
> >>>  [[[I<--live>] [I<--config>] | [I<--current>]] | [I<--persistent>]]
> >>>  [I<--target target>] [I<--mac mac>] [I<--script script>] [I<--model 
> >>> model>]
> >>>  [I<--inbound average,peak,burst,floor>] [I<--outbound 
> >>> average,peak,burst>]
> >>> -[I<--print-xml>]
> >>> -
> >>> -Attach a new network interface to the domain.  I can be
> >>> -I to indicate connection via a libvirt virtual network, or
> >>> -I to indicate connection via a bridge device on the host, or
> >>> -I to indicate connection directly to one of the host's network
> >>> -interfaces or bridges.  I indicates the source of the
> >>> -connection (the name of a network, or of a bridge device, or the
> >>> -host's network interfaces or bridges).  I is used to specify
> >>> -the tap/macvtap device to be used to connect the domain to the
> >>> -source. Names starting with 'vnet' are considered as auto-generated
> >>> -and are blanked out/regenerated each time the interface is attached.
> >>> -I specifies the MAC address of the network interface; if a MAC
> >>> +[I<--managed>] [I<--print-xml>]
> >>> +
> >>> +Attach a new network interface to the domain.
> >>> +
> >>> +B<--type> can be one of the: I to indicate connection via a 
> >>> libvirt
> >>> +virtual network, I to indicate connection via a bridge device
> >>> +on the host, I to indicate connection directly to one of the 
> >>> host's
> >>> +network interfaces or bridges, I to indicate connection using a
> >>> +passthrough of PCI device on the host.
> >>
> >> Using a ':' I think is unnecessary unless you somehow generate a real
> >> list such as via "=item * " entries.  In that case you'd have the
> >> following:
> >>
> > 
> > I've considered this format before writing the patch and it used a lot of 
> > space.
> > I agree, that it would be better arranged.  I'll update it.
> > 
> 
> I contend virsh.pod is a conglomeration of writing and formatting
> styles. I like your use of B<> instead of I<>, but that is "different".
> I think separating switches and better/longer descriptions are better,
> but that's not always the case. The whole file could use some amount of
> adjustment for consistency in format/style.
> 

I agree, the man page is a mess and really hard to read and understand.

Pavel

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 2/3] virsh-domain: update attach-interface to support type=hostdev

2015-10-30 Thread Pavel Hrdina
On Thu, Oct 29, 2015 at 10:33:51AM -0400, John Ferlan wrote:
> 
> 
> On 10/29/2015 03:08 AM, Pavel Hrdina wrote:
> > On Tue, Oct 27, 2015 at 06:16:33PM -0400, John Ferlan wrote:
> >>
> 
> [...]
> 
> >>
> >> What happens if someone provides --managed with some other 'typ'?
> >>
> >> IOW: If it's only being supported by , then after the switch, a
> >> check should probably occur for "if (managed && typ !=
> >> VIR_DOMAIN_NET_TYPE_HOSTDEV), message, and fail.
> > 
> > The check was there, but then I removed it because other arguments doesn't 
> > check
> > the usability too.  We don't need to error out, because libvirt just ignore
> > the "managed='yes'" in the XML.
> > 
> >>
> >> I'm not fully clear after reading the bz and the previous review whether
> >> this patch resolves the bz - something to be worked out in the bz for
> >> history's sake though
> > 
> > I think, that the main issue with the BZ is that there was no easy way how 
> > to
> > attach *hostdev* interface without manually creating the XML for that 
> > interface.
> > We established with Laine, that there is not need for translating netdev 
> > name to
> > PCI address.
> > 
> >>
> >> I think with the adjustment to check whether managed is supplied for non
> >> hostdev, then you have an ACK for what's changed here.
> > 
> > Reconsider the 'managed' check, we can be strict and check whether it's used
> > only with hosted type or not, but it's not necessary.
> > 
> 
> As I read the docs/code, I see managed is only parsed for 
> types, so yes from that aspect you're correct. I usually err on the side
> of the extra check so that if some day the parsing code gets changed you
> don't run into issues.  The formatting code certainly only writes out
> managed='yes' if type is hostdev, so we're safe from the issue of
> managed='yes' being written into the guest XML... I guess it's the
> longer way of say ACK for what's here unless you want to be extra paranoid.
> 

Thanks, I'll push it after freeze.

Pavel

> 
> John
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Vasiliy Tolstov
2015-10-30 13:00 GMT+03:00 Michal Privoznik :
> Oh right, libvirt only know how to deal with channels. ISA serials are
> ignored when it comes to qemu-ga. The problem is that from looking at
>  we don't know which one is suppose to be for the agent. For
> instance:
>
> 
>   
>   
>   
> 
>
> 
>   
>   
>   
> 
>
> which one of these should be agent listening to? And subsequently which
> one should libvirt connect to?
>
> We can add an attribute somewhere to denote that fact, but you'd still
> need to configure the guest agent inside the guest to run properly.

This is not problem - i'm create fallback (if virtio-serial is absent,
try to isa-serial)
Now i'm need only libvirt side, may be use alias name org.qemu to
determine on which channel works qemu-ga ?


-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Vasiliy Tolstov
2015-10-30 13:57 GMT+03:00 Michal Privoznik :
> Right now, qemu-ga has -p switch, that allows you to pass the path it
> should listen to. In this case:
>
> qemu-ga -m isa-serial -p /dev/ttyS0
>
> should do. I'm no *BSD-ian, but I bet it can be configured in their
> startup scripts.
>
> So the only question is, how to mark the correct  in the domain
> XML.


Yes, so if we add name to serial - all works fine (simply need to add
isa_serial type to one if.
If that can'be done - i need to iterate over aliases or something like this...

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Michal Privoznik
On 30.10.2015 11:46, Martin Kletzander wrote:
> On Fri, Oct 30, 2015 at 01:32:34PM +0300, Vasiliy Tolstov wrote:
>> 2015-10-30 13:08 GMT+03:00 Michal Privoznik :
>>> Yeah, something like that could work. Mind proposing patch?
>>
>>
>> Maybe...
>> Does it possible to get something like this:
>>
>> 
>>  
>>  
>>  
>> 
>>
>> i'm not understand does target for serial can have name.
>> qemuFindAgentConfig uses if (STREQ_NULLABLE(channel->target.name,
>> "org.qemu.guest_agent.0")) {
>>
> 
> That doesn't make sense to me.  virtio channel can have a name, but
> isa serial cannot, that name wouldn't make sense. 
>

That's the reason why libvirt just doesn't deal with isa-serials as an
interface to guest agent. My point is, we need a way in a array of
 to find the correct one that qemu-ga is going to listen to.


> Also, how would you
> identify to which serial port to attach in the guest-agent?
>

Right now, qemu-ga has -p switch, that allows you to pass the path it
should listen to. In this case:

qemu-ga -m isa-serial -p /dev/ttyS0

should do. I'm no *BSD-ian, but I bet it can be configured in their
startup scripts.

So the only question is, how to mark the correct  in the domain
XML.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Vasiliy Tolstov
2015-10-30 13:46 GMT+03:00 Martin Kletzander :
> That doesn't make sense to me.  virtio channel can have a name, but
> isa serial cannot, that name wouldn't make sense.  Also, how would you
> identify to which serial port to attach in the guest-agent?


Hm.. interesting question =) On guest syde i can try all ttyS* and
check for guest-sync message, when i'm receive it on the ttyS it can
be used to communication.


-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Improve performance of macvtap device creation

2015-10-30 Thread Michal Privoznik
On 29.10.2015 18:48, Laine Stump wrote:
> On 10/29/2015 12:49 PM, Tony Krowiak wrote:
>> For a guest domain defined with a large number of macvtap devices, it
>> takes an exceedingly long time to boot the guest. In a test of a guest
>> domain configured with 82 macvtap devices, it took over two minutes
>> for the guest to boot. An strace of the ioctl calls during guest start
>> up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times.
>> I was able to isolate the source of the ioctl calls to
>> the*virNetDevMacVLanCreateWithVPortProfile*  function
>> in*virnetdevmacvlan.c*. The macvtap interface name is created by
>> looping over a counter variable, starting with zero, and appending the
>> counter value to 'macvtap'.
> 
> I've wondered ever since the first time I saw that code why it was done
> that way, and why there had never been any performance complaints.
> Lacking any complaints, I promptly forgot about it (until the next time
> I went past the code for some other tangentially related reason.)
> 
> Since you're the first to complain, you have the honor of fixing it :-)
> 
>> With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS
>> ioctl) to determine if a device with that name already exists, until a
>> unique name is created. In the test case cited above, to create an
>> interface name for the 82nd macvtap device, the*virNetDevExists* 
>> function will be called for interface names 'macvtap0' to 'macvtap80'
>> before it is determined that 'mavtap81' can be used. So if N is the
>> number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
>> ioctl will be invoked (N x N + N)/2 times to find an unused macvtap
>> device names. That's assuming only one guest is being started, who
>> knows how many times the ioctl may have to be called in an
>> installation running a large number of guests defined with macvtap
>> devices.

Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 (
contained v1.2.2~32) if two threads were starting a domain concurrently,
they even competed with each other in that specific area of the code.

>>
>> I was able to reduce the amount of time for starting a guest domain
>> defined with 82 macvtap devices from over 2 minutes to about 14
>> seconds by keeping track of the interface name suffixes previously
>> used. I defined two static bit maps (virBitmap), one each for macvtap
>> and macvlan device name suffixes. When a macvtap/macvlan device is
>> created, the index of the next clear bit (virBitmapNextClearBit) is
>> retrieved to create the name. If an interface with that name does not
>> exist, the device is created and the bit at the index used to create
>> the interface name is set (virBitmapSetBit). When a macvtap/macvlan
>> device is deleted, if the interface name has the pattern 'macvtap%d'
>> or 'macvlan%d', the suffix is parsed into a bit index and used to
>> clear the (virBitMapClearBit) bit in the respective bitmap.
> 
> This sounds fine, as long as 1) you recreate the bitmap whenever
> libvirtd is restarted (while scanning through all the interfaces of
> every domain; there is already code being executed in exactly the right
> place - look for qemu_process.c:qemuProcessNotifyNets() and add
> appropriate code inside the loop there), and 2) you retry some number of
> times if a supposedly unused device name is actually in use (to account
> for processes other than libvirt using the same naming convention).

How about re-using the approach we have for virPortAllocator? We
maintain a bitmap of ports. On acquiring new port, we try to bind() to
it. If we succeeded, we set the corresponding bit in the bitmap. Of
course it may happen that a port in the host is already taken but our
bitmap does not think so. That's okay. We just leave the corresponding
bit alone => if we would set it as used, nobody will ever unset it.
Moreover, we will try the port next time, and it may be free.

Moreover, the bitmap is not saved anywhere, nor restored on daemon
restart - this could be changed though.

So what am I saying is practically the same as Laine, just extending his
thoughts and giving you an example how to proceed further :)

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Martin Kletzander

On Fri, Oct 30, 2015 at 01:32:34PM +0300, Vasiliy Tolstov wrote:

2015-10-30 13:08 GMT+03:00 Michal Privoznik :

Yeah, something like that could work. Mind proposing patch?



Maybe...
Does it possible to get something like this:


 
 
 


i'm not understand does target for serial can have name.
qemuFindAgentConfig uses if (STREQ_NULLABLE(channel->target.name,
"org.qemu.guest_agent.0")) {



That doesn't make sense to me.  virtio channel can have a name, but
isa serial cannot, that name wouldn't make sense.  Also, how would you
identify to which serial port to attach in the guest-agent?


--
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Vasiliy Tolstov
2015-10-30 13:08 GMT+03:00 Michal Privoznik :
> Yeah, something like that could work. Mind proposing patch?


Maybe...
Does it possible to get something like this:


  
  
  


i'm not understand does target for serial can have name.
qemuFindAgentConfig uses if (STREQ_NULLABLE(channel->target.name,
"org.qemu.guest_agent.0")) {

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Michal Privoznik
On 30.10.2015 11:04, Vasiliy Tolstov wrote:
> 2015-10-30 13:00 GMT+03:00 Michal Privoznik :
>> Oh right, libvirt only know how to deal with channels. ISA serials are
>> ignored when it comes to qemu-ga. The problem is that from looking at
>>  we don't know which one is suppose to be for the agent. For
>> instance:
>>
>> 
>>   
>>   
>>   
>> 
>>
>> 
>>   
>>   
>>   
>> 
>>
>> which one of these should be agent listening to? And subsequently which
>> one should libvirt connect to?
>>
>> We can add an attribute somewhere to denote that fact, but you'd still
>> need to configure the guest agent inside the guest to run properly.
> 
> This is not problem - i'm create fallback (if virtio-serial is absent,
> try to isa-serial)
> Now i'm need only libvirt side, may be use alias name org.qemu to
> determine on which channel works qemu-ga ?
> 

Yeah, something like that could work. Mind proposing patch?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Michal Privoznik
On 30.10.2015 08:32, Vasiliy Tolstov wrote:
> 2015-10-29 18:40 GMT+03:00 Michal Privoznik :
>> You need to have a virtio channel whose target name is
>> "org.qemu.guest_agent.0". The source does not matter to libvirt - we can
>> connect both to an unix socket or pty.
> 
> 
> Can you helps me via providing xml for this?
> Freebsd does not supports virtio-serial, so inside guest i need
> isa-serial (ttyS0 or ttyS1), and from host side something like unix
> socket with name org.qemu.guest_agent.0..
> 

Oh right, libvirt only know how to deal with channels. ISA serials are
ignored when it comes to qemu-ga. The problem is that from looking at
 we don't know which one is suppose to be for the agent. For
instance:


  
  
  



  
  
  


which one of these should be agent listening to? And subsequently which
one should libvirt connect to?

We can add an attribute somewhere to denote that fact, but you'd still
need to configure the guest agent inside the guest to run properly.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2/3] Qemu: add CMT support

2015-10-30 Thread Ren, Qiaowei

> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Thursday, October 29, 2015 10:56 PM
> To: Ren, Qiaowei
> Cc: libvir-list@redhat.com; Feng, Shaohe
> Subject: Re: [PATCH 2/3] Qemu: add CMT support
> 
> On Thu, Oct 29, 2015 at 02:02:29PM +0800, Qiaowei Ren wrote:
> > One RFC in
> > https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> >
> > CMT (Cache Monitoring Technology) can be used to measure the usage of
> > cache by VM running on the host. This patch will extend the bulk stats
> > API (virDomainListGetStats) to add this field. Applications based on
> > libvirt can use this API to achieve cache usage of VM. Because CMT
> > implementation in Linux kernel is based on perf mechanism, this patch
> > will enable perf event for CMT when VM is created and disable it when
> > VM is destroyed.
> >
> > Signed-off-by: Qiaowei Ren 
> 
> Thanks for re-sending this patchset, it has reminded me of the concerns /
> questions I had around this previously.
> 
> Just ignoring the code for a minute, IIUC the design is
> 
>  - Open a file handle to the kernel perf system for each running VM
>  - Associate that perf event file handle with the QEMU VM PID
>  - Enable recording of the CMT perf event on that file handle
>  - Report the CMT event values in the virDomainGetStats() API
>call when VIR_DOMAIN_STATS_CACHE is requested
> 
> My two primary concerns are
> 
>  1. Do we want to have a perf event FD open for every running
> VM all the time.
>  2. Is the virDomainGetStats() integration the right API approach
> 
> For item 1, my concern is that the CMT event is only ever going to be consumed
> by OpenStack, and even then, only OpenStack installs which have the schedular
> plugin that cares about the CMT event data. It feels undesirable to have this 
> perf
> system enabled for all libvirt VMs, when perhaps < 1 % of libvirt users 
> actually
> want this data. It feels like we need some mechanism to decide when this event
> is enabled
> 
> For item 2, my concern is first when virDomainGetStats is the right API. I 
> think it
> probably *is* the right API, since I can't think of a better way.
> 
> Should we however, be having a very special case VIR_DOMAIN_STATS_CACHE
> group, or should we have something more generic.
> 
> For example, if I run 'perf event' I see
> 
> List of pre-defined events (to be used in -e):
> 
>   branch-instructions OR branches[Hardware event]
>   branch-misses  [Hardware event]
>   bus-cycles [Hardware event]
>   cache-misses   [Hardware event]
>   cache-references   [Hardware event]
>   cpu-cycles OR cycles   [Hardware event]
>   instructions   [Hardware event]
>   ref-cycles [Hardware event]
>   stalled-cycles-frontend OR idle-cycles-frontend[Hardware event]
> 
>   alignment-faults   [Software event]
>   context-switches OR cs [Software event]
>   cpu-clock  [Software event]
>   cpu-migrations OR migrations   [Software event]
>   dummy  [Software event]
>   emulation-faults   [Software event]
>   major-faults   [Software event]
>   minor-faults   [Software event]
>   ...any many many more...
> 
> 
> Does it make sense to extend the virDomainStats API to *only* deal with
> reporting of 1 specific perf event that you care about right now. It feels 
> like it
> might be better if we did something more general purpose.
> 
> eg what if something wants to get 'major-faults' data in future ?
> So we add a VIR_DOMAIN_STATS_MAJOR_FAULT enum item, etc.
> 
> Combining these two concerns, I think we might need 2 things
> 
>  - A new API to turn on/off collection of specific perf events
> 
> This could be something like
> 
>virDomainGetPerfEvents(virDOmainPtr dom,
>   virTypedParameter params);
> 
> This would fill virTypedParameters with one entry for each perf event, using 
> the
> VIR_TYPED_PARAM_BOOLEAN type. A 'true'
> value would indicate that event is enabled for the VM. A corresponding
> 
>virDomainSetPerfEvents(virDOmainPtr dom,
>   virTypedParameter params);
> 
> would enable you to toggle the flag, to enable/disable the particular list of 
> perf
> events you care about.
> 
> With that, we could have a 'VIR_DOMAIN_STATS_PERF_EVENT' enum item for
> virDomainStats which causes reporting of all previously enabled perf events
> 
> This would avoid us needing to have the perf event enabled for all VMs all the
> time. Only applications using libvirt which 

Re: [libvirt] [sandbox] Weird apparmor problems

2015-10-30 Thread Cedric Bosdonnat
On Fri, 2015-10-30 at 09:15 +0900, Daniel P. Berrange wrote:
> NB in containers we have two PTYs involved.  The libvirt_lxc process
> opens one pty in the host context and that is used to communicate
> between virsh console & libvirt_lxc.  The libvirt_lxc process opens
> one pty in the guest context and that is used to commnuicate between
> libvirt_lxc and the container master console. Libvirt_lxc forwards
> data between the two PTYs.
> 
> So, yes, it is normal for libvirt_lxc to access /dev/ptmx to create
> a new master PTY and to read/write to /dev/pts/NN associated with
> the file descriptor retrieved from /dev/ptmx.

After checking more carefully, all rules are already in the profile...
and are concerning the qemu builder. I haven't checked if it happens
with lxc yet.

The question now is why does it happen with virt-sandbox and not with a
normal libvirt qemu domain.

--
Cedric

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] qemu-agent-command via isa-serial for freebsd

2015-10-30 Thread Vasiliy Tolstov
2015-10-29 18:40 GMT+03:00 Michal Privoznik :
> You need to have a virtio channel whose target name is
> "org.qemu.guest_agent.0". The source does not matter to libvirt - we can
> connect both to an unix socket or pty.


Can you helps me via providing xml for this?
Freebsd does not supports virtio-serial, so inside guest i need
isa-serial (ttyS0 or ttyS1), and from host side something like unix
socket with name org.qemu.guest_agent.0..

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list