Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

2018-02-05 Thread Paolo Bonzini
On 29/01/2018 14:03, Michal Privoznik wrote:
> On 01/26/2018 04:07 AM, John Ferlan wrote:
>>
>>
>> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>>> This is a definition that holds information on SCSI persistent
>>> reservation settings. The XML part looks like this:
>>>
>>>   
>>> 
>>>   
>>>
>>> If @managed is set to 'yes' then the  is not parsed.
>>> This design was agreed on here:
>>>
>>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>>
>>> Signed-off-by: Michal Privoznik 
>>> ---
>>>  docs/formatdomain.html.in  |  25 +++-
>>>  docs/schemas/domaincommon.rng  |  19 +--
>>>  docs/schemas/storagecommon.rng |  34 +
>>>  src/conf/domain_conf.c |  36 +
>>>  src/libvirt_private.syms   |   3 +
>>>  src/util/virstoragefile.c  | 148 
>>> +
>>>  src/util/virstoragefile.h  |  15 +++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++
>>>  .../disk-virtio-scsi-reservations.xml  |  38 ++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>>>  .../disk-virtio-scsi-reservations.xml  |   1 +
>>>  tests/qemuxml2xmltest.c|   4 +
>>>  12 files changed, 348 insertions(+), 16 deletions(-)
>>>  create mode 100644 
>>> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>>>  create mode 12 
>>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 12 
>>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>>>
>>
>> Before digging too deep into this...
>>
>>  - I assume we're avoiding  iSCSI mainly because those
>> reservations would take place elsewhere, safe assumption?
> 
> I believe so, but I'll let Paolo answer that. The way I understand
> reservations is that qemu needs to issue 'privileged' SCSI commands and
> thus for regular SCSI (which for purpose of this argument involves iSCSI
> emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process
> to which it'll pass the FD and which will issue the 'privileged' SCSI
> commands on qemu's behalf.

Yes.  There are two reasons for QEMU to access the helper.  First, in
order to be able to issue the command without CAP_SYS_RAWIO.  Second, in
order to access /dev/mapper/control and issue the command to all targets
in a multipath setup.

iSCSI in kernel, including multipath over iSCSI is included.  iSCSI in
userspace does not need qemu-pr-manager because QEMU 1) can just send
the command down a TCP socket without needing CAP_SYS_RAWIO 2) does not
support multipath for iSCSI in userspace.

>>  - What about using lun's from a storage pool (and what could become
>> your favorite, NPIV devices ;-))
>>
>>
>>  
>>  
>>  
>>
> 
> These should work too with my patches (not tested though - I don't have
> any real SCSI machine).
>
>>  - What about 's?
>>
>>
>>
>>but not iSCSI or vHost hostdev's. I think that creates the SCSI
>> generic LUN, but it's been a while since I've thought about the
>> terminology used for hostdev's...
> 
> I think these don't need the feature since qemu can access the device
> directly.

They actually need the feature, but it can be added later.

>> And finally... I assume there is one qemu-pr-manager (qemu.conf changes
>> eventually)... Eventually there's magic that allows/adds per domain
>> *and* per LUN some socket path. If libvirt provided it's generated via
>> the domain temporary directory; however, what's not really clear is how
>> that unmanaged path really works.  Need a virtual whiteboard...
> 
> So, in case of unmanaged path, here are the assumptions that my patches
> are built on:
> 
> 1) unmanaged helper process (UHP) is spawned by somebody else's than
> libvirtd (hence unmanaged) - it doesn't have to be user, it can be
> systemd for instance.
> 
> 2) path to UHP's socket has to be labeled correctly - libvirt doesn't
> touch that
> 
> 3) in future, when UHP dies, libvirt will NOT spawn it again. It's
> unmanaged after all. It's user/sysadmin responsibility to spawn it
> again.

Correct.

> Now, for the managed helper process (MHP) the assumptions are:
> 
> 1) there's one MHP per domain (all SCSI disks in the domain share the
> same MHP).
> 
> 2) the MHP runs as root, but is placed into the same CGroup, mount
> namespace as qemu process it serves
> 
> 3) MHP is lives and dies with the domain it is associated with.

Correct, with the caveat that QEMU must provide the MHP state and death
event for this to be complete.

Thanks,

Paolo

> The code might be complicated more than needed - it is prepared to have
> one MHP per disk rather than domain (should we ever need it). Therefore
> instead of storing one pid_t, we store them in a hash table 

Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

2018-02-01 Thread John Ferlan


On 01/29/2018 08:03 AM, Michal Privoznik wrote:
> On 01/26/2018 04:07 AM, John Ferlan wrote:
>>
>>
>> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>>> This is a definition that holds information on SCSI persistent
>>> reservation settings. The XML part looks like this:
>>>
>>>   
>>> 
>>>   
>>>
>>> If @managed is set to 'yes' then the  is not parsed.
>>> This design was agreed on here:
>>>
>>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>>
>>> Signed-off-by: Michal Privoznik 
>>> ---
>>>  docs/formatdomain.html.in  |  25 +++-
>>>  docs/schemas/domaincommon.rng  |  19 +--
>>>  docs/schemas/storagecommon.rng |  34 +
>>>  src/conf/domain_conf.c |  36 +
>>>  src/libvirt_private.syms   |   3 +
>>>  src/util/virstoragefile.c  | 148 
>>> +
>>>  src/util/virstoragefile.h  |  15 +++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++
>>>  .../disk-virtio-scsi-reservations.xml  |  38 ++
>>>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>>>  .../disk-virtio-scsi-reservations.xml  |   1 +
>>>  tests/qemuxml2xmltest.c|   4 +
>>>  12 files changed, 348 insertions(+), 16 deletions(-)
>>>  create mode 100644 
>>> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>>>  create mode 12 
>>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>>>  create mode 12 
>>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>>>
>>
>> Before digging too deep into this...
>>
>>  - I assume we're avoiding  iSCSI mainly because those
>> reservations would take place elsewhere, safe assumption?
> 
> I believe so, but I'll let Paolo answer that. The way I understand
> reservations is that qemu needs to issue 'privileged' SCSI commands and
> thus for regular SCSI (which for purpose of this argument involves iSCSI
> emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process
> to which it'll pass the FD and which will issue the 'privileged' SCSI
> commands on qemu's behalf.
> 

Just trying to cover all the bases...  Disks, Hostdevs, Storage Pools,
SCSI, iSCSI, NPIV, etc.  A few different ways to figure LUNs.

I think you're right regarding the privileged command - it's what I have
a recollection of when dealing with PR before with the rawio and sgio
properties.

>>
>>  - What about using lun's from a storage pool (and what could become
>> your favorite, NPIV devices ;-))
>>
>>
>>  
>>  
>>  
>>
> 
> These should work too with my patches (not tested though - I don't have
> any real SCSI machine).
> 

I have access to the NPIV environment I use and can let you use it as
well...  So that part is not a problem. IIRC, this was a two pronged
question, but I neglected to go back and be more explicit.. The second
half of the question is when using a storage pool for your LUN, then how
is the PR added? The  are only being processed from one
very specific disk source type. For reference, see how many times
encryption is added to various RNG definitions - including
diskSourceVolume which is what I think you'd want to process a 'srcpool'
definition.

The virStorageTranslateDiskSourcePool handles the magic of generating
the disk like structure used when building the command line.  There's
also a 'mode' attribute where if set to 'host' for an iSCSI LUN's will
result in using the "local" filesystem path to the LUN rather than going
through the iSCSI server. I would think for this case, then one would
have to have the PR set up (if configured of course).

>>
>>  - What about 's?
>>
>>
>>
>>but not iSCSI or vHost hostdev's. I think that creates the SCSI
>> generic LUN, but it's been a while since I've thought about the
>> terminology used for hostdev's...
> 
> I think these don't need the feature since qemu can access the device
> directly.
> 

Well that'll make life a bit easier. IIRC, this creates a /dev/sgN on
the guest.

>>
>>I also have this faint recollection of PR related to sgio filtering
>> and perhaps even rawio, but dredging that back up could send me down the
>> path of some "downstream only" type bz's. Although searching on just
>> VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.
>>
>> And finally... I assume there is one qemu-pr-manager (qemu.conf changes
>> eventually)... Eventually there's magic that allows/adds per domain
>> *and* per LUN some socket path. If libvirt provided it's generated via
>> the domain temporary directory; however, what's not really clear is how
>> that unmanaged path really works.  Need a virtual whiteboard...
> 
> So, in case of unmanaged path, here are the assumptions that my patches
> are built 

Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

2018-01-29 Thread Michal Privoznik
On 01/26/2018 04:07 AM, John Ferlan wrote:
> 
> 
> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>> This is a definition that holds information on SCSI persistent
>> reservation settings. The XML part looks like this:
>>
>>   
>> 
>>   
>>
>> If @managed is set to 'yes' then the  is not parsed.
>> This design was agreed on here:
>>
>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>
>> Signed-off-by: Michal Privoznik 
>> ---
>>  docs/formatdomain.html.in  |  25 +++-
>>  docs/schemas/domaincommon.rng  |  19 +--
>>  docs/schemas/storagecommon.rng |  34 +
>>  src/conf/domain_conf.c |  36 +
>>  src/libvirt_private.syms   |   3 +
>>  src/util/virstoragefile.c  | 148 
>> +
>>  src/util/virstoragefile.h  |  15 +++
>>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++
>>  .../disk-virtio-scsi-reservations.xml  |  38 ++
>>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>>  .../disk-virtio-scsi-reservations.xml  |   1 +
>>  tests/qemuxml2xmltest.c|   4 +
>>  12 files changed, 348 insertions(+), 16 deletions(-)
>>  create mode 100644 
>> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>>  create mode 12 
>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>>  create mode 12 
>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>>
> 
> Before digging too deep into this...
> 
>  - I assume we're avoiding  iSCSI mainly because those
> reservations would take place elsewhere, safe assumption?

I believe so, but I'll let Paolo answer that. The way I understand
reservations is that qemu needs to issue 'privileged' SCSI commands and
thus for regular SCSI (which for purpose of this argument involves iSCSI
emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process
to which it'll pass the FD and which will issue the 'privileged' SCSI
commands on qemu's behalf.

> 
>  - What about using lun's from a storage pool (and what could become
> your favorite, NPIV devices ;-))
> 
>
>  
>  
>  
>

These should work too with my patches (not tested though - I don't have
any real SCSI machine).

> 
>  - What about 's?
> 
>
> 
>but not iSCSI or vHost hostdev's. I think that creates the SCSI
> generic LUN, but it's been a while since I've thought about the
> terminology used for hostdev's...

I think these don't need the feature since qemu can access the device
directly.

> 
>I also have this faint recollection of PR related to sgio filtering
> and perhaps even rawio, but dredging that back up could send me down the
> path of some "downstream only" type bz's. Although searching on just
> VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.
> 
> And finally... I assume there is one qemu-pr-manager (qemu.conf changes
> eventually)... Eventually there's magic that allows/adds per domain
> *and* per LUN some socket path. If libvirt provided it's generated via
> the domain temporary directory; however, what's not really clear is how
> that unmanaged path really works.  Need a virtual whiteboard...

So, in case of unmanaged path, here are the assumptions that my patches
are built on:

1) unmanaged helper process (UHP) is spawned by somebody else's than
libvirtd (hence unmanaged) - it doesn't have to be user, it can be
systemd for instance.

2) path to UHP's socket has to be labeled correctly - libvirt doesn't
touch that, because it knows nothing about usage scenario, whether sys
admin intended one UHP per whole host and thus configured label that
way, or it is spawned by mgmt app (or systemd, or whomever) per one
domain, or even disk. Therefore, we can do nothing more than shrug
shoulders and require users to label the socket correctly. Or use
managed helper.

3) in future, when UHP dies, libvirt will NOT spawn it again. It's
unmanaged after all. It's user/sysadmin responsibility to spawn it
again.

Now, for the managed helper process (MHP) the assumptions are:

1) there's one MHP per domain (all SCSI disks in the domain share the
same MHP).

2) the MHP runs as root, but is placed into the same CGroup, mount
namespace as qemu process it serves

3) MHP is lives and dies with the domain it is associated with.

The code might be complicated more than needed - it is prepared to have
one MHP per disk rather than domain (should we ever need it). Therefore
instead of storing one pid_t, we store them in a hash table where more
can be stored.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

2018-01-25 Thread John Ferlan


On 01/18/2018 11:04 AM, Michal Privoznik wrote:
> This is a definition that holds information on SCSI persistent
> reservation settings. The XML part looks like this:
> 
>   
> 
>   
> 
> If @managed is set to 'yes' then the  is not parsed.
> This design was agreed on here:
> 
> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
> 
> Signed-off-by: Michal Privoznik 
> ---
>  docs/formatdomain.html.in  |  25 +++-
>  docs/schemas/domaincommon.rng  |  19 +--
>  docs/schemas/storagecommon.rng |  34 +
>  src/conf/domain_conf.c |  36 +
>  src/libvirt_private.syms   |   3 +
>  src/util/virstoragefile.c  | 148 
> +
>  src/util/virstoragefile.h  |  15 +++
>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++
>  .../disk-virtio-scsi-reservations.xml  |  38 ++
>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>  .../disk-virtio-scsi-reservations.xml  |   1 +
>  tests/qemuxml2xmltest.c|   4 +
>  12 files changed, 348 insertions(+), 16 deletions(-)
>  create mode 100644 
> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>  create mode 12 
> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>  create mode 12 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
> 

Before digging too deep into this...

 - I assume we're avoiding  iSCSI mainly because those
reservations would take place elsewhere, safe assumption?

 - What about using lun's from a storage pool (and what could become
your favorite, NPIV devices ;-))

   
 
 
 
   

 - What about 's?

   

   but not iSCSI or vHost hostdev's. I think that creates the SCSI
generic LUN, but it's been a while since I've thought about the
terminology used for hostdev's...

   I also have this faint recollection of PR related to sgio filtering
and perhaps even rawio, but dredging that back up could send me down the
path of some "downstream only" type bz's. Although searching on just
VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.

And finally... I assume there is one qemu-pr-manager (qemu.conf changes
eventually)... Eventually there's magic that allows/adds per domain
*and* per LUN some socket path. If libvirt provided it's generated via
the domain temporary directory; however, what's not really clear is how
that unmanaged path really works.  Need a virtual whiteboard...

I only commented at this first patch for now. I did briefly scan the
others, but too many questions here to dig deep into those. I did run
the entire series through my Coverity checker - there's 3 things that
get tagged in later patches.

John

> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index d272cc1ba..23c5f071e 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -2457,7 +2457,10 @@
>/disk
>disk type='block' device='lun'
>  driver name='qemu' type='raw'/
> -source dev='/dev/sda'/
> +source dev='/dev/sda'
> +  reservations enabled='yes' managed='no'
> +source type='unix' path='/path/to/qemu-pr-helper' 
> mode='client'/
> +  /reservations
>  target dev='sda' bus='scsi'/
>  address type='drive' controller='0' bus='0' target='3' unit='0'/
>/disk
> @@ -2819,6 +2822,26 @@
>  Storage Encryption
>  page for more information.
>
> +  reservations
> +  Since libvirt 3.10.0, the

Will be 4.1.0 at least...

> +reservations can be a sub-element of the
> +source element for storage sources (QEMU driver 
> only).
> +If present (and enabled) it enabled persistent reservations for

Doesn't read well - even with the second enabled changed to enables.

Furthermore what's the purpose of enabled=no? Isn't that just like
removing ? If so, then I think enabled just gets removed
and the "managed" is the only required attribute. Besides you don't have
a "enabled='no'" example.  And, no, sorry I didn't go back through the
entire "New QEMU daemon for persistent reservations" discussion that
started in August to see if this was discussed there.

Looking at virStoragePRDefFormat when enabled='no' anything anyone set
for managed would be lost (e.g. not printed). IOW: by changing from
enabled=yes and managed=no to enabled=no, then everything in the
 subelement is lost.

> +SCSI based disks. The element has one mandatory attribute
> +enabled with accepted values yes and

NB: Other uses go with "yes" and "no" not the code wrapped tag.

> +no. If the feature is enabled, then there's another
> +mandatory attribute managed (accepted values 

[libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef

2018-01-18 Thread Michal Privoznik
This is a definition that holds information on SCSI persistent
reservation settings. The XML part looks like this:

  

  

If @managed is set to 'yes' then the  is not parsed.
This design was agreed on here:

https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html

Signed-off-by: Michal Privoznik 
---
 docs/formatdomain.html.in  |  25 +++-
 docs/schemas/domaincommon.rng  |  19 +--
 docs/schemas/storagecommon.rng |  34 +
 src/conf/domain_conf.c |  36 +
 src/libvirt_private.syms   |   3 +
 src/util/virstoragefile.c  | 148 +
 src/util/virstoragefile.h  |  15 +++
 .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++
 .../disk-virtio-scsi-reservations.xml  |  38 ++
 .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
 .../disk-virtio-scsi-reservations.xml  |   1 +
 tests/qemuxml2xmltest.c|   4 +
 12 files changed, 348 insertions(+), 16 deletions(-)
 create mode 100644 
tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
 create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
 create mode 12 
tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
 create mode 12 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index d272cc1ba..23c5f071e 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2457,7 +2457,10 @@
   /disk
   disk type='block' device='lun'
 driver name='qemu' type='raw'/
-source dev='/dev/sda'/
+source dev='/dev/sda'
+  reservations enabled='yes' managed='no'
+source type='unix' path='/path/to/qemu-pr-helper' 
mode='client'/
+  /reservations
 target dev='sda' bus='scsi'/
 address type='drive' controller='0' bus='0' target='3' unit='0'/
   /disk
@@ -2819,6 +2822,26 @@
 Storage Encryption
 page for more information.
   
+  reservations
+  Since libvirt 3.10.0, the
+reservations can be a sub-element of the
+source element for storage sources (QEMU driver only).
+If present (and enabled) it enabled persistent reservations for
+SCSI based disks. The element has one mandatory attribute
+enabled with accepted values yes and
+no. If the feature is enabled, then there's another
+mandatory attribute managed (accepted values are the
+same as for enabled) that enables or disables libvirt
+spawning any helper process (should one be needed). However, if
+libvirt is not enabled spawning helper process (i.e. hypervisor
+should just use one which is already running), path to its socket
+must be provided in child element source, which
+currently accepts only the following attributes: type
+with one value unix, path with path the
+socket, and finally mode which accepts either
+server or client and specifies the role
+of hypervisor.
+  
 
 
 
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index f22c932f6..884081bd7 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -1498,6 +1498,9 @@
 
   
 
+
+  
+
 
   
 
@@ -2447,21 +2450,7 @@
 vhostuser
   
   
-
-  
-unix
-  
-  
-
-  
-  
-
-  server
-  client
-
-  
-  
-
+
 
   
 
diff --git a/docs/schemas/storagecommon.rng b/docs/schemas/storagecommon.rng
index edee1b084..9071667b0 100644
--- a/docs/schemas/storagecommon.rng
+++ b/docs/schemas/storagecommon.rng
@@ -39,6 +39,40 @@
 
   
 
+  
+
+  
+unix
+  
+  
+
+  
+  
+
+  server
+  client
+
+  
+  
+
+  
+
+  
+
+  
+
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+
   
 
   
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index a1c25060f..222c6dd15 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5163,6 +5163,13 @@ virDomainDiskDefValidate(const virDomainDiskDef *disk)
 }
 }
 
+if (disk->src->pr &&
+disk->device != VIR_DOMAIN_DISK_DEVICE_LUN) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _(" allowed only for lun