Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef
On 29/01/2018 14:03, Michal Privoznik wrote: > On 01/26/2018 04:07 AM, John Ferlan wrote: >> >> >> On 01/18/2018 11:04 AM, Michal Privoznik wrote: >>> This is a definition that holds information on SCSI persistent >>> reservation settings. The XML part looks like this: >>> >>> >>> >>> >>> >>> If @managed is set to 'yes' then the is not parsed. >>> This design was agreed on here: >>> >>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html >>> >>> Signed-off-by: Michal Privoznik>>> --- >>> docs/formatdomain.html.in | 25 +++- >>> docs/schemas/domaincommon.rng | 19 +-- >>> docs/schemas/storagecommon.rng | 34 + >>> src/conf/domain_conf.c | 36 + >>> src/libvirt_private.syms | 3 + >>> src/util/virstoragefile.c | 148 >>> + >>> src/util/virstoragefile.h | 15 +++ >>> .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ >>> .../disk-virtio-scsi-reservations.xml | 38 ++ >>> .../disk-virtio-scsi-reservations-not-managed.xml | 1 + >>> .../disk-virtio-scsi-reservations.xml | 1 + >>> tests/qemuxml2xmltest.c| 4 + >>> 12 files changed, 348 insertions(+), 16 deletions(-) >>> create mode 100644 >>> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml >>> create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml >>> create mode 12 >>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml >>> create mode 12 >>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml >>> >> >> Before digging too deep into this... >> >> - I assume we're avoiding iSCSI mainly because those >> reservations would take place elsewhere, safe assumption? > > I believe so, but I'll let Paolo answer that. The way I understand > reservations is that qemu needs to issue 'privileged' SCSI commands and > thus for regular SCSI (which for purpose of this argument involves iSCSI > emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process > to which it'll pass the FD and which will issue the 'privileged' SCSI > commands on qemu's behalf. Yes. There are two reasons for QEMU to access the helper. First, in order to be able to issue the command without CAP_SYS_RAWIO. Second, in order to access /dev/mapper/control and issue the command to all targets in a multipath setup. iSCSI in kernel, including multipath over iSCSI is included. iSCSI in userspace does not need qemu-pr-manager because QEMU 1) can just send the command down a TCP socket without needing CAP_SYS_RAWIO 2) does not support multipath for iSCSI in userspace. >> - What about using lun's from a storage pool (and what could become >> your favorite, NPIV devices ;-)) >> >> >> >> >> >> > > These should work too with my patches (not tested though - I don't have > any real SCSI machine). > >> - What about 's? >> >> >> >>but not iSCSI or vHost hostdev's. I think that creates the SCSI >> generic LUN, but it's been a while since I've thought about the >> terminology used for hostdev's... > > I think these don't need the feature since qemu can access the device > directly. They actually need the feature, but it can be added later. >> And finally... I assume there is one qemu-pr-manager (qemu.conf changes >> eventually)... Eventually there's magic that allows/adds per domain >> *and* per LUN some socket path. If libvirt provided it's generated via >> the domain temporary directory; however, what's not really clear is how >> that unmanaged path really works. Need a virtual whiteboard... > > So, in case of unmanaged path, here are the assumptions that my patches > are built on: > > 1) unmanaged helper process (UHP) is spawned by somebody else's than > libvirtd (hence unmanaged) - it doesn't have to be user, it can be > systemd for instance. > > 2) path to UHP's socket has to be labeled correctly - libvirt doesn't > touch that > > 3) in future, when UHP dies, libvirt will NOT spawn it again. It's > unmanaged after all. It's user/sysadmin responsibility to spawn it > again. Correct. > Now, for the managed helper process (MHP) the assumptions are: > > 1) there's one MHP per domain (all SCSI disks in the domain share the > same MHP). > > 2) the MHP runs as root, but is placed into the same CGroup, mount > namespace as qemu process it serves > > 3) MHP is lives and dies with the domain it is associated with. Correct, with the caveat that QEMU must provide the MHP state and death event for this to be complete. Thanks, Paolo > The code might be complicated more than needed - it is prepared to have > one MHP per disk rather than domain (should we ever need it). Therefore > instead of storing one pid_t, we store them in a hash table
Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef
On 01/29/2018 08:03 AM, Michal Privoznik wrote: > On 01/26/2018 04:07 AM, John Ferlan wrote: >> >> >> On 01/18/2018 11:04 AM, Michal Privoznik wrote: >>> This is a definition that holds information on SCSI persistent >>> reservation settings. The XML part looks like this: >>> >>> >>> >>> >>> >>> If @managed is set to 'yes' then the is not parsed. >>> This design was agreed on here: >>> >>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html >>> >>> Signed-off-by: Michal Privoznik>>> --- >>> docs/formatdomain.html.in | 25 +++- >>> docs/schemas/domaincommon.rng | 19 +-- >>> docs/schemas/storagecommon.rng | 34 + >>> src/conf/domain_conf.c | 36 + >>> src/libvirt_private.syms | 3 + >>> src/util/virstoragefile.c | 148 >>> + >>> src/util/virstoragefile.h | 15 +++ >>> .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ >>> .../disk-virtio-scsi-reservations.xml | 38 ++ >>> .../disk-virtio-scsi-reservations-not-managed.xml | 1 + >>> .../disk-virtio-scsi-reservations.xml | 1 + >>> tests/qemuxml2xmltest.c| 4 + >>> 12 files changed, 348 insertions(+), 16 deletions(-) >>> create mode 100644 >>> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml >>> create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml >>> create mode 12 >>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml >>> create mode 12 >>> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml >>> >> >> Before digging too deep into this... >> >> - I assume we're avoiding iSCSI mainly because those >> reservations would take place elsewhere, safe assumption? > > I believe so, but I'll let Paolo answer that. The way I understand > reservations is that qemu needs to issue 'privileged' SCSI commands and > thus for regular SCSI (which for purpose of this argument involves iSCSI > emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process > to which it'll pass the FD and which will issue the 'privileged' SCSI > commands on qemu's behalf. > Just trying to cover all the bases... Disks, Hostdevs, Storage Pools, SCSI, iSCSI, NPIV, etc. A few different ways to figure LUNs. I think you're right regarding the privileged command - it's what I have a recollection of when dealing with PR before with the rawio and sgio properties. >> >> - What about using lun's from a storage pool (and what could become >> your favorite, NPIV devices ;-)) >> >> >> >> >> >> > > These should work too with my patches (not tested though - I don't have > any real SCSI machine). > I have access to the NPIV environment I use and can let you use it as well... So that part is not a problem. IIRC, this was a two pronged question, but I neglected to go back and be more explicit.. The second half of the question is when using a storage pool for your LUN, then how is the PR added? The are only being processed from one very specific disk source type. For reference, see how many times encryption is added to various RNG definitions - including diskSourceVolume which is what I think you'd want to process a 'srcpool' definition. The virStorageTranslateDiskSourcePool handles the magic of generating the disk like structure used when building the command line. There's also a 'mode' attribute where if set to 'host' for an iSCSI LUN's will result in using the "local" filesystem path to the LUN rather than going through the iSCSI server. I would think for this case, then one would have to have the PR set up (if configured of course). >> >> - What about 's? >> >> >> >>but not iSCSI or vHost hostdev's. I think that creates the SCSI >> generic LUN, but it's been a while since I've thought about the >> terminology used for hostdev's... > > I think these don't need the feature since qemu can access the device > directly. > Well that'll make life a bit easier. IIRC, this creates a /dev/sgN on the guest. >> >>I also have this faint recollection of PR related to sgio filtering >> and perhaps even rawio, but dredging that back up could send me down the >> path of some "downstream only" type bz's. Although searching on just >> VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO. >> >> And finally... I assume there is one qemu-pr-manager (qemu.conf changes >> eventually)... Eventually there's magic that allows/adds per domain >> *and* per LUN some socket path. If libvirt provided it's generated via >> the domain temporary directory; however, what's not really clear is how >> that unmanaged path really works. Need a virtual whiteboard... > > So, in case of unmanaged path, here are the assumptions that my patches > are built
Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef
On 01/26/2018 04:07 AM, John Ferlan wrote: > > > On 01/18/2018 11:04 AM, Michal Privoznik wrote: >> This is a definition that holds information on SCSI persistent >> reservation settings. The XML part looks like this: >> >> >> >> >> >> If @managed is set to 'yes' then the is not parsed. >> This design was agreed on here: >> >> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html >> >> Signed-off-by: Michal Privoznik>> --- >> docs/formatdomain.html.in | 25 +++- >> docs/schemas/domaincommon.rng | 19 +-- >> docs/schemas/storagecommon.rng | 34 + >> src/conf/domain_conf.c | 36 + >> src/libvirt_private.syms | 3 + >> src/util/virstoragefile.c | 148 >> + >> src/util/virstoragefile.h | 15 +++ >> .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ >> .../disk-virtio-scsi-reservations.xml | 38 ++ >> .../disk-virtio-scsi-reservations-not-managed.xml | 1 + >> .../disk-virtio-scsi-reservations.xml | 1 + >> tests/qemuxml2xmltest.c| 4 + >> 12 files changed, 348 insertions(+), 16 deletions(-) >> create mode 100644 >> tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml >> create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml >> create mode 12 >> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml >> create mode 12 >> tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml >> > > Before digging too deep into this... > > - I assume we're avoiding iSCSI mainly because those > reservations would take place elsewhere, safe assumption? I believe so, but I'll let Paolo answer that. The way I understand reservations is that qemu needs to issue 'privileged' SCSI commands and thus for regular SCSI (which for purpose of this argument involves iSCSI emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process to which it'll pass the FD and which will issue the 'privileged' SCSI commands on qemu's behalf. > > - What about using lun's from a storage pool (and what could become > your favorite, NPIV devices ;-)) > > > > > > These should work too with my patches (not tested though - I don't have any real SCSI machine). > > - What about 's? > > > >but not iSCSI or vHost hostdev's. I think that creates the SCSI > generic LUN, but it's been a while since I've thought about the > terminology used for hostdev's... I think these don't need the feature since qemu can access the device directly. > >I also have this faint recollection of PR related to sgio filtering > and perhaps even rawio, but dredging that back up could send me down the > path of some "downstream only" type bz's. Although searching on just > VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO. > > And finally... I assume there is one qemu-pr-manager (qemu.conf changes > eventually)... Eventually there's magic that allows/adds per domain > *and* per LUN some socket path. If libvirt provided it's generated via > the domain temporary directory; however, what's not really clear is how > that unmanaged path really works. Need a virtual whiteboard... So, in case of unmanaged path, here are the assumptions that my patches are built on: 1) unmanaged helper process (UHP) is spawned by somebody else's than libvirtd (hence unmanaged) - it doesn't have to be user, it can be systemd for instance. 2) path to UHP's socket has to be labeled correctly - libvirt doesn't touch that, because it knows nothing about usage scenario, whether sys admin intended one UHP per whole host and thus configured label that way, or it is spawned by mgmt app (or systemd, or whomever) per one domain, or even disk. Therefore, we can do nothing more than shrug shoulders and require users to label the socket correctly. Or use managed helper. 3) in future, when UHP dies, libvirt will NOT spawn it again. It's unmanaged after all. It's user/sysadmin responsibility to spawn it again. Now, for the managed helper process (MHP) the assumptions are: 1) there's one MHP per domain (all SCSI disks in the domain share the same MHP). 2) the MHP runs as root, but is placed into the same CGroup, mount namespace as qemu process it serves 3) MHP is lives and dies with the domain it is associated with. The code might be complicated more than needed - it is prepared to have one MHP per disk rather than domain (should we ever need it). Therefore instead of storing one pid_t, we store them in a hash table where more can be stored. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef
On 01/18/2018 11:04 AM, Michal Privoznik wrote: > This is a definition that holds information on SCSI persistent > reservation settings. The XML part looks like this: > > > > > > If @managed is set to 'yes' then the is not parsed. > This design was agreed on here: > > https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html > > Signed-off-by: Michal Privoznik> --- > docs/formatdomain.html.in | 25 +++- > docs/schemas/domaincommon.rng | 19 +-- > docs/schemas/storagecommon.rng | 34 + > src/conf/domain_conf.c | 36 + > src/libvirt_private.syms | 3 + > src/util/virstoragefile.c | 148 > + > src/util/virstoragefile.h | 15 +++ > .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ > .../disk-virtio-scsi-reservations.xml | 38 ++ > .../disk-virtio-scsi-reservations-not-managed.xml | 1 + > .../disk-virtio-scsi-reservations.xml | 1 + > tests/qemuxml2xmltest.c| 4 + > 12 files changed, 348 insertions(+), 16 deletions(-) > create mode 100644 > tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml > create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml > create mode 12 > tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml > create mode 12 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml > Before digging too deep into this... - I assume we're avoiding iSCSI mainly because those reservations would take place elsewhere, safe assumption? - What about using lun's from a storage pool (and what could become your favorite, NPIV devices ;-)) - What about 's? but not iSCSI or vHost hostdev's. I think that creates the SCSI generic LUN, but it's been a while since I've thought about the terminology used for hostdev's... I also have this faint recollection of PR related to sgio filtering and perhaps even rawio, but dredging that back up could send me down the path of some "downstream only" type bz's. Although searching on just VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO. And finally... I assume there is one qemu-pr-manager (qemu.conf changes eventually)... Eventually there's magic that allows/adds per domain *and* per LUN some socket path. If libvirt provided it's generated via the domain temporary directory; however, what's not really clear is how that unmanaged path really works. Need a virtual whiteboard... I only commented at this first patch for now. I did briefly scan the others, but too many questions here to dig deep into those. I did run the entire series through my Coverity checker - there's 3 things that get tagged in later patches. John > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index d272cc1ba..23c5f071e 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -2457,7 +2457,10 @@ >/disk >disk type='block' device='lun' > driver name='qemu' type='raw'/ > -source dev='/dev/sda'/ > +source dev='/dev/sda' > + reservations enabled='yes' managed='no' > +source type='unix' path='/path/to/qemu-pr-helper' > mode='client'/ > + /reservations > target dev='sda' bus='scsi'/ > address type='drive' controller='0' bus='0' target='3' unit='0'/ >/disk > @@ -2819,6 +2822,26 @@ > Storage Encryption > page for more information. > > + reservations > + Since libvirt 3.10.0, the Will be 4.1.0 at least... > +reservations can be a sub-element of the > +source element for storage sources (QEMU driver > only). > +If present (and enabled) it enabled persistent reservations for Doesn't read well - even with the second enabled changed to enables. Furthermore what's the purpose of enabled=no? Isn't that just like removing ? If so, then I think enabled just gets removed and the "managed" is the only required attribute. Besides you don't have a "enabled='no'" example. And, no, sorry I didn't go back through the entire "New QEMU daemon for persistent reservations" discussion that started in August to see if this was discussed there. Looking at virStoragePRDefFormat when enabled='no' anything anyone set for managed would be lost (e.g. not printed). IOW: by changing from enabled=yes and managed=no to enabled=no, then everything in the subelement is lost. > +SCSI based disks. The element has one mandatory attribute > +enabled with accepted values yes and NB: Other uses go with "yes" and "no" not the code wrapped tag. > +no. If the feature is enabled, then there's another > +mandatory attribute managed (accepted values
[libvirt] [PATCH 01/14] virstoragefile: Introduce virStoragePRDef
This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this: If @managed is set to 'yes' then the is not parsed. This design was agreed on here: https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html Signed-off-by: Michal Privoznik--- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 + src/conf/domain_conf.c | 36 + src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 + src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ .../disk-virtio-scsi-reservations.xml | 38 ++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c| 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 12 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 12 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index d272cc1ba..23c5f071e 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2457,7 +2457,10 @@ /disk disk type='block' device='lun' driver name='qemu' type='raw'/ -source dev='/dev/sda'/ +source dev='/dev/sda' + reservations enabled='yes' managed='no' +source type='unix' path='/path/to/qemu-pr-helper' mode='client'/ + /reservations target dev='sda' bus='scsi'/ address type='drive' controller='0' bus='0' target='3' unit='0'/ /disk @@ -2819,6 +2822,26 @@ Storage Encryption page for more information. + reservations + Since libvirt 3.10.0, the +reservations can be a sub-element of the +source element for storage sources (QEMU driver only). +If present (and enabled) it enabled persistent reservations for +SCSI based disks. The element has one mandatory attribute +enabled with accepted values yes and +no. If the feature is enabled, then there's another +mandatory attribute managed (accepted values are the +same as for enabled) that enables or disables libvirt +spawning any helper process (should one be needed). However, if +libvirt is not enabled spawning helper process (i.e. hypervisor +should just use one which is already running), path to its socket +must be provided in child element source, which +currently accepts only the following attributes: type +with one value unix, path with path the +socket, and finally mode which accepts either +server or client and specifies the role +of hypervisor. + diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f22c932f6..884081bd7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1498,6 +1498,9 @@ + + + @@ -2447,21 +2450,7 @@ vhostuser - - -unix - - - - - - - server - client - - - - + diff --git a/docs/schemas/storagecommon.rng b/docs/schemas/storagecommon.rng index edee1b084..9071667b0 100644 --- a/docs/schemas/storagecommon.rng +++ b/docs/schemas/storagecommon.rng @@ -39,6 +39,40 @@ + + + +unix + + + + + + + server + client + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a1c25060f..222c6dd15 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -5163,6 +5163,13 @@ virDomainDiskDefValidate(const virDomainDiskDef *disk) } } +if (disk->src->pr && +disk->device != VIR_DOMAIN_DISK_DEVICE_LUN) { +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _(" allowed only for lun