Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
On 09/13/2016 05:48 PM, Eric Farman wrote: On 09/13/2016 04:49 PM, John Ferlan wrote: [...] Thanks for the review. (Silent ACK on a lot of the above comments.) I'll try to get these all in place for a v3 with a little runway before the next freeze. OK - hopefully I won't be neck deep in some other firefight Thanks for taking a few minutes away from those to look over this little pile. I appreciate the fires that pop up and get in the way of the usual todo list. :) John, I had hoped to get a v3 off to the list before the 2.3 freeze, but that obviously didn't happen. FYI, I'm off on another tangent for a couple of weeks. With any luck, I'll have this back to the list before 2.4. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
On 09/13/2016 04:49 PM, John Ferlan wrote: [...] I took a peek at the v1, but suffice to say didn't follow it that closely since there's a number of changes in v2 related to patch ordering. You could get 3 different opinions on that matter too! Hopefully a consensus before too long. :) yeah - v1 was lots of little steps that you were asked to combine. OTOH there are those that like to combine multiple things in one patch which can make it really difficult to follow. The order here was "followable" - my suggestion was more to try and get the guts (qemu) done first, then the fluff (domain/xml). Agreed. v1 was perhaps overkill in breaking things down so small, because I was picking up an idea with a bit of dust on it. Consider your patch order to be making progress towards some sustainable feature/bug fix - each step along the way being able to pass "make check syntax-check" so that future attempts to git bisect can point the fickle finger of fate (at someone else!). If a patch adds something that will be external that a followup patch will use - then great. If it's testable, even better. For example, adding virSCSIOpenVhost as a separate patch with the tests/qemuxml2argvmock.c is fine. One more tidbit here - it's a very faint memory, but check out tests/commandtest.c and the lengthy comment in mymain() regarding "expected" fd values. I think you're "safe" in what you've done - although your mock should mimic whatever changes you make to the primary function. [...] A wwpn doesn't generally have the "naa." prefix on it. So, why not just add that to the command line instead? That is a wwpn is a 16 hex digit value. I'm assuming that in order to support this feature a "naa." is prefixed and sent to qemu. Does that necessarily mean some other hypervisor would choose to place the "naa." prefix or would then code need to be added to strip it off for those other hypervisors? Looking at the qemu code - it seems fairly specific. Fairly certain that it's more a vhost thing of which the hypervisor utilizes. Per a recommendation made on this list years ago, I used targetcli to do the configuration rather than direct manipulation of sysfs, and according to its man page: All WWNs are prefaced by their type, such as "iqn", "naa", or "ib", and may be given with or without colons. I don't use targetcli so I'm not familiar with it. Still not convinced using naa.# in the XML is the best way to go, but I'm not against it. I guess a lot depends on how the sysfs views things. As to whether other hypervisors follow the same behavior, I do not know. I was just trying to conform to what it would allow in the case of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn" here since as you say that's 16 hex digits, and should probably say just "wwn" ? I guess I've always thought of a wwn/wwpn/wwnn as that string of bytes... I always thought WWPN and WWNN as types of WWN... Usually WWN/WWNN have been interchangeable, while WWPN is a WWN assigned to a port in a Fibre Channel fabric ...but have seen it this way also. Either way makes sense to me. FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA will list a wwnn, wwpn, and fabric_wwn. The fabric_wwn's will match. The wwnn/wwpn for the vport is what was used when running the HBA's/vport_create function. It's a tangled web. Wait until vGPU's show up (that's another black magic trick). [...] It would be nice to tell a user how to find that wwpn value via some command (virsh or otherwise) in order to fill in the wwpn attribute. Although there's always varying opinions on this since many times it's distro dependent. That's why I suggest virsh - at least if it's supportable we can display the wwpn there using nodedev-dumpxml. Hrm... nodedev-dumpxml is new to me, neat! I'll go down that bunny trail, but it seems like it'd be a bit of work to add the smarts for this. Hours of all sorts of fun The 'nodedev-{list|dumpxml}' are essentially the libvirt mechanism to list/describe the various rabbit holes that is/are the sysfs tree and especially branches. The nodedev-list --tree will probably be of most interest, but so is virsh nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net} What/where is the 'vhost_scsi_wwpn'? IOW: It's not something defined yet - I'm assuming there's some output somewhere where that's pulled from. As near as I can tell, it's not anything that can be autogenerated off the bat. Somebody somewhere needs to first establish the sysfs entries for the vhost target, which can then be stored and loaded on a next (host) boot. The end result looks something like this: AFAIK systemd handles the sysfs creation (ok at least for my purposes). This is what I was concerned about - we really need a way to tell someone how to "easily" find the wwpn that's to be used. Although I don't have vhost-scsi on my laptop ;-) Me neither. Of course, I'm using an s390, so that's a
Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
[...] >>> >> I took a peek at the v1, but suffice to say didn't follow it that >> closely since there's a number of changes in v2 related to patch >> ordering. You could get 3 different opinions on that matter too! > > Hopefully a consensus before too long. :) > yeah - v1 was lots of little steps that you were asked to combine. OTOH there are those that like to combine multiple things in one patch which can make it really difficult to follow. The order here was "followable" - my suggestion was more to try and get the guts (qemu) done first, then the fluff (domain/xml). Consider your patch order to be making progress towards some sustainable feature/bug fix - each step along the way being able to pass "make check syntax-check" so that future attempts to git bisect can point the fickle finger of fate (at someone else!). If a patch adds something that will be external that a followup patch will use - then great. If it's testable, even better. For example, adding virSCSIOpenVhost as a separate patch with the tests/qemuxml2argvmock.c is fine. One more tidbit here - it's a very faint memory, but check out tests/commandtest.c and the lengthy comment in mymain() regarding "expected" fd values. I think you're "safe" in what you've done - although your mock should mimic whatever changes you make to the primary function. [...] >> A wwpn doesn't generally have the "naa." prefix on it. So, why not just >> add that to the command line instead? That is a wwpn is a 16 hex digit >> value. I'm assuming that in order to support this feature a "naa." is >> prefixed and sent to qemu. Does that necessarily mean some other >> hypervisor would choose to place the "naa." prefix or would then code >> need to be added to strip it off for those other hypervisors? Looking >> at the qemu code - it seems fairly specific. > > Fairly certain that it's more a vhost thing of which the hypervisor > utilizes. Per a recommendation made on this list years ago, I used > targetcli to do the configuration rather than direct manipulation of > sysfs, and according to its man page: > > All WWNs are prefaced by their type, such as "iqn", "naa", or "ib", > and may be given with or without colons. > I don't use targetcli so I'm not familiar with it. Still not convinced using naa.# in the XML is the best way to go, but I'm not against it. I guess a lot depends on how the sysfs views things. > As to whether other hypervisors follow the same behavior, I do not > know. I was just trying to conform to what it would allow in the case > of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn" > here since as you say that's 16 hex digits, and should probably say just > "wwn" ? > I guess I've always thought of a wwn/wwpn/wwnn as that string of bytes... Usually WWN/WWNN have been interchangeable, while WWPN is a WWN assigned to a port in a Fibre Channel fabric FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA will list a wwnn, wwpn, and fabric_wwn. The fabric_wwn's will match. The wwnn/wwpn for the vport is what was used when running the HBA's/vport_create function. It's a tangled web. Wait until vGPU's show up (that's another black magic trick). [...] >> It would be nice to tell a user how to find that wwpn value via some >> command (virsh or otherwise) in order to fill in the wwpn attribute. >> Although there's always varying opinions on this since many times it's >> distro dependent. That's why I suggest virsh - at least if it's >> supportable we can display the wwpn there using nodedev-dumpxml. > > Hrm... nodedev-dumpxml is new to me, neat! I'll go down that bunny > trail, but it seems like it'd be a bit of work to add the smarts for this. > Hours of all sorts of fun The 'nodedev-{list|dumpxml}' are essentially the libvirt mechanism to list/describe the various rabbit holes that is/are the sysfs tree and especially branches. The nodedev-list --tree will probably be of most interest, but so is virsh nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net} >> >> What/where is the 'vhost_scsi_wwpn'? IOW: It's not something defined >> yet - I'm assuming there's some output somewhere where that's pulled >> from. > > As near as I can tell, it's not anything that can be autogenerated off > the bat. Somebody somewhere needs to first establish the sysfs entries > for the vhost target, which can then be stored and loaded on a next > (host) boot. The end result looks something like this: > AFAIK systemd handles the sysfs creation (ok at least for my purposes). This is what I was concerned about - we really need a way to tell someone how to "easily" find the wwpn that's to be used. Although I don't have vhost-scsi on my laptop ;-) - I have to figure that by using 'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that is an HBA will show a that path would then be a starting point... Googling "scsi_host HBA wwn" shows me some interesting results including ones
Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
On 09/12/2016 05:34 PM, John Ferlan wrote: On 09/06/2016 08:58 AM, Eric Farman wrote: We already have a "scsi" hostdev type, which refers to a single LUN that is passed through to a guest. But what of things where multiple LUNs are passed through via a single SCSI HBA, such as with the vhost-scsi target? Create a new hostdev type that will carry this, and its associated documentation and XML schema information. I assume from the review of v1 this is both HBA and vHBA? That is a vHBA would be useful for the NPIV support to pass through the entire vHBA rather than individual LUN's (something recently asked for at KVM Forum). I wasn't at KVM Forum, so I'm in the dark here. :( The vHBA is created when one has a vport capable scsi_host (see http://wiki.libvirt.org/page/NPIV_in_libvirt). The "result" is a 'scsi_hostM' using a vport capable parent scsi_hostN. Now if a vHBA cannot (or should not) be passed through, then we'll need to be sure to test and avoid it. Unfortunately we had a lab issue and the system I normally use for my vHBA/NPIV work was affected - trying to get it resurected remotely... My experience with NPIV has been on s390, so I'm gonna plead some ignorance on NPIV for other platforms. Considering the lack of controls on the qemu side, I would imagine that it should work. Signed-off-by: Eric Farman--- docs/formatdomain.html.in | 24 docs/schemas/domaincommon.rng | 23 +++ src/conf/domain_audit.c | 2 ++ src/conf/domain_conf.c | 56 +++-- src/conf/domain_conf.h | 17 +++ src/qemu/qemu_domain_address.c | 10 +++ src/qemu/qemu_hotplug.c | 1 + src/security/security_dac.c | 2 ++ tests/domaincapsschemadata/full.xml | 1 + 9 files changed, 134 insertions(+), 2 deletions(-) I took a peek at the v1, but suffice to say didn't follow it that closely since there's a number of changes in v2 related to patch ordering. You could get 3 different opinions on that matter too! Hopefully a consensus before too long. :) Typically I'll try to add the qemu code first (capabilities first, then command in one patch and hotplug in the next one). Once that's in place, then I add the domain bits, docs, etc. Since you're extending the domain XML, you will need to add an xml2xml test in this patch. Use the XML from your patch 5 as a basis/guide. You'll need a "tests/qemuxml2argvdata/" file and a "tests/qemuxml2xmloutdata/" file as well as a changed to qemuxml2xmltest.c. If the data file is the same as the outdata file, then you can create a soft link to the data file in the outdata file directory. diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index feaeaa2..b79da95 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3655,6 +3655,17 @@ /devices ... +or: + + + ... + devices +hostdev mode='subsystem' type='scsi_host' + source protocol='vhost' wwpn='naa.50014057667280d8'/ +/hostdev + /devices + ... + A wwpn doesn't generally have the "naa." prefix on it. So, why not just add that to the command line instead? That is a wwpn is a 16 hex digit value. I'm assuming that in order to support this feature a "naa." is prefixed and sent to qemu. Does that necessarily mean some other hypervisor would choose to place the "naa." prefix or would then code need to be added to strip it off for those other hypervisors? Looking at the qemu code - it seems fairly specific. Fairly certain that it's more a vhost thing of which the hypervisor utilizes. Per a recommendation made on this list years ago, I used targetcli to do the configuration rather than direct manipulation of sysfs, and according to its man page: All WWNs are prefaced by their type, such as "iqn", "naa", or "ib", and may be given with or without colons. As to whether other hypervisors follow the same behavior, I do not know. I was just trying to conform to what it would allow in the case of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn" here since as you say that's 16 hex digits, and should probably say just "wwn" ? hostdev The hostdev element is the main container for describing @@ -3693,6 +3704,12 @@ If a disk lun in the domain already has the rawio capability, then this setting not required. + scsi_host + since 2.2.0For SCSI devices, user +is responsible to make sure the device is not used by host. This +type passes all LUNs presented by a single HBA to +the guest. + It'll be at least 2.3.0 Ugh, yeah. When I rebased after 2.2's release I forgot about the doc hits. My apologies. I see from the qemu code there's support for attributes 'num_queues', 'max_sectors', and 'cmd_per_lun' - since
Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
On 09/06/2016 08:58 AM, Eric Farman wrote: > We already have a "scsi" hostdev type, which refers to a single LUN > that is passed through to a guest. But what of things where multiple > LUNs are passed through via a single SCSI HBA, such as with the > vhost-scsi target? Create a new hostdev type that will carry > this, and its associated documentation and XML schema information. > I assume from the review of v1 this is both HBA and vHBA? That is a vHBA would be useful for the NPIV support to pass through the entire vHBA rather than individual LUN's (something recently asked for at KVM Forum). The vHBA is created when one has a vport capable scsi_host (see http://wiki.libvirt.org/page/NPIV_in_libvirt). The "result" is a 'scsi_hostM' using a vport capable parent scsi_hostN. Now if a vHBA cannot (or should not) be passed through, then we'll need to be sure to test and avoid it. Unfortunately we had a lab issue and the system I normally use for my vHBA/NPIV work was affected - trying to get it resurected remotely... > Signed-off-by: Eric Farman> --- > docs/formatdomain.html.in | 24 > docs/schemas/domaincommon.rng | 23 +++ > src/conf/domain_audit.c | 2 ++ > src/conf/domain_conf.c | 56 > +++-- > src/conf/domain_conf.h | 17 +++ > src/qemu/qemu_domain_address.c | 10 +++ > src/qemu/qemu_hotplug.c | 1 + > src/security/security_dac.c | 2 ++ > tests/domaincapsschemadata/full.xml | 1 + > 9 files changed, 134 insertions(+), 2 deletions(-) > I took a peek at the v1, but suffice to say didn't follow it that closely since there's a number of changes in v2 related to patch ordering. You could get 3 different opinions on that matter too! Typically I'll try to add the qemu code first (capabilities first, then command in one patch and hotplug in the next one). Once that's in place, then I add the domain bits, docs, etc. Since you're extending the domain XML, you will need to add an xml2xml test in this patch. Use the XML from your patch 5 as a basis/guide. You'll need a "tests/qemuxml2argvdata/" file and a "tests/qemuxml2xmloutdata/" file as well as a changed to qemuxml2xmltest.c. If the data file is the same as the outdata file, then you can create a soft link to the data file in the outdata file directory. > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index feaeaa2..b79da95 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -3655,6 +3655,17 @@ >/devices >... > > +or: > + > + > + ... > + devices > +hostdev mode='subsystem' type='scsi_host' > + source protocol='vhost' wwpn='naa.50014057667280d8'/ > +/hostdev > + /devices > + ... > + A wwpn doesn't generally have the "naa." prefix on it. So, why not just add that to the command line instead? That is a wwpn is a 16 hex digit value. I'm assuming that in order to support this feature a "naa." is prefixed and sent to qemu. Does that necessarily mean some other hypervisor would choose to place the "naa." prefix or would then code need to be added to strip it off for those other hypervisors? Looking at the qemu code - it seems fairly specific. > >hostdev >The hostdev element is the main container for > describing > @@ -3693,6 +3704,12 @@ > If a disk lun in the domain already has the rawio capability, > then this setting not required. > > + scsi_host > + since 2.2.0For SCSI devices, user > +is responsible to make sure the device is not used by host. This > +type passes all LUNs presented by a single HBA to > +the guest. > + It'll be at least 2.3.0 I see from the qemu code there's support for attributes 'num_queues', 'max_sectors', and 'cmd_per_lun' - since this is passing the scsi_host through and not a 'scsi' LUN with a 'scsi' controller, should those attributes be added as well? Not "required" for this version, but I would hope we could add caps and attributes for that as well because I can almost guarantee someone will ask. > > >Note: The managed attribute is only used with PCI > devices > @@ -3756,6 +3773,13 @@ > credentials to the iSCSI server. > > > + scsi_host > + Since 2.2.0, multiple LUNs behind a 2.3 > +single SCSI HBI are described by a protocol s/HBI/HBA > +attribute set to "vhost" and a wwpn attribute that > +is the vhost_scsi wwpn (16 hexadecimal digits with a prefix of > +"naa.") established in the host configfs. > + I don't think we mention the "naa." prefix... It would be nice to tell a user how to find that wwpn value via some command (virsh or otherwise) in order to fill in the wwpn attribute.