Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

2016-10-05 Thread Eric Farman



On 09/13/2016 05:48 PM, Eric Farman wrote:



On 09/13/2016 04:49 PM, John Ferlan wrote:



[...]


Thanks for the review.  (Silent ACK on a lot of the above comments.)
I'll try to get these all in place for a v3 with a little runway before
the next freeze.


OK - hopefully I won't be neck deep in some other firefight


Thanks for taking a few minutes away from those to look over this 
little pile.  I appreciate the fires that pop up and get in the way of 
the usual todo list.  :)


John,

I had hoped to get a v3 off to the list before the 2.3 freeze, but that 
obviously didn't happen.  FYI, I'm off on another tangent for a couple 
of weeks.  With any luck, I'll have this back to the list before 2.4.


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

2016-09-13 Thread Eric Farman



On 09/13/2016 04:49 PM, John Ferlan wrote:

[...]


I took a peek at the v1, but suffice to say didn't follow it that
closely since there's a number of changes in v2 related to patch
ordering. You could get 3 different opinions on that matter too!

Hopefully a consensus before too long.  :)


yeah - v1 was lots of little steps that you were asked to combine. OTOH
there are those that like to combine multiple things in one patch which
can make it really difficult to follow. The order here was "followable"
- my suggestion was more to try and get the guts (qemu) done first, then
the fluff (domain/xml).


Agreed.  v1 was perhaps overkill in breaking things down so small, 
because I was picking up an idea with a bit of dust on it.




Consider your patch order to be making progress towards some sustainable
feature/bug fix - each step along the way being able to pass "make check
syntax-check" so that future attempts to git bisect can point the fickle
finger of fate (at someone else!).  If a patch adds something that will
be external that a followup patch will use - then great. If it's
testable, even better. For example, adding virSCSIOpenVhost as a
separate patch with the tests/qemuxml2argvmock.c is fine. One more
tidbit here - it's a very faint memory, but check out
tests/commandtest.c and the lengthy comment in mymain() regarding
"expected" fd values. I think you're "safe" in what you've done -
although your mock should mimic whatever changes you make to the primary
function.

[...]


A wwpn doesn't generally have the "naa." prefix on it. So, why not just
add that to the command line instead? That is a wwpn is a 16 hex digit
value. I'm assuming that in order to support this feature a "naa." is
prefixed and sent to qemu. Does that necessarily mean some other
hypervisor would choose to place the "naa." prefix or would then code
need to be added to strip it off for those other hypervisors?  Looking
at the qemu code - it seems fairly specific.

Fairly certain that it's more a vhost thing of which the hypervisor
utilizes.  Per a recommendation made on this list years ago, I used
targetcli to do the configuration rather than direct manipulation of
sysfs, and according to its man page:

 All WWNs are prefaced by their type, such as "iqn", "naa", or "ib",
and may be given with or without colons.


I don't use targetcli so I'm not familiar with it.

Still not convinced using naa.# in the XML is the best way to go, but
I'm not against it. I guess a lot depends on how the sysfs views things.


As to whether other hypervisors follow the same behavior, I do not
know.  I was just trying to conform to what it would allow in the case
of vhost-scsi.  Now, I suppose I'm confusing matters by saying "wwpn"
here since as you say that's 16 hex digits, and should probably say just
"wwn"  ?


I guess I've always thought of a wwn/wwpn/wwnn as that string of
bytes...


I always thought WWPN and WWNN as types of WWN...


Usually WWN/WWNN have been interchangeable, while WWPN is a
WWN assigned to a port in a Fibre Channel fabric


...but have seen it this way also.  Either way makes sense to me.



FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
will list a wwnn, wwpn, and fabric_wwn.  The fabric_wwn's will match.
The wwnn/wwpn for the vport is what was used when running the
HBA's/vport_create function.  It's a tangled web.  Wait until vGPU's
show up (that's another black magic trick).


[...]


It would be nice to tell a user how to find that wwpn value via some
command (virsh or otherwise) in order to fill in the wwpn attribute.
Although there's always varying opinions on this since many times it's
distro dependent. That's why I suggest virsh - at least if it's
supportable we can display the wwpn there using nodedev-dumpxml.

Hrm...  nodedev-dumpxml is new to me, neat!  I'll go down that bunny
trail, but it seems like it'd be a bit of work to add the smarts for this.


Hours of all sorts of fun The 'nodedev-{list|dumpxml}' are
essentially the libvirt mechanism to list/describe the various rabbit
holes that is/are the sysfs tree and especially branches. The
nodedev-list --tree will probably be of most interest, but so is virsh
nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}


What/where is the 'vhost_scsi_wwpn'?  IOW: It's not something defined
yet - I'm assuming there's some output somewhere where that's pulled
from.

As near as I can tell, it's not anything that can be autogenerated off
the bat.  Somebody somewhere needs to first establish the sysfs entries
for the vhost target, which can then be stored and loaded on a next
(host) boot.  The end result looks something like this:


AFAIK systemd handles the sysfs creation (ok at least for my purposes).

This is what I was concerned about - we really need a way to tell
someone how to "easily" find the wwpn that's to be used. Although I
don't have vhost-scsi on my laptop ;-)


Me neither.  Of course, I'm using an s390, so that's a 

Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

2016-09-13 Thread John Ferlan
[...]

>>>
>> I took a peek at the v1, but suffice to say didn't follow it that
>> closely since there's a number of changes in v2 related to patch
>> ordering. You could get 3 different opinions on that matter too!
> 
> Hopefully a consensus before too long.  :)
> 

yeah - v1 was lots of little steps that you were asked to combine. OTOH
there are those that like to combine multiple things in one patch which
can make it really difficult to follow. The order here was "followable"
- my suggestion was more to try and get the guts (qemu) done first, then
the fluff (domain/xml).

Consider your patch order to be making progress towards some sustainable
feature/bug fix - each step along the way being able to pass "make check
syntax-check" so that future attempts to git bisect can point the fickle
finger of fate (at someone else!).  If a patch adds something that will
be external that a followup patch will use - then great. If it's
testable, even better. For example, adding virSCSIOpenVhost as a
separate patch with the tests/qemuxml2argvmock.c is fine. One more
tidbit here - it's a very faint memory, but check out
tests/commandtest.c and the lengthy comment in mymain() regarding
"expected" fd values. I think you're "safe" in what you've done -
although your mock should mimic whatever changes you make to the primary
function.

[...]

>> A wwpn doesn't generally have the "naa." prefix on it. So, why not just
>> add that to the command line instead? That is a wwpn is a 16 hex digit
>> value. I'm assuming that in order to support this feature a "naa." is
>> prefixed and sent to qemu. Does that necessarily mean some other
>> hypervisor would choose to place the "naa." prefix or would then code
>> need to be added to strip it off for those other hypervisors?  Looking
>> at the qemu code - it seems fairly specific.
> 
> Fairly certain that it's more a vhost thing of which the hypervisor
> utilizes.  Per a recommendation made on this list years ago, I used
> targetcli to do the configuration rather than direct manipulation of
> sysfs, and according to its man page:
> 
> All WWNs are prefaced by their type, such as "iqn", "naa", or "ib",
> and may be given with or without colons.
> 

I don't use targetcli so I'm not familiar with it.

Still not convinced using naa.# in the XML is the best way to go, but
I'm not against it. I guess a lot depends on how the sysfs views things.

> As to whether other hypervisors follow the same behavior, I do not
> know.  I was just trying to conform to what it would allow in the case
> of vhost-scsi.  Now, I suppose I'm confusing matters by saying "wwpn"
> here since as you say that's 16 hex digits, and should probably say just
> "wwn"  ?
> 

I guess I've always thought of a wwn/wwpn/wwnn as that string of
bytes...  Usually WWN/WWNN have been interchangeable, while WWPN is a
WWN assigned to a port in a Fibre Channel fabric

FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
will list a wwnn, wwpn, and fabric_wwn.  The fabric_wwn's will match.
The wwnn/wwpn for the vport is what was used when running the
HBA's/vport_create function.  It's a tangled web.  Wait until vGPU's
show up (that's another black magic trick).


[...]

>> It would be nice to tell a user how to find that wwpn value via some
>> command (virsh or otherwise) in order to fill in the wwpn attribute.
>> Although there's always varying opinions on this since many times it's
>> distro dependent. That's why I suggest virsh - at least if it's
>> supportable we can display the wwpn there using nodedev-dumpxml.
> 
> Hrm...  nodedev-dumpxml is new to me, neat!  I'll go down that bunny
> trail, but it seems like it'd be a bit of work to add the smarts for this.
> 

Hours of all sorts of fun The 'nodedev-{list|dumpxml}' are
essentially the libvirt mechanism to list/describe the various rabbit
holes that is/are the sysfs tree and especially branches. The
nodedev-list --tree will probably be of most interest, but so is virsh
nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}

>>
>> What/where is the 'vhost_scsi_wwpn'?  IOW: It's not something defined
>> yet - I'm assuming there's some output somewhere where that's pulled
>> from.
> 
> As near as I can tell, it's not anything that can be autogenerated off
> the bat.  Somebody somewhere needs to first establish the sysfs entries
> for the vhost target, which can then be stored and loaded on a next
> (host) boot.  The end result looks something like this:
> 

AFAIK systemd handles the sysfs creation (ok at least for my purposes).

This is what I was concerned about - we really need a way to tell
someone how to "easily" find the wwpn that's to be used. Although I
don't have vhost-scsi on my laptop ;-) - I have to figure that by using
'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that
is an HBA will show a  that path would then be a starting point...

Googling "scsi_host HBA wwn" shows me some interesting results including
ones 

Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

2016-09-13 Thread Eric Farman



On 09/12/2016 05:34 PM, John Ferlan wrote:

On 09/06/2016 08:58 AM, Eric Farman wrote:

We already have a "scsi" hostdev type, which refers to a single LUN
that is passed through to a guest.  But what of things where multiple
LUNs are passed through via a single SCSI HBA, such as with the
vhost-scsi target?  Create a new hostdev type that will carry
this, and its associated documentation and XML schema information.


I assume from the review of v1 this is both HBA and vHBA?  That is a
vHBA would be useful for the NPIV support to pass through the entire
vHBA rather than individual LUN's (something recently asked for at KVM
Forum).


I wasn't at KVM Forum, so I'm in the dark here.  :(


The vHBA is created when one has a vport capable scsi_host (see
http://wiki.libvirt.org/page/NPIV_in_libvirt). The "result" is a
'scsi_hostM' using a vport capable parent scsi_hostN.  Now if a vHBA
cannot (or should not) be passed through, then we'll need to be sure to
test and avoid it.  Unfortunately we had a lab issue and the system I
normally use for my vHBA/NPIV work was affected - trying to get it
resurected remotely...


My experience with NPIV has been on s390, so I'm gonna plead some 
ignorance on NPIV for other platforms.  Considering the lack of controls 
on the qemu side, I would imagine that it should work.





Signed-off-by: Eric Farman 
---
  docs/formatdomain.html.in   | 24 
  docs/schemas/domaincommon.rng   | 23 +++
  src/conf/domain_audit.c |  2 ++
  src/conf/domain_conf.c  | 56 +++--
  src/conf/domain_conf.h  | 17 +++
  src/qemu/qemu_domain_address.c  | 10 +++
  src/qemu/qemu_hotplug.c |  1 +
  src/security/security_dac.c |  2 ++
  tests/domaincapsschemadata/full.xml |  1 +
  9 files changed, 134 insertions(+), 2 deletions(-)


I took a peek at the v1, but suffice to say didn't follow it that
closely since there's a number of changes in v2 related to patch
ordering. You could get 3 different opinions on that matter too!


Hopefully a consensus before too long.  :)


Typically I'll try to add the qemu code first (capabilities first, then
command in one patch and hotplug in the next one). Once that's in place,
then I add the domain bits, docs, etc.

Since you're extending the domain XML, you will need to add an xml2xml
test in this patch. Use the XML from your patch 5 as a basis/guide.
You'll need a "tests/qemuxml2argvdata/" file and a
"tests/qemuxml2xmloutdata/" file as well as a changed to
qemuxml2xmltest.c. If the data file is the same as the outdata file,
then you can create a soft link to the data file in the outdata file
directory.



diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index feaeaa2..b79da95 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -3655,6 +3655,17 @@
/devices
...
  
+or:

+
+
+  ...
+  devices
+hostdev mode='subsystem' type='scsi_host'
+  source protocol='vhost' wwpn='naa.50014057667280d8'/
+/hostdev
+  /devices
+  ...
+

A wwpn doesn't generally have the "naa." prefix on it. So, why not just
add that to the command line instead? That is a wwpn is a 16 hex digit
value. I'm assuming that in order to support this feature a "naa." is
prefixed and sent to qemu. Does that necessarily mean some other
hypervisor would choose to place the "naa." prefix or would then code
need to be added to strip it off for those other hypervisors?  Looking
at the qemu code - it seems fairly specific.


Fairly certain that it's more a vhost thing of which the hypervisor 
utilizes.  Per a recommendation made on this list years ago, I used 
targetcli to do the configuration rather than direct manipulation of 
sysfs, and according to its man page:


All WWNs are prefaced by their type, such as "iqn", "naa", or "ib", 
and may be given with or without colons.


As to whether other hypervisors follow the same behavior, I do not 
know.  I was just trying to conform to what it would allow in the case 
of vhost-scsi.  Now, I suppose I'm confusing matters by saying "wwpn" 
here since as you say that's 16 hex digits, and should probably say just 
"wwn"  ?






  
hostdev
The hostdev element is the main container for 
describing
@@ -3693,6 +3704,12 @@
  If a disk lun in the domain already has the rawio capability,
  then this setting not required.

+  scsi_host
+  since 2.2.0For SCSI devices, user
+is responsible to make sure the device is not used by host. This
+type passes all LUNs presented by a single HBA to
+the guest.
+  

It'll be at least 2.3.0


Ugh, yeah.  When I rebased after 2.2's release I forgot about the doc 
hits.  My apologies.




I see from the qemu code there's support for attributes 'num_queues',
'max_sectors', and 'cmd_per_lun' - since 

Re: [libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

2016-09-12 Thread John Ferlan

On 09/06/2016 08:58 AM, Eric Farman wrote:
> We already have a "scsi" hostdev type, which refers to a single LUN
> that is passed through to a guest.  But what of things where multiple
> LUNs are passed through via a single SCSI HBA, such as with the
> vhost-scsi target?  Create a new hostdev type that will carry
> this, and its associated documentation and XML schema information.
> 

I assume from the review of v1 this is both HBA and vHBA?  That is a
vHBA would be useful for the NPIV support to pass through the entire
vHBA rather than individual LUN's (something recently asked for at KVM
Forum). The vHBA is created when one has a vport capable scsi_host (see
http://wiki.libvirt.org/page/NPIV_in_libvirt). The "result" is a
'scsi_hostM' using a vport capable parent scsi_hostN.  Now if a vHBA
cannot (or should not) be passed through, then we'll need to be sure to
test and avoid it.  Unfortunately we had a lab issue and the system I
normally use for my vHBA/NPIV work was affected - trying to get it
resurected remotely...

> Signed-off-by: Eric Farman 
> ---
>  docs/formatdomain.html.in   | 24 
>  docs/schemas/domaincommon.rng   | 23 +++
>  src/conf/domain_audit.c |  2 ++
>  src/conf/domain_conf.c  | 56 
> +++--
>  src/conf/domain_conf.h  | 17 +++
>  src/qemu/qemu_domain_address.c  | 10 +++
>  src/qemu/qemu_hotplug.c |  1 +
>  src/security/security_dac.c |  2 ++
>  tests/domaincapsschemadata/full.xml |  1 +
>  9 files changed, 134 insertions(+), 2 deletions(-)
> 

I took a peek at the v1, but suffice to say didn't follow it that
closely since there's a number of changes in v2 related to patch
ordering. You could get 3 different opinions on that matter too!
Typically I'll try to add the qemu code first (capabilities first, then
command in one patch and hotplug in the next one). Once that's in place,
then I add the domain bits, docs, etc.

Since you're extending the domain XML, you will need to add an xml2xml
test in this patch. Use the XML from your patch 5 as a basis/guide.
You'll need a "tests/qemuxml2argvdata/" file and a
"tests/qemuxml2xmloutdata/" file as well as a changed to
qemuxml2xmltest.c. If the data file is the same as the outdata file,
then you can create a soft link to the data file in the outdata file
directory.


> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index feaeaa2..b79da95 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -3655,6 +3655,17 @@
>/devices
>...
>  
> +or:
> +
> +
> +  ...
> +  devices
> +hostdev mode='subsystem' type='scsi_host'
> +  source protocol='vhost' wwpn='naa.50014057667280d8'/
> +/hostdev
> +  /devices
> +  ...
> +

A wwpn doesn't generally have the "naa." prefix on it. So, why not just
add that to the command line instead? That is a wwpn is a 16 hex digit
value. I'm assuming that in order to support this feature a "naa." is
prefixed and sent to qemu. Does that necessarily mean some other
hypervisor would choose to place the "naa." prefix or would then code
need to be added to strip it off for those other hypervisors?  Looking
at the qemu code - it seems fairly specific.


>  
>hostdev
>The hostdev element is the main container for 
> describing
> @@ -3693,6 +3704,12 @@
>  If a disk lun in the domain already has the rawio capability,
>  then this setting not required.
>
> +  scsi_host
> +  since 2.2.0For SCSI devices, user
> +is responsible to make sure the device is not used by host. This
> +type passes all LUNs presented by a single HBA to
> +the guest.
> +  

It'll be at least 2.3.0

I see from the qemu code there's support for attributes 'num_queues',
'max_sectors', and 'cmd_per_lun' - since this is passing the scsi_host
through and not a 'scsi' LUN with a 'scsi' controller, should those
attributes be added as well? Not "required" for this version, but I
would hope we could add caps and attributes for that as well because I
can almost guarantee someone will ask.

>  
>  
>Note: The managed attribute is only used with PCI 
> devices
> @@ -3756,6 +3773,13 @@
>  credentials to the iSCSI server.
>  
>
> +  scsi_host
> +  Since 2.2.0, multiple LUNs behind a

2.3

> +single SCSI HBI are described by a protocol

s/HBI/HBA

> +attribute set to "vhost" and a wwpn attribute that
> +is the vhost_scsi wwpn (16 hexadecimal digits with a prefix of
> +"naa.") established in the host configfs.
> +  

I don't think we mention the "naa." prefix...

It would be nice to tell a user how to find that wwpn value via some
command (virsh or otherwise) in order to fill in the wwpn attribute.