Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-29 Thread Alexey Kardashevskiy
On 05/29/2014 07:58 AM, Benjamin Herrenschmidt wrote:
 On Wed, 2014-05-28 at 22:49 +1000, Gavin Shan wrote:

 I will remove those address related macros in next revision because it's
 user-level bussiness, not related to host kernel any more.

 If the user is QEMU + guest, we need the address to identify the PE though 
 PHB
 BUID could be used as same purpose to get PHB, which is one-to-one mapping 
 with
 IOMMU group on sPAPR platform. However, once the PE address is built and 
 returned
 to guest, guest will use the PE address as input parameter in subsequent RTAS
 calls.

 If the user is some kind of application who just uses the ioctl() without 
 supporting
 RTAS calls. We don't need care about PE address. 
 
 I am a bit reluctant with that PE==PHB equation we seem to be introducing.

 This isn't the case in HW.

It is pseries, not real HW. Does phyp allow multiple real host PEs on the
same virtual PHB?


 It's possible that this is how we handle VFIO *today*
 in qemu but it doesn't have to be does it ?
 
 It also paints us into a corner if we want to start implementing some kind of
 emulated EEH for selected emulated devices and/or virtio.




-- 
Alexey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-29 Thread Benjamin Herrenschmidt
On Fri, 2014-05-30 at 13:44 +1000, Alexey Kardashevskiy wrote:
 
 It is pseries, not real HW. Does phyp allow multiple real host PEs on
 the same virtual PHB?

I don't know how recent pHyp versions do it. I think they pretty much
have a PE == 1 virtual PHB so you might be right there.

On older HW they definitely used P2P bridges as PEs.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 28.05.14 02:57, Alex Williamson wrote:

On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:

On 28.05.14 02:39, Alex Williamson wrote:

On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:

On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
Documentation/vfio.txt  | 92 
-
drivers/vfio/pci/Makefile   |  1 +
drivers/vfio/pci/vfio_pci.c | 20 +---
drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
drivers/vfio/pci/vfio_pci_private.h |  5 ++
drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
include/uapi/linux/vfio.h   | 66 ++
7 files changed, 308 insertions(+), 7 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


+
+   return ret;
+}
+
static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
{
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(container-lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?

I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,

Yes, that's the point. I by far prefer to have you push back on anyone
who introduces useless ioctls rather than have a separate EEH number
space that people can just throw anything in they like ;).

Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,


Sure, that's what patch review is about. I'm really more concerned about 
whose court the number space is in - you or Gavin. If we're talking 
about top level ioctls you will care a lot more.


But I'm not religious about this. You're the VFIO maintainer, so it's 
your call. I just personally cringe when I see an ioctl that gets an 
opcode and a parameter argument where the parameter argument is a 
union with one struct for each opcode.



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 28.05.14 02:55, Gavin Shan wrote:

On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
  Documentation/vfio.txt  | 92 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c


[...]


diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..c5fac36 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,72 @@ struct vfio_iommu_spapr_tce_info {
  
  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
  
+/*

+ * EEH functionality can be enabled or disabled on one specific device.
+ * Also, the DMA or IO frozen state can be removed from the frozen PE
+ * if required.
+ */
+struct vfio_eeh_pe_set_option {
+   __u32 argsz;
+   __u32 flags;
+   __u32 option;
+#define VFIO_EEH_PE_SET_OPT_DISABLE0   /* Disable EEH  */
+#define VFIO_EEH_PE_SET_OPT_ENABLE 1   /* Enable EEH   */
+#define VFIO_EEH_PE_SET_OPT_IO 2   /* Enable IO*/
+#define VFIO_EEH_PE_SET_OPT_DMA3   /* Enable DMA   */

This is more of a command than an option isn't it?  Each of these
probably needs a more significant description.


Yeah, it would be regarded as opcode and I'll add more description about
them in next revision.


Please just call them commands.




+};
+
+#define VFIO_EEH_PE_SET_OPTION _IO(VFIO_TYPE, VFIO_BASE + 21)
+
+/*
+ * Each EEH PE should have unique address to be identified. PE's
+ * sharing mode is also useful information as well.
+ */
+#define VFIO_EEH_PE_GET_ADDRESS0   /* Get address  */
+#define VFIO_EEH_PE_GET_MODE   1   /* Query mode   */
+#define VFIO_EEH_PE_MODE_NONE  0   /* Not a PE */
+#define VFIO_EEH_PE_MODE_NOT_SHARED1   /* Exclusive*/
+#define VFIO_EEH_PE_MODE_SHARED2   /* Shared mode  */
+
+/*
+ * EEH PE might have been frozen because of PCI errors. Also, it might
+ * be experiencing reset for error revoery. The following command helps
+ * to get the state.
+ */
+struct vfio_eeh_pe_get_state {
+   __u32 argsz;
+   __u32 flags;
+   __u32 state;
+};

Should state be a union to better describe the value returned?  What
exactly is the address and why does the user need to know it?  Does this
need user input or could we just return the address and mode regardless?


Ok. I think you want enum (not union) for state. I'll have macros for the
state in next revision as I did that for other cases.

Those macros defined for address just for ABI stuff as Alex.G mentioned.
There isn't corresponding ioctl command for host to get address any more
because QEMU (user) will have to figure it out by himself. The address
here means PE address and user has to figure it out according to PE
segmentation.


Why would the user ever need the address?


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Gavin Shan
On Wed, May 28, 2014 at 01:41:35PM +0200, Alexander Graf wrote:

On 28.05.14 02:55, Gavin Shan wrote:
On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
  Documentation/vfio.txt  | 92 
 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 
 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..c5fac36 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,72 @@ struct vfio_iommu_spapr_tce_info {
  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
+/*
+ * EEH functionality can be enabled or disabled on one specific device.
+ * Also, the DMA or IO frozen state can be removed from the frozen PE
+ * if required.
+ */
+struct vfio_eeh_pe_set_option {
+   __u32 argsz;
+   __u32 flags;
+   __u32 option;
+#define VFIO_EEH_PE_SET_OPT_DISABLE0   /* Disable EEH  */
+#define VFIO_EEH_PE_SET_OPT_ENABLE 1   /* Enable EEH   */
+#define VFIO_EEH_PE_SET_OPT_IO 2   /* Enable IO*/
+#define VFIO_EEH_PE_SET_OPT_DMA3   /* Enable DMA   */
This is more of a command than an option isn't it?  Each of these
probably needs a more significant description.

Yeah, it would be regarded as opcode and I'll add more description about
them in next revision.

Please just call them commands.


Ok. I guess you want me to change the macro names like this ?

#define VFIO_EEH_CMD_DISABLE0   /* Disable EEH functionality*/
#define VFIO_EEH_CMD_ENABLE 1   /* Enable EEH functionality */
#define VFIO_EEH_CMD_ENABLE_IO  2   /* Enable IO for frozen PE  */
#define VFIO_EEH_CMD_ENABLE_DMA 3   /* Enable DMA for frozen PE */


+};
+
+#define VFIO_EEH_PE_SET_OPTION _IO(VFIO_TYPE, VFIO_BASE + 21)
+
+/*
+ * Each EEH PE should have unique address to be identified. PE's
+ * sharing mode is also useful information as well.
+ */
+#define VFIO_EEH_PE_GET_ADDRESS0   /* Get address  */
+#define VFIO_EEH_PE_GET_MODE   1   /* Query mode   */
+#define VFIO_EEH_PE_MODE_NONE  0   /* Not a PE */
+#define VFIO_EEH_PE_MODE_NOT_SHARED1   /* Exclusive*/
+#define VFIO_EEH_PE_MODE_SHARED2   /* Shared mode  */
+
+/*
+ * EEH PE might have been frozen because of PCI errors. Also, it might
+ * be experiencing reset for error revoery. The following command helps
+ * to get the state.
+ */
+struct vfio_eeh_pe_get_state {
+   __u32 argsz;
+   __u32 flags;
+   __u32 state;
+};
Should state be a union to better describe the value returned?  What
exactly is the address and why does the user need to know it?  Does this
need user input or could we just return the address and mode regardless?

Ok. I think you want enum (not union) for state. I'll have macros for the
state in next revision as I did that for other cases.

Those macros defined for address just for ABI stuff as Alex.G mentioned.
There isn't corresponding ioctl command for host to get address any more
because QEMU (user) will have to figure it out by himself. The address
here means PE address and user has to figure it out according to PE
segmentation.

Why would the user ever need the address?


I will remove those address related macros in next revision because it's
user-level bussiness, not related to host kernel any more.

If the user is QEMU + guest, we need the address to identify the PE though PHB
BUID could be used as same purpose to get PHB, which is one-to-one mapping with
IOMMU group on sPAPR platform. However, once the PE address is built and 
returned
to guest, guest will use the PE address as input parameter in subsequent RTAS
calls.

If the user is some kind of application who just uses the ioctl() without 
supporting
RTAS calls. We don't need care about PE address. 

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 28.05.14 14:49, Gavin Shan wrote:

On Wed, May 28, 2014 at 01:41:35PM +0200, Alexander Graf wrote:

On 28.05.14 02:55, Gavin Shan wrote:

On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
  Documentation/vfio.txt  | 92 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..c5fac36 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,72 @@ struct vfio_iommu_spapr_tce_info {
  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
+/*
+ * EEH functionality can be enabled or disabled on one specific device.
+ * Also, the DMA or IO frozen state can be removed from the frozen PE
+ * if required.
+ */
+struct vfio_eeh_pe_set_option {
+   __u32 argsz;
+   __u32 flags;
+   __u32 option;
+#define VFIO_EEH_PE_SET_OPT_DISABLE0   /* Disable EEH  */
+#define VFIO_EEH_PE_SET_OPT_ENABLE 1   /* Enable EEH   */
+#define VFIO_EEH_PE_SET_OPT_IO 2   /* Enable IO*/
+#define VFIO_EEH_PE_SET_OPT_DMA3   /* Enable DMA   */

This is more of a command than an option isn't it?  Each of these
probably needs a more significant description.


Yeah, it would be regarded as opcode and I'll add more description about
them in next revision.

Please just call them commands.


Ok. I guess you want me to change the macro names like this ?

#define VFIO_EEH_CMD_DISABLE0   /* Disable EEH functionality*/
#define VFIO_EEH_CMD_ENABLE 1   /* Enable EEH functionality */
#define VFIO_EEH_CMD_ENABLE_IO  2   /* Enable IO for frozen PE  */
#define VFIO_EEH_CMD_ENABLE_DMA 3   /* Enable DMA for frozen PE */


Yes, the ioctl name too.




+};
+
+#define VFIO_EEH_PE_SET_OPTION _IO(VFIO_TYPE, VFIO_BASE + 21)
+
+/*
+ * Each EEH PE should have unique address to be identified. PE's
+ * sharing mode is also useful information as well.
+ */
+#define VFIO_EEH_PE_GET_ADDRESS0   /* Get address  */
+#define VFIO_EEH_PE_GET_MODE   1   /* Query mode   */
+#define VFIO_EEH_PE_MODE_NONE  0   /* Not a PE */
+#define VFIO_EEH_PE_MODE_NOT_SHARED1   /* Exclusive*/
+#define VFIO_EEH_PE_MODE_SHARED2   /* Shared mode  */
+
+/*
+ * EEH PE might have been frozen because of PCI errors. Also, it might
+ * be experiencing reset for error revoery. The following command helps
+ * to get the state.
+ */
+struct vfio_eeh_pe_get_state {
+   __u32 argsz;
+   __u32 flags;
+   __u32 state;
+};

Should state be a union to better describe the value returned?  What
exactly is the address and why does the user need to know it?  Does this
need user input or could we just return the address and mode regardless?


Ok. I think you want enum (not union) for state. I'll have macros for the
state in next revision as I did that for other cases.

Those macros defined for address just for ABI stuff as Alex.G mentioned.
There isn't corresponding ioctl command for host to get address any more
because QEMU (user) will have to figure it out by himself. The address
here means PE address and user has to figure it out according to PE
segmentation.

Why would the user ever need the address?


I will remove those address related macros in next revision because it's
user-level bussiness, not related to host kernel any more.


Ok, so the next question is whether there will be any state outside of 
GET_MODE left in the future.



Alex


If the user is QEMU + guest, we need the address to identify the PE though PHB
BUID could be used as same purpose to get PHB, which is one-to-one mapping with
IOMMU group on sPAPR platform. However, once the PE address is built and 
returned
to guest, guest will use the PE address as input parameter in subsequent RTAS
calls.

If the user is some kind of application who just uses the ioctl() without 
supporting
RTAS calls. We don't need care about PE address.

Thanks,
Gavin

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alex Williamson
On Wed, 2014-05-28 at 13:37 +0200, Alexander Graf wrote:
 On 28.05.14 02:57, Alex Williamson wrote:
  On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:
  On 28.05.14 02:39, Alex Williamson wrote:
  On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
  On 27.05.14 20:15, Alex Williamson wrote:
  On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
  The patch adds new IOCTL commands for sPAPR VFIO container device
  to support EEH functionality for PCI devices, which have been passed
  through from host to somebody else via VFIO.
 
  Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
  Documentation/vfio.txt  | 92 
  -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 
  ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
  [...]
 
  +
  +  return ret;
  +}
  +
  static long tce_iommu_ioctl(void *iommu_data,
  unsigned int cmd, unsigned long arg)
  {
  @@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
 tce_iommu_disable(container);
 mutex_unlock(container-lock);
 return 0;
  +  case VFIO_EEH_PE_SET_OPTION:
  +  case VFIO_EEH_PE_GET_STATE:
  +  case VFIO_EEH_PE_RESET:
  +  case VFIO_EEH_PE_CONFIGURE:
  +  return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
  This is where it would have really made sense to have a single
  VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
  AlexG, are you really attached to splitting these out into separate
  ioctls?
  I don't see the problem. We need to forward 4 ioctls to a separate piece
  of code, so we forward 4 ioctls to a separate piece of code :). Putting
  them into one ioctl just moves the switch() into another function.
  And uses an extra 3 ioctl numbers and gives us extra things to update if
  we ever need to add more ioctls, etc.  ioctl numbers are an address
  space, how much address space do we really want to give to EEH?  It's
  not a big difference, but I don't think it's completely even either.
  Thanks,
  Yes, that's the point. I by far prefer to have you push back on anyone
  who introduces useless ioctls rather than have a separate EEH number
  space that people can just throw anything in they like ;).
  Well, I appreciate that, but having them as separate ioctls doesn't
  really prevent that either.  Any one of these 4 could be set to take a
  sub-option to extend and contort the EEH interface.  The only way to
  prevent that would be to avoid the argsz+flags hack that make the ioctl
  extendable.  Thanks,
 
 Sure, that's what patch review is about. I'm really more concerned about 
 whose court the number space is in - you or Gavin. If we're talking 
 about top level ioctls you will care a lot more.
 
 But I'm not religious about this. You're the VFIO maintainer, so it's 
 your call. I just personally cringe when I see an ioctl that gets an 
 opcode and a parameter argument where the parameter argument is a 
 union with one struct for each opcode.

Well, what would it look like...

struct vfio_eeh_pe_op {
__u32 argsz;
__u32 flags;
__u32 op;
};

Couldn't every single one of these be a separate op?  Are there any
cases where we can't use the ioctl return value?

VFIO_EEH_PE_DISABLE
VFIO_EEH_PE_ENABLE
VFIO_EEH_PE_UNFREEZE_IO
VFIO_EEH_PE_UNFREEZE_DMA
VFIO_EEH_PE_GET_MODE
VFIO_EEH_PE_RESET_DEACTIVATE
VFIO_EEH_PE_RESET_HOT
VFIO_EEH_PE_RESET_FUNDAMENTAL
VFIO_EEH_PE_CONFIGURE

It doesn't look that bad to me, what am I missing?  Thanks,

Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alex Williamson
On Wed, 2014-05-28 at 10:55 +1000, Gavin Shan wrote:
 On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
 On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
  The patch adds new IOCTL commands for sPAPR VFIO container device
  to support EEH functionality for PCI devices, which have been passed
  through from host to somebody else via VFIO.
  
  Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
   Documentation/vfio.txt  | 92 
  -
   drivers/vfio/pci/Makefile   |  1 +
   drivers/vfio/pci/vfio_pci.c | 20 +---
   drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
   drivers/vfio/pci/vfio_pci_private.h |  5 ++
   drivers/vfio/vfio_iommu_spapr_tce.c | 85 
  ++
   include/uapi/linux/vfio.h   | 66 ++
   7 files changed, 308 insertions(+), 7 deletions(-)
   create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
  
  diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
  index b9ca023..d890fed 100644
  --- a/Documentation/vfio.txt
  +++ b/Documentation/vfio.txt
  @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented 
  in real mode which provides
   an excellent performance which has limitations such as inability to do
   locked pages accounting in real time.
   
  -So 3 additional ioctls have been added:
  +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an 
  I/O
  +subtree that can be treated as a unit for the purposes of partitioning and
  +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
  +function of a multi-function IOA, or multiple IOAs (possibly including 
  switch
  +and bridge structures above the multiple IOAs). PPC64 guests detect PCI 
  errors
  +and recover from them via EEH RTAS services, which works on the basis of
  +additional ioctl commands.
  +
  +So 7 additional ioctls have been added:
   
 VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
 of the DMA window on the PCI bus.
  @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
   
 VFIO_IOMMU_DISABLE - disables the container.
   
  +  VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
  +  specified device. Also, it can be used to remove IO or DMA
  +  stopped state on the frozen PE.
  +
  +  VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
  +
  +  VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
  +  error recovering.
  +
  +  VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
  +  one of the major steps for error recoverying.
   
   The code flow from the example above should be slightly changed:
   
  @@ -346,6 +365,77 @@ The code flow from the example above should be 
  slightly changed:
 ioctl(container, VFIO_IOMMU_MAP_DMA, dma_map);
 .
   
  +Based on the initial example we have, the following piece of code could be
  +reference for EEH setup and error handling:
  +
  +  struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
  +  struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
  +  struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
  +  struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
  +
  +  
  +
  +  /* Get a file descriptor for the device */
  +  device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, :06:0d.0);
  +
  +  /* Enable the EEH functionality on the device */
  +  option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
  +  ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
  +
  +  /* You're suggested to create additional data struct to represent
  +   * PE, and put child devices belonging to same IOMMU group to the
  +   * PE instance for later reference.
  +   */
  +
  +  /* Check the PE's state and make sure it's in functional state */
  +  ioctl(container, VFIO_EEH_PE_GET_STATE, state);
  +
  +  /* Save device's state. pci_save_state() would be good enough
  +   * as an example.
  +   */
  +
  +  /* Test and setup the device */
  +  ioctl(device, VFIO_DEVICE_GET_INFO, device_info);
  +
  +  
  +
  +  /* When 0xFF's returned from reading PCI config space or IO BARs
  +   * of the PCI device. Check the PE state to see if that has been
  +   * frozen.
  +   */
  +  ioctl(container, VFIO_EEH_PE_GET_STATE, state);
  +
  +  /* Waiting for pending PCI transactions to be completed and don't
  +   * produce any more PCI traffic from/to the affected PE until
  +   * recovery is finished.
  +   */
  +
  +  /* Enable IO for the affected PE and collect logs. Usually, the
  +   * standard part of PCI config space, AER registers are dumped
  +   * as logs for further analysis.
  +   */
  +  option.option = VFIO_EEH_PE_SET_OPT_IO;
  +  ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
  +
  +  /* Issue PE reset */
  +  reset.option = VFIO_EEH_PE_RESET_HOT;
  +  ioctl(container, 

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Benjamin Herrenschmidt
On Wed, 2014-05-28 at 22:49 +1000, Gavin Shan wrote:
 
 I will remove those address related macros in next revision because it's
 user-level bussiness, not related to host kernel any more.
 
 If the user is QEMU + guest, we need the address to identify the PE though PHB
 BUID could be used as same purpose to get PHB, which is one-to-one mapping 
 with
 IOMMU group on sPAPR platform. However, once the PE address is built and 
 returned
 to guest, guest will use the PE address as input parameter in subsequent RTAS
 calls.
 
 If the user is some kind of application who just uses the ioctl() without 
 supporting
 RTAS calls. We don't need care about PE address. 

I am a bit reluctant with that PE==PHB equation we seem to be introducing.

This isn't the case in HW. It's possible that this is how we handle VFIO *today*
in qemu but it doesn't have to be does it ?

It also paints us into a corner if we want to start implementing some kind of
emulated EEH for selected emulated devices and/or virtio.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 28.05.14 18:17, Alex Williamson wrote:

On Wed, 2014-05-28 at 13:37 +0200, Alexander Graf wrote:

On 28.05.14 02:57, Alex Williamson wrote:

On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:

On 28.05.14 02:39, Alex Williamson wrote:

On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:

On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt  | 92 
-
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
 include/uapi/linux/vfio.h   | 66 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


+
+   return ret;
+}
+
 static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
 {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(container-lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?

I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,

Yes, that's the point. I by far prefer to have you push back on anyone
who introduces useless ioctls rather than have a separate EEH number
space that people can just throw anything in they like ;).

Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,

Sure, that's what patch review is about. I'm really more concerned about
whose court the number space is in - you or Gavin. If we're talking
about top level ioctls you will care a lot more.

But I'm not religious about this. You're the VFIO maintainer, so it's
your call. I just personally cringe when I see an ioctl that gets an
opcode and a parameter argument where the parameter argument is a
union with one struct for each opcode.

Well, what would it look like...

struct vfio_eeh_pe_op {
__u32 argsz;
__u32 flags;
__u32 op;
};

Couldn't every single one of these be a separate op?  Are there any
cases where we can't use the ioctl return value?

VFIO_EEH_PE_DISABLE
VFIO_EEH_PE_ENABLE
VFIO_EEH_PE_UNFREEZE_IO
VFIO_EEH_PE_UNFREEZE_DMA
VFIO_EEH_PE_GET_MODE
VFIO_EEH_PE_RESET_DEACTIVATE
VFIO_EEH_PE_RESET_HOT
VFIO_EEH_PE_RESET_FUNDAMENTAL
VFIO_EEH_PE_CONFIGURE

It doesn't look that bad to me, what am I missing?  Thanks,


Yup, that looks well to me as well :)


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 28.05.14 23:58, Benjamin Herrenschmidt wrote:

On Wed, 2014-05-28 at 22:49 +1000, Gavin Shan wrote:

I will remove those address related macros in next revision because it's
user-level bussiness, not related to host kernel any more.

If the user is QEMU + guest, we need the address to identify the PE though PHB
BUID could be used as same purpose to get PHB, which is one-to-one mapping with
IOMMU group on sPAPR platform. However, once the PE address is built and 
returned
to guest, guest will use the PE address as input parameter in subsequent RTAS
calls.

If the user is some kind of application who just uses the ioctl() without 
supporting
RTAS calls. We don't need care about PE address.

I am a bit reluctant with that PE==PHB equation we seem to be introducing.

This isn't the case in HW. It's possible that this is how we handle VFIO *today*
in qemu but it doesn't have to be does it ?


Right, but that's pure QEMU internals. From the VFIO kernel interface's 
point of view, a VFIO group is a PE context, as that's what gets IOMMU 
controlled together too.



It also paints us into a corner if we want to start implementing some kind of
emulated EEH for selected emulated devices and/or virtio.


I don't think so :). In QEMU the PHB emulation would have to notify the 
container (IOMMU emulation layer - PE) that a PE operation happened. 
It's that emulation code's responsibility to broadcast operations across 
its own emulated operations (recover config space access, reconfigure 
BARs, etc) and the VFIO PE operations.


So from a kernel interface point of view, I think leaving out any 
address information is the right way to go. Whether we managed to get 
all QEMU internal interfaces modeled correctly yet has to be seen on the 
next patch set revision :).



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Gavin Shan
On Wed, May 28, 2014 at 03:12:35PM +0200, Alexander Graf wrote:

On 28.05.14 14:49, Gavin Shan wrote:
On Wed, May 28, 2014 at 01:41:35PM +0200, Alexander Graf wrote:
On 28.05.14 02:55, Gavin Shan wrote:
On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
  Documentation/vfio.txt  | 92 
 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 
 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
[...]

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index cb9023d..c5fac36 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -455,6 +455,72 @@ struct vfio_iommu_spapr_tce_info {
  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO   _IO(VFIO_TYPE, VFIO_BASE + 12)
+/*
+ * EEH functionality can be enabled or disabled on one specific device.
+ * Also, the DMA or IO frozen state can be removed from the frozen PE
+ * if required.
+ */
+struct vfio_eeh_pe_set_option {
+ __u32 argsz;
+ __u32 flags;
+ __u32 option;
+#define VFIO_EEH_PE_SET_OPT_DISABLE  0   /* Disable EEH  */
+#define VFIO_EEH_PE_SET_OPT_ENABLE   1   /* Enable EEH   */
+#define VFIO_EEH_PE_SET_OPT_IO   2   /* Enable IO*/
+#define VFIO_EEH_PE_SET_OPT_DMA  3   /* Enable DMA   */
This is more of a command than an option isn't it?  Each of these
probably needs a more significant description.

Yeah, it would be regarded as opcode and I'll add more description about
them in next revision.
Please just call them commands.

Ok. I guess you want me to change the macro names like this ?

#define VFIO_EEH_CMD_DISABLE  0   /* Disable EEH functionality*/
#define VFIO_EEH_CMD_ENABLE   1   /* Enable EEH functionality */
#define VFIO_EEH_CMD_ENABLE_IO2   /* Enable IO for frozen PE  
*/
#define VFIO_EEH_CMD_ENABLE_DMA   3   /* Enable DMA for frozen PE 
*/

Yes, the ioctl name too.


Ok. Thanks. I will also to rename those option / command related macros
to VFIO_EEH_CMD_ in next revision.


+};
+
+#define VFIO_EEH_PE_SET_OPTION   _IO(VFIO_TYPE, VFIO_BASE + 21)
+
+/*
+ * Each EEH PE should have unique address to be identified. PE's
+ * sharing mode is also useful information as well.
+ */
+#define VFIO_EEH_PE_GET_ADDRESS  0   /* Get address  */
+#define VFIO_EEH_PE_GET_MODE 1   /* Query mode   */
+#define VFIO_EEH_PE_MODE_NONE0   /* Not a PE */
+#define VFIO_EEH_PE_MODE_NOT_SHARED  1   /* Exclusive*/
+#define VFIO_EEH_PE_MODE_SHARED  2   /* Shared mode  */
+
+/*
+ * EEH PE might have been frozen because of PCI errors. Also, it might
+ * be experiencing reset for error revoery. The following command helps
+ * to get the state.
+ */
+struct vfio_eeh_pe_get_state {
+ __u32 argsz;
+ __u32 flags;
+ __u32 state;
+};
Should state be a union to better describe the value returned?  What
exactly is the address and why does the user need to know it?  Does this
need user input or could we just return the address and mode regardless?

Ok. I think you want enum (not union) for state. I'll have macros for the
state in next revision as I did that for other cases.

Those macros defined for address just for ABI stuff as Alex.G mentioned.
There isn't corresponding ioctl command for host to get address any more
because QEMU (user) will have to figure it out by himself. The address
here means PE address and user has to figure it out according to PE
segmentation.
Why would the user ever need the address?

I will remove those address related macros in next revision because it's
user-level bussiness, not related to host kernel any more.

Ok, so the next question is whether there will be any state outside
of GET_MODE left in the future.


That's also user-level business and those macros should be removed as well :-)

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Benjamin Herrenschmidt
On Thu, 2014-05-29 at 00:46 +0200, Alexander Graf wrote:
 I don't think so :). In QEMU the PHB emulation would have to notify the 
 container (IOMMU emulation layer - PE) that a PE operation happened. 
 It's that emulation code's responsibility to broadcast operations across 
 its own emulated operations (recover config space access, reconfigure 
 BARs, etc) and the VFIO PE operations.
 
 So from a kernel interface point of view, I think leaving out any 
 address information is the right way to go. Whether we managed to get 
 all QEMU internal interfaces modeled correctly yet has to be seen on the 
 next patch set revision :).

For a kernel interface I absolutely agree, all we care about is the group fd.

From a qemu internal, I suppose it makes things easier to operate at the PHB
level for now and we can revisit later if the need arises.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Gavin Shan
On Thu, May 29, 2014 at 12:40:26AM +0200, Alexander Graf wrote:
On 28.05.14 18:17, Alex Williamson wrote:
On Wed, 2014-05-28 at 13:37 +0200, Alexander Graf wrote:
On 28.05.14 02:57, Alex Williamson wrote:
On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:
On 28.05.14 02:39, Alex Williamson wrote:
On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
On 27.05.14 20:15, Alex Williamson wrote:
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt  | 92 
 -
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 
 ++
 include/uapi/linux/vfio.h   | 66 
 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
[...]

+
+  return ret;
+}
+
 static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
 {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
   tce_iommu_disable(container);
   mutex_unlock(container-lock);
   return 0;
+  case VFIO_EEH_PE_SET_OPTION:
+  case VFIO_EEH_PE_GET_STATE:
+  case VFIO_EEH_PE_RESET:
+  case VFIO_EEH_PE_CONFIGURE:
+  return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?
I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.
And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,
Yes, that's the point. I by far prefer to have you push back on anyone
who introduces useless ioctls rather than have a separate EEH number
space that people can just throw anything in they like ;).
Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,
Sure, that's what patch review is about. I'm really more concerned about
whose court the number space is in - you or Gavin. If we're talking
about top level ioctls you will care a lot more.

But I'm not religious about this. You're the VFIO maintainer, so it's
your call. I just personally cringe when I see an ioctl that gets an
opcode and a parameter argument where the parameter argument is a
union with one struct for each opcode.
Well, what would it look like...

struct vfio_eeh_pe_op {
  __u32 argsz;
  __u32 flags;
  __u32 op;
};

Couldn't every single one of these be a separate op?  Are there any
cases where we can't use the ioctl return value?

VFIO_EEH_PE_DISABLE
VFIO_EEH_PE_ENABLE
VFIO_EEH_PE_UNFREEZE_IO
VFIO_EEH_PE_UNFREEZE_DMA
VFIO_EEH_PE_GET_MODE
VFIO_EEH_PE_RESET_DEACTIVATE
VFIO_EEH_PE_RESET_HOT
VFIO_EEH_PE_RESET_FUNDAMENTAL
VFIO_EEH_PE_CONFIGURE

It doesn't look that bad to me, what am I missing?  Thanks,

Yup, that looks well to me as well :)


s/VFIO_EEH_PE_GET_MODE/VFIO_EEH_PE_GET_STATE.

I'll include this in next revision. Thanks, Alex.

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Alexander Graf


On 29.05.14 01:37, Gavin Shan wrote:

On Thu, May 29, 2014 at 12:40:26AM +0200, Alexander Graf wrote:

On 28.05.14 18:17, Alex Williamson wrote:

On Wed, 2014-05-28 at 13:37 +0200, Alexander Graf wrote:

On 28.05.14 02:57, Alex Williamson wrote:

On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:

On 28.05.14 02:39, Alex Williamson wrote:

On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:

On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt  | 92 
-
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
 include/uapi/linux/vfio.h   | 66 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


+
+   return ret;
+}
+
 static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
 {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(container-lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?

I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,

Yes, that's the point. I by far prefer to have you push back on anyone
who introduces useless ioctls rather than have a separate EEH number
space that people can just throw anything in they like ;).

Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,

Sure, that's what patch review is about. I'm really more concerned about
whose court the number space is in - you or Gavin. If we're talking
about top level ioctls you will care a lot more.

But I'm not religious about this. You're the VFIO maintainer, so it's
your call. I just personally cringe when I see an ioctl that gets an
opcode and a parameter argument where the parameter argument is a
union with one struct for each opcode.

Well, what would it look like...

struct vfio_eeh_pe_op {
__u32 argsz;
__u32 flags;
__u32 op;
};

Couldn't every single one of these be a separate op?  Are there any
cases where we can't use the ioctl return value?

VFIO_EEH_PE_DISABLE
VFIO_EEH_PE_ENABLE
VFIO_EEH_PE_UNFREEZE_IO
VFIO_EEH_PE_UNFREEZE_DMA
VFIO_EEH_PE_GET_MODE
VFIO_EEH_PE_RESET_DEACTIVATE
VFIO_EEH_PE_RESET_HOT
VFIO_EEH_PE_RESET_FUNDAMENTAL
VFIO_EEH_PE_CONFIGURE

It doesn't look that bad to me, what am I missing?  Thanks,

Yup, that looks well to me as well :)


s/VFIO_EEH_PE_GET_MODE/VFIO_EEH_PE_GET_STATE.

I'll include this in next revision. Thanks, Alex.


Yup, no need for CMD anymore then either :)


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Gavin Shan
On Thu, May 29, 2014 at 01:38:46AM +0200, Alexander Graf wrote:
On 29.05.14 01:37, Gavin Shan wrote:
On Thu, May 29, 2014 at 12:40:26AM +0200, Alexander Graf wrote:
On 28.05.14 18:17, Alex Williamson wrote:
On Wed, 2014-05-28 at 13:37 +0200, Alexander Graf wrote:
On 28.05.14 02:57, Alex Williamson wrote:
On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:
On 28.05.14 02:39, Alex Williamson wrote:
On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
On 27.05.14 20:15, Alex Williamson wrote:
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt  | 92 
 -
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 
 ++
 include/uapi/linux/vfio.h   | 66 
 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
[...]

+
+return ret;
+}
+
 static long tce_iommu_ioctl(void *iommu_data,
  unsigned int cmd, unsigned 
 long arg)
 {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
 tce_iommu_disable(container);
 mutex_unlock(container-lock);
 return 0;
+case VFIO_EEH_PE_SET_OPTION:
+case VFIO_EEH_PE_GET_STATE:
+case VFIO_EEH_PE_RESET:
+case VFIO_EEH_PE_CONFIGURE:
+return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?
I don't see the problem. We need to forward 4 ioctls to a separate 
piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.
And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,
Yes, that's the point. I by far prefer to have you push back on anyone
who introduces useless ioctls rather than have a separate EEH number
space that people can just throw anything in they like ;).
Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,
Sure, that's what patch review is about. I'm really more concerned about
whose court the number space is in - you or Gavin. If we're talking
about top level ioctls you will care a lot more.

But I'm not religious about this. You're the VFIO maintainer, so it's
your call. I just personally cringe when I see an ioctl that gets an
opcode and a parameter argument where the parameter argument is a
union with one struct for each opcode.
Well, what would it look like...

struct vfio_eeh_pe_op {
__u32 argsz;
__u32 flags;
__u32 op;
};

Couldn't every single one of these be a separate op?  Are there any
cases where we can't use the ioctl return value?

VFIO_EEH_PE_DISABLE
VFIO_EEH_PE_ENABLE
VFIO_EEH_PE_UNFREEZE_IO
VFIO_EEH_PE_UNFREEZE_DMA
VFIO_EEH_PE_GET_MODE
VFIO_EEH_PE_RESET_DEACTIVATE
VFIO_EEH_PE_RESET_HOT
VFIO_EEH_PE_RESET_FUNDAMENTAL
VFIO_EEH_PE_CONFIGURE

It doesn't look that bad to me, what am I missing?  Thanks,
Yup, that looks well to me as well :)

s/VFIO_EEH_PE_GET_MODE/VFIO_EEH_PE_GET_STATE.

I'll include this in next revision. Thanks, Alex.

Yup, no need for CMD anymore then either :)


Yep. Thanks, Guys :)

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Gavin Shan
On Wed, May 28, 2014 at 10:32:11AM -0600, Alex Williamson wrote:
On Wed, 2014-05-28 at 10:55 +1000, Gavin Shan wrote:
 On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
 On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
  The patch adds new IOCTL commands for sPAPR VFIO container device
  to support EEH functionality for PCI devices, which have been passed
  through from host to somebody else via VFIO.
  
  Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
   Documentation/vfio.txt  | 92 
  -
   drivers/vfio/pci/Makefile   |  1 +
   drivers/vfio/pci/vfio_pci.c | 20 +---
   drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
   drivers/vfio/pci/vfio_pci_private.h |  5 ++
   drivers/vfio/vfio_iommu_spapr_tce.c | 85 
  ++
   include/uapi/linux/vfio.h   | 66 ++
   7 files changed, 308 insertions(+), 7 deletions(-)
   create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
  
  diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
  index b9ca023..d890fed 100644
  --- a/Documentation/vfio.txt
  +++ b/Documentation/vfio.txt
  @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented 
  in real mode which provides
   an excellent performance which has limitations such as inability to do
   locked pages accounting in real time.
   
  -So 3 additional ioctls have been added:
  +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an 
  I/O
  +subtree that can be treated as a unit for the purposes of partitioning 
  and
  +error recovery. A PE may be a single or multi-function IOA (IO Adapter), 
  a
  +function of a multi-function IOA, or multiple IOAs (possibly including 
  switch
  +and bridge structures above the multiple IOAs). PPC64 guests detect PCI 
  errors
  +and recover from them via EEH RTAS services, which works on the basis of
  +additional ioctl commands.
  +
  +So 7 additional ioctls have been added:
   
VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
of the DMA window on the PCI bus.
  @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
   
VFIO_IOMMU_DISABLE - disables the container.
   
  + VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
  + specified device. Also, it can be used to remove IO or DMA
  + stopped state on the frozen PE.
  +
  + VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
  +
  + VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
  + error recovering.
  +
  + VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
  + one of the major steps for error recoverying.
   
   The code flow from the example above should be slightly changed:
   
  @@ -346,6 +365,77 @@ The code flow from the example above should be 
  slightly changed:
ioctl(container, VFIO_IOMMU_MAP_DMA, dma_map);
.
   
  +Based on the initial example we have, the following piece of code could 
  be
  +reference for EEH setup and error handling:
  +
  + struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
  + struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
  + struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
  + struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
  +
  + 
  +
  + /* Get a file descriptor for the device */
  + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, :06:0d.0);
  +
  + /* Enable the EEH functionality on the device */
  + option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
  + ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
  +
  + /* You're suggested to create additional data struct to represent
  +  * PE, and put child devices belonging to same IOMMU group to the
  +  * PE instance for later reference.
  +  */
  +
  + /* Check the PE's state and make sure it's in functional state */
  + ioctl(container, VFIO_EEH_PE_GET_STATE, state);
  +
  + /* Save device's state. pci_save_state() would be good enough
  +  * as an example.
  +  */
  +
  + /* Test and setup the device */
  + ioctl(device, VFIO_DEVICE_GET_INFO, device_info);
  +
  + 
  +
  + /* When 0xFF's returned from reading PCI config space or IO BARs
  +  * of the PCI device. Check the PE state to see if that has been
  +  * frozen.
  +  */
  + ioctl(container, VFIO_EEH_PE_GET_STATE, state);
  +
  + /* Waiting for pending PCI transactions to be completed and don't
  +  * produce any more PCI traffic from/to the affected PE until
  +  * recovery is finished.
  +  */
  +
  + /* Enable IO for the affected PE and collect logs. Usually, the
  +  * standard part of PCI config space, AER registers are dumped
  +  * as logs for further analysis.
  +  */
  + option.option = VFIO_EEH_PE_SET_OPT_IO;
  + ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
  +
  + /* Issue PE reset */
  + reset.option = VFIO_EEH_PE_RESET_HOT;
  + 

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-28 Thread Benjamin Herrenschmidt
On Thu, 2014-05-29 at 10:05 +1000, Gavin Shan wrote:
 The log stuff is TBD and I'll figure it out later.
 
 About to what are the errors, there are a lot. Most of them are related
 to hardware level, for example unstable PCI link. Usually, those error
 bits defined in AER fatal error state register contribute to EEH errors.
 It could be software related, e.g. violating IOMMU protection (read/write
 permission etc), or even one PCI device isn't capable of DMAing. Hopefully,
 it's the explaination you're looking for? :-)

Note to Alex('s) ...

The log we get from FW at the moment in the host is:

  - In the case of pHyp / RTAS host, opaque. Basically it's a blob that we store
and that can be sent to IBM service people :-) Not terribly practical.

  - On PowerNV, it's a IODA specific data structure (basically a dump of a 
bunch of register state and tables). IODA is our IO architecture (sadly the
document itself isn't public at this point) and we have two versions, IODA1
and IODA2. You can consider the structure as chipset specific basically.

What I want to do in the long run is:

  - In the case of pHyp/RTAS host, there's not much we can do, so basically
forward that log as-is.

  - In the case of PowerNV, forward the log *and* add a well-defined blob to
it that does some basic interpretation of it. In fact I want to do the latter
more generally in the host kernel for host kernel errors as well, but we
can forward it to the guest via VFIO too. What I mean by interpretation is
something along the lines of an error type DMA IOMMU fault, MMIO error,
Link loss, PCIe UE, ... among a list of well known error types that
represent the most common ones, with a little bit of added info when
available (for most DMA errors we can provide the DMA address that faulted
for example).

So Gavin and I need to work a bit on that, both in the context of the host
kernel to improve the reporting there, and in the context of what we pass to
user space.

However, no driver today cares about that information. The PCIe error recovery
system doesn't carry it and it has no impact on the EEH recovery procedures,
so EEH in that sense is useful without that reporting. It's useful for the
programmer (or user/admin) to identify what went wrong but it's not used by
the automated recovery process.

One last thing to look at is in the case of a VFIO device, we might want to
silence the host kernel printf's once we support guest EEH since otherwise
the guest has a path to flood the host kernel log by triggering a lot of
EEH errors purposefully.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Gavin Shan
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt  | 92 -
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
 include/uapi/linux/vfio.h   | 66 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index b9ca023..d890fed 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in 
real mode which provides
 an excellent performance which has limitations such as inability to do
 locked pages accounting in real time.
 
-So 3 additional ioctls have been added:
+4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
+subtree that can be treated as a unit for the purposes of partitioning and
+error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
+function of a multi-function IOA, or multiple IOAs (possibly including switch
+and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors
+and recover from them via EEH RTAS services, which works on the basis of
+additional ioctl commands.
+
+So 7 additional ioctls have been added:
 
VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
of the DMA window on the PCI bus.
@@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
 
VFIO_IOMMU_DISABLE - disables the container.
 
+   VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
+   specified device. Also, it can be used to remove IO or DMA
+   stopped state on the frozen PE.
+
+   VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
+
+   VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
+   error recovering.
+
+   VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
+   one of the major steps for error recoverying.
 
 The code flow from the example above should be slightly changed:
 
@@ -346,6 +365,77 @@ The code flow from the example above should be slightly 
changed:
ioctl(container, VFIO_IOMMU_MAP_DMA, dma_map);
.
 
+Based on the initial example we have, the following piece of code could be
+reference for EEH setup and error handling:
+
+   struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
+   struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
+   struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
+   struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
+
+   
+
+   /* Get a file descriptor for the device */
+   device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, :06:0d.0);
+
+   /* Enable the EEH functionality on the device */
+   option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
+   ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
+
+   /* You're suggested to create additional data struct to represent
+* PE, and put child devices belonging to same IOMMU group to the
+* PE instance for later reference.
+*/
+
+   /* Check the PE's state and make sure it's in functional state */
+   ioctl(container, VFIO_EEH_PE_GET_STATE, state);
+
+   /* Save device's state. pci_save_state() would be good enough
+* as an example.
+*/
+
+   /* Test and setup the device */
+   ioctl(device, VFIO_DEVICE_GET_INFO, device_info);
+
+   
+
+   /* When 0xFF's returned from reading PCI config space or IO BARs
+* of the PCI device. Check the PE state to see if that has been
+* frozen.
+*/
+   ioctl(container, VFIO_EEH_PE_GET_STATE, state);
+
+   /* Waiting for pending PCI transactions to be completed and don't
+* produce any more PCI traffic from/to the affected PE until
+* recovery is finished.
+*/
+
+   /* Enable IO for the affected PE and collect logs. Usually, the
+* standard part of PCI config space, AER registers are dumped
+* as logs for further analysis.
+*/
+   option.option = VFIO_EEH_PE_SET_OPT_IO;
+   ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
+
+   /* Issue PE reset */
+   reset.option = VFIO_EEH_PE_RESET_HOT;
+   ioctl(container, VFIO_EEH_PE_RESET, reset);
+   reset.option = VFIO_EEH_PE_RESET_DEACTIVATE;
+   ioctl(container, VFIO_EEH_PE_RESET, reset);
+
+   /* Configure the 

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
 The patch adds new IOCTL commands for sPAPR VFIO container device
 to support EEH functionality for PCI devices, which have been passed
 through from host to somebody else via VFIO.
 
 Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
  Documentation/vfio.txt  | 92 
 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
 
 diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
 index b9ca023..d890fed 100644
 --- a/Documentation/vfio.txt
 +++ b/Documentation/vfio.txt
 @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in 
 real mode which provides
  an excellent performance which has limitations such as inability to do
  locked pages accounting in real time.
  
 -So 3 additional ioctls have been added:
 +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
 +subtree that can be treated as a unit for the purposes of partitioning and
 +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
 +function of a multi-function IOA, or multiple IOAs (possibly including switch
 +and bridge structures above the multiple IOAs). PPC64 guests detect PCI 
 errors
 +and recover from them via EEH RTAS services, which works on the basis of
 +additional ioctl commands.
 +
 +So 7 additional ioctls have been added:
  
   VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
   of the DMA window on the PCI bus.
 @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
  
   VFIO_IOMMU_DISABLE - disables the container.
  
 + VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
 + specified device. Also, it can be used to remove IO or DMA
 + stopped state on the frozen PE.
 +
 + VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
 +
 + VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
 + error recovering.
 +
 + VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
 + one of the major steps for error recoverying.
  
  The code flow from the example above should be slightly changed:
  
 @@ -346,6 +365,77 @@ The code flow from the example above should be slightly 
 changed:
   ioctl(container, VFIO_IOMMU_MAP_DMA, dma_map);
   .
  
 +Based on the initial example we have, the following piece of code could be
 +reference for EEH setup and error handling:
 +
 + struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
 + struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
 + struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
 + struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
 +
 + 
 +
 + /* Get a file descriptor for the device */
 + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, :06:0d.0);
 +
 + /* Enable the EEH functionality on the device */
 + option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
 + ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
 +
 + /* You're suggested to create additional data struct to represent
 +  * PE, and put child devices belonging to same IOMMU group to the
 +  * PE instance for later reference.
 +  */
 +
 + /* Check the PE's state and make sure it's in functional state */
 + ioctl(container, VFIO_EEH_PE_GET_STATE, state);
 +
 + /* Save device's state. pci_save_state() would be good enough
 +  * as an example.
 +  */
 +
 + /* Test and setup the device */
 + ioctl(device, VFIO_DEVICE_GET_INFO, device_info);
 +
 + 
 +
 + /* When 0xFF's returned from reading PCI config space or IO BARs
 +  * of the PCI device. Check the PE state to see if that has been
 +  * frozen.
 +  */
 + ioctl(container, VFIO_EEH_PE_GET_STATE, state);
 +
 + /* Waiting for pending PCI transactions to be completed and don't
 +  * produce any more PCI traffic from/to the affected PE until
 +  * recovery is finished.
 +  */
 +
 + /* Enable IO for the affected PE and collect logs. Usually, the
 +  * standard part of PCI config space, AER registers are dumped
 +  * as logs for further analysis.
 +  */
 + option.option = VFIO_EEH_PE_SET_OPT_IO;
 + ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
 +
 + /* Issue PE reset */
 + reset.option = VFIO_EEH_PE_RESET_HOT;
 + ioctl(container, VFIO_EEH_PE_RESET, reset);
 + reset.option = VFIO_EEH_PE_RESET_DEACTIVATE;
 + 

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 12:15 -0600, Alex Williamson wrote:

  +/*
  + * Reset is the major step to recover problematic PE. The following
  + * command helps on that.
  + */
  +struct vfio_eeh_pe_reset {
  +   __u32 argsz;
  +   __u32 flags;
  +   __u32 option;
  +#define VFIO_EEH_PE_RESET_DEACTIVATE   0   /* Deactivate reset 
  */
  +#define VFIO_EEH_PE_RESET_HOT  1   /* Hot reset
  */
  +#define VFIO_EEH_PE_RESET_FUNDAMENTAL  3   /* Fundamental reset
  */
 
 How does a user know which of these to use?

The usual way is the driver asks for one or the other, this plumbs back
into the guest EEH code which itself plumbs into the PCIe error recovery
framework in Linux.

However I do have a question for Gavin here: Why do we expose an
explicit deactivate ? The reset functions should do the whole
reset sequence (assertion, delay, deassertion). In fact the firmware
doesn't really give you a choice for PERST right ? Or do we have
a requirement to expose both phases for RTAS? (In that case I'm
happy to ignore the deassertion there too).

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 06:30 +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2014-05-27 at 12:15 -0600, Alex Williamson wrote:
 
   +/*
   + * Reset is the major step to recover problematic PE. The following
   + * command helps on that.
   + */
   +struct vfio_eeh_pe_reset {
   + __u32 argsz;
   + __u32 flags;
   + __u32 option;
   +#define VFIO_EEH_PE_RESET_DEACTIVATE 0   /* Deactivate reset 
   */
   +#define VFIO_EEH_PE_RESET_HOT1   /* Hot reset
   */
   +#define VFIO_EEH_PE_RESET_FUNDAMENTAL3   /* Fundamental reset
   */
  
  How does a user know which of these to use?
 
 The usual way is the driver asks for one or the other, this plumbs back
 into the guest EEH code which itself plumbs into the PCIe error recovery
 framework in Linux.

So magic?

 
 However I do have a question for Gavin here: Why do we expose an
 explicit deactivate ? The reset functions should do the whole
 reset sequence (assertion, delay, deassertion). In fact the firmware
 doesn't really give you a choice for PERST right ? Or do we have
 a requirement to expose both phases for RTAS? (In that case I'm
 happy to ignore the deassertion there too).
 
 Cheers,
 Ben.
 



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Benjamin Herrenschmidt
On Tue, 2014-05-27 at 14:37 -0600, Alex Williamson wrote:

  The usual way is the driver asks for one or the other, this plumbs back
  into the guest EEH code which itself plumbs into the PCIe error recovery
  framework in Linux.
 
 So magic?

Yes. The driver is expected to more or less knows what kind of reset it
wants for its device. Ideally hot reset is sufficient but some drivers
knows that the device they drive is crappy enough that it mostly ignores
hot reset and really needs a PERST for example...

Also we have other reasons to expose those interfaces outside of EEH. 

For example, some drivers might want to specifically trigger a PERST
after a microcode update. IE. There are path outside of EEH error
recovery where drivers in the guest might want to trigger a reset
to the device and they have control under some circumstances on
which kind of reset they are doing (and the guest Linux does  have
different code path to do a hot reset vs. a fundamental reset).

So we need to expose that distinction to be able to honor the guest
decision.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alexander Graf


On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
  Documentation/vfio.txt  | 92 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c


[...]


+
+   return ret;
+}
+
  static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
  {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(container-lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?


I don't see the problem. We need to forward 4 ioctls to a separate piece 
of code, so we forward 4 ioctls to a separate piece of code :). Putting 
them into one ioctl just moves the switch() into another function.



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
 On 27.05.14 20:15, Alex Williamson wrote:
  On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
  The patch adds new IOCTL commands for sPAPR VFIO container device
  to support EEH functionality for PCI devices, which have been passed
  through from host to somebody else via VFIO.
 
  Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
Documentation/vfio.txt  | 92 
  -
drivers/vfio/pci/Makefile   |  1 +
drivers/vfio/pci/vfio_pci.c | 20 +---
drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
drivers/vfio/pci/vfio_pci_private.h |  5 ++
drivers/vfio/vfio_iommu_spapr_tce.c | 85 
  ++
include/uapi/linux/vfio.h   | 66 ++
7 files changed, 308 insertions(+), 7 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
 
 [...]
 
  +
  +  return ret;
  +}
  +
static long tce_iommu_ioctl(void *iommu_data,
  unsigned int cmd, unsigned long arg)
{
  @@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
 tce_iommu_disable(container);
 mutex_unlock(container-lock);
 return 0;
  +  case VFIO_EEH_PE_SET_OPTION:
  +  case VFIO_EEH_PE_GET_STATE:
  +  case VFIO_EEH_PE_RESET:
  +  case VFIO_EEH_PE_CONFIGURE:
  +  return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
  This is where it would have really made sense to have a single
  VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
  AlexG, are you really attached to splitting these out into separate
  ioctls?
 
 I don't see the problem. We need to forward 4 ioctls to a separate piece 
 of code, so we forward 4 ioctls to a separate piece of code :). Putting 
 them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,

Alex


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alexander Graf


On 28.05.14 02:39, Alex Williamson wrote:

On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:

On 27.05.14 20:15, Alex Williamson wrote:

On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:

The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
   Documentation/vfio.txt  | 92 
-
   drivers/vfio/pci/Makefile   |  1 +
   drivers/vfio/pci/vfio_pci.c | 20 +---
   drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
   drivers/vfio/pci/vfio_pci_private.h |  5 ++
   drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
   include/uapi/linux/vfio.h   | 66 ++
   7 files changed, 308 insertions(+), 7 deletions(-)
   create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c

[...]


+
+   return ret;
+}
+
   static long tce_iommu_ioctl(void *iommu_data,
 unsigned int cmd, unsigned long arg)
   {
@@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(container-lock);
return 0;
+   case VFIO_EEH_PE_SET_OPTION:
+   case VFIO_EEH_PE_GET_STATE:
+   case VFIO_EEH_PE_RESET:
+   case VFIO_EEH_PE_CONFIGURE:
+   return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);

This is where it would have really made sense to have a single
VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
AlexG, are you really attached to splitting these out into separate
ioctls?

I don't see the problem. We need to forward 4 ioctls to a separate piece
of code, so we forward 4 ioctls to a separate piece of code :). Putting
them into one ioctl just moves the switch() into another function.

And uses an extra 3 ioctl numbers and gives us extra things to update if
we ever need to add more ioctls, etc.  ioctl numbers are an address
space, how much address space do we really want to give to EEH?  It's
not a big difference, but I don't think it's completely even either.
Thanks,


Yes, that's the point. I by far prefer to have you push back on anyone 
who introduces useless ioctls rather than have a separate EEH number 
space that people can just throw anything in they like ;).



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Gavin Shan
On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
 The patch adds new IOCTL commands for sPAPR VFIO container device
 to support EEH functionality for PCI devices, which have been passed
 through from host to somebody else via VFIO.
 
 Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
  Documentation/vfio.txt  | 92 
 -
  drivers/vfio/pci/Makefile   |  1 +
  drivers/vfio/pci/vfio_pci.c | 20 +---
  drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
  drivers/vfio/pci/vfio_pci_private.h |  5 ++
  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++
  include/uapi/linux/vfio.h   | 66 ++
  7 files changed, 308 insertions(+), 7 deletions(-)
  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
 
 diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
 index b9ca023..d890fed 100644
 --- a/Documentation/vfio.txt
 +++ b/Documentation/vfio.txt
 @@ -305,7 +305,15 @@ faster, the map/unmap handling has been implemented in 
 real mode which provides
  an excellent performance which has limitations such as inability to do
  locked pages accounting in real time.
  
 -So 3 additional ioctls have been added:
 +4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O
 +subtree that can be treated as a unit for the purposes of partitioning and
 +error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
 +function of a multi-function IOA, or multiple IOAs (possibly including 
 switch
 +and bridge structures above the multiple IOAs). PPC64 guests detect PCI 
 errors
 +and recover from them via EEH RTAS services, which works on the basis of
 +additional ioctl commands.
 +
 +So 7 additional ioctls have been added:
  
  VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
  of the DMA window on the PCI bus.
 @@ -316,6 +324,17 @@ So 3 additional ioctls have been added:
  
  VFIO_IOMMU_DISABLE - disables the container.
  
 +VFIO_EEH_PE_SET_OPTION - enables or disables EEH functionality on the
 +specified device. Also, it can be used to remove IO or DMA
 +stopped state on the frozen PE.
 +
 +VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
 +
 +VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
 +error recovering.
 +
 +VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
 +one of the major steps for error recoverying.
  
  The code flow from the example above should be slightly changed:
  
 @@ -346,6 +365,77 @@ The code flow from the example above should be slightly 
 changed:
  ioctl(container, VFIO_IOMMU_MAP_DMA, dma_map);
  .
  
 +Based on the initial example we have, the following piece of code could be
 +reference for EEH setup and error handling:
 +
 +struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
 +struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
 +struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
 +struct vfio_eeh_pe_configure configure = { .argsz = sizeof(configure) };
 +
 +
 +
 +/* Get a file descriptor for the device */
 +device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, :06:0d.0);
 +
 +/* Enable the EEH functionality on the device */
 +option.option = VFIO_EEH_PE_SET_OPT_ENABLE;
 +ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
 +
 +/* You're suggested to create additional data struct to represent
 + * PE, and put child devices belonging to same IOMMU group to the
 + * PE instance for later reference.
 + */
 +
 +/* Check the PE's state and make sure it's in functional state */
 +ioctl(container, VFIO_EEH_PE_GET_STATE, state);
 +
 +/* Save device's state. pci_save_state() would be good enough
 + * as an example.
 + */
 +
 +/* Test and setup the device */
 +ioctl(device, VFIO_DEVICE_GET_INFO, device_info);
 +
 +
 +
 +/* When 0xFF's returned from reading PCI config space or IO BARs
 + * of the PCI device. Check the PE state to see if that has been
 + * frozen.
 + */
 +ioctl(container, VFIO_EEH_PE_GET_STATE, state);
 +
 +/* Waiting for pending PCI transactions to be completed and don't
 + * produce any more PCI traffic from/to the affected PE until
 + * recovery is finished.
 + */
 +
 +/* Enable IO for the affected PE and collect logs. Usually, the
 + * standard part of PCI config space, AER registers are dumped
 + * as logs for further analysis.
 + */
 +option.option = VFIO_EEH_PE_SET_OPT_IO;
 +ioctl(container, VFIO_EEH_PE_SET_OPTION, option);
 +
 +/* Issue PE reset */
 +reset.option = VFIO_EEH_PE_RESET_HOT;
 +ioctl(container, VFIO_EEH_PE_RESET, reset);
 +reset.option = VFIO_EEH_PE_RESET_DEACTIVATE;
 

Re: [PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

2014-05-27 Thread Alex Williamson
On Wed, 2014-05-28 at 02:44 +0200, Alexander Graf wrote:
 On 28.05.14 02:39, Alex Williamson wrote:
  On Wed, 2014-05-28 at 00:49 +0200, Alexander Graf wrote:
  On 27.05.14 20:15, Alex Williamson wrote:
  On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
  The patch adds new IOCTL commands for sPAPR VFIO container device
  to support EEH functionality for PCI devices, which have been passed
  through from host to somebody else via VFIO.
 
  Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
  ---
 Documentation/vfio.txt  | 92 
  -
 drivers/vfio/pci/Makefile   |  1 +
 drivers/vfio/pci/vfio_pci.c | 20 +---
 drivers/vfio/pci/vfio_pci_eeh.c | 46 +++
 drivers/vfio/pci/vfio_pci_private.h |  5 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 85 
  ++
 include/uapi/linux/vfio.h   | 66 ++
 7 files changed, 308 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
  [...]
 
  +
  +return ret;
  +}
  +
 static long tce_iommu_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
 {
  @@ -283,6 +363,11 @@ static long tce_iommu_ioctl(void *iommu_data,
   tce_iommu_disable(container);
   mutex_unlock(container-lock);
   return 0;
  +case VFIO_EEH_PE_SET_OPTION:
  +case VFIO_EEH_PE_GET_STATE:
  +case VFIO_EEH_PE_RESET:
  +case VFIO_EEH_PE_CONFIGURE:
  +return tce_iommu_eeh_ioctl(iommu_data, cmd, arg);
  This is where it would have really made sense to have a single
  VFIO_EEH_OP ioctl with a data structure passed to indicate the sub-op.
  AlexG, are you really attached to splitting these out into separate
  ioctls?
  I don't see the problem. We need to forward 4 ioctls to a separate piece
  of code, so we forward 4 ioctls to a separate piece of code :). Putting
  them into one ioctl just moves the switch() into another function.
  And uses an extra 3 ioctl numbers and gives us extra things to update if
  we ever need to add more ioctls, etc.  ioctl numbers are an address
  space, how much address space do we really want to give to EEH?  It's
  not a big difference, but I don't think it's completely even either.
  Thanks,
 
 Yes, that's the point. I by far prefer to have you push back on anyone 
 who introduces useless ioctls rather than have a separate EEH number 
 space that people can just throw anything in they like ;).

Well, I appreciate that, but having them as separate ioctls doesn't
really prevent that either.  Any one of these 4 could be set to take a
sub-option to extend and contort the EEH interface.  The only way to
prevent that would be to avoid the argsz+flags hack that make the ioctl
extendable.  Thanks,

Alex


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev