Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Konrad Rzeszutek Wilk
On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
 On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
  
   2.2PHYSDEVOP_pci_host_bridge_add hypercall
   --
   Xen code accesses PCI configuration space based on the sbdf received from
   the
   guest. The order in which the pci device tree node appear may not be the
   same
   order of device enumeration in dom0. Thus there needs to be a mechanism to
   bind
   the segment number assigned by dom0 to the pci host controller. The
   hypercall
   is introduced:
  
  Why can't we extend the existing hypercall to have the segment value?
  
  Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
  
  And have the hypercall (and Xen) be able to deal with introduction of PCI
  devices that are out of sync?
  
  Maybe I am confused but aren't PCI host controllers also 'uploaded' to
  Xen?
 
 The issue is that Dom0 and Xen need to agree on a common numbering space
 for the PCI domain AKA segment, which is really just a software concept
 i.e. on ARM Linux just makes them up (on x86 I believe they come from some
 firmware table so Xen and Dom0 agree to both use that).

Doesn't the PCI domain or segment have a notion of which PCI devices are
underneath it? Or vice versa - do PCI devices know what their segment (or domain)
is?

 
 Ian.
 



Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Konrad Rzeszutek Wilk
On Wed, Aug 12, 2015 at 03:42:10PM +0100, Ian Campbell wrote:
 On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
  On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
   On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:

 2.2PHYSDEVOP_pci_host_bridge_add hypercall
 --
 Xen code accesses PCI configuration space based on the sbdf 
 received from
 the
 guest. The order in which the pci device tree node appear may not 
 be the
 same
 order of device enumeration in dom0. Thus there needs to be a 
 mechanism to
 bind
 the segment number assigned by dom0 to the pci host controller. The
 hypercall
 is introduced:

Why can't we extend the existing hypercall to have the segment value?

Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!

And have the hypercall (and Xen) be able to deal with introduction of 
PCI
devices that are out of sync?

Maybe I am confused but aren't PCI host controllers also 'uploaded' 
to
Xen?
   
   The issue is that Dom0 and Xen need to agree on a common numbering 
   space
   for the PCI domain AKA segment, which is really just a software 
   concept
   i.e. on ARM Linux just makes them up (on x86 I believe they come from 
   some
   firmware table so Xen and Dom0 agree to both use that).
  
  Doesn't the PCI domain or segment have a notion of which PCI devices are
  underneath it? Or vice versa - do PCI devices know what their segment (or
  domain) is?
 
 The PCI domain or segment does contain devices, but it is a purely OS-level
 concept; it has no real meaning in the hardware. So both Xen and Linux are
 free to fabricate whatever segment naming space they want, but obviously they
 need to agree, hence this hypercall lets Linux tell Xen what segment it has
 associated with a given PCI controller.
 
 Perhaps an example will help.
 
 Imagine we have two PCI host bridges, one with CFG space at 0xA000 and
 a second with CFG space at 0xB000.
 
 Xen discovers these and assigns segment 0=0xA000 and segment
 1=0xB000.
 
 Dom0 discovers them too but assigns segment 1=0xA000 and segment
 0=0xB000 (i.e. the other way).
 
 Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
 the device with BDF behind the root bridge at 0xA000. (Perhaps this is
 the PHYSDEVOP_manage_pci_add_ext call).
 
 But Xen thinks it is talking about the device with BDF behind the root
 bridge at 0xB000 because Dom0 and Xen do not agree on what the segments
 mean. Now Xen will use the wrong device ID in the IOMMU (since that is
 associated with the host bridge), or poke the wrong configuration space, or
 whatever.
 
 Or maybe Xen chose 42=0xB000 and 43=0xA000 so when Dom0 starts
 talking about segment=0 and =1 it has no idea what is going on.
 
 PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say Segment 0
 is the host bridge at 0xB000 and Segment 1 is the host bridge at
 0xA000. With this there is no confusion between Xen and Dom0 because
 Xen isn't picking a segment ID, it is being told what it is by Dom0 which
 has done the picking.
 
 Does that help?

Yes thank you!

Manish, please include this explanation in the design as it will surely
help other folks in understanding it.
 
 Ian.
 



Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Konrad Rzeszutek Wilk
On Wed, Aug 12, 2015 at 01:03:07PM +0530, Manish Jaggi wrote:
 Below are the comments. I will also send a Draft 4 taking account of the 
 comments.
 
 
 On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:
 On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
   -
  | PCI Pass-through in Xen ARM |
   -
  manish.ja...@caviumnetworks.com
  ---
 
   Draft-3
 ...
 [snip]
 2.2PHYSDEVOP_pci_host_bridge_add hypercall
 --
 Xen code accesses PCI configuration space based on the sbdf received from
 the
 guest. The order in which the pci device tree node appear may not be the
 same
 order of device enumeration in dom0. Thus there needs to be a mechanism to
 bind
 the segment number assigned by dom0 to the pci host controller. The
 hypercall
 is introduced:
 Why can't we extend the existing hypercall to have the segment value?
 
 Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
 It doesn’t pass the cfg_base and size to xen

cfg_base is the BAR? Or the MMIO ?

 
 And have the hypercall (and Xen) be able to deal with introduction of PCI
 devices that are out of sync?
 
 Maybe I am confused but aren't PCI host controllers also 'uploaded' to
 Xen?
 I need to add one more line here to be more descriptive. The binding is
 between the segment number (domain number in linux)
 used by dom0 and the pci config space address in the pci node of device tree
 (reg property).
 The hypercall was introduced to cater for the fact that dom0 may process pci
 nodes in the device tree in any order.

I still don't follow - sorry.

Why would it matter that the PCI nodes are processed in any order?

 With this binding the ABI is clear.
 #define PHYSDEVOP_pci_host_bridge_add 44
 struct physdev_pci_host_bridge_add {
  /* IN */
  uint16_t seg;
  uint64_t cfg_base;
  uint64_t cfg_size;
 };
 
 This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
 hypercall. The handler code invokes the following to update the segment number
 in the pci_hostbridge:
 
 int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
 cfg_size);
 
 Subsequent calls to pci_conf_read/write are completed by the
 pci_hostbridge_ops
 of the respective pci_hostbridge.
 This design sounds like it is added to deal with having to pre-allocate the
 host controller structures before the PCI devices are streaming in?
 
 Instead of having the PCI devices and PCI host controllers be updated
 as they are coming in?
 
 Why can't the second option be done?
 If you are referring to ACPI, we have to add the support.
 PCI Host controllers are pci nodes in device tree.

I think what you are saying is that the PCI devices are being uploaded
during ACPI parsing. The PCI host controllers are done via
device tree.

But what difference does that make? Why can't Xen deal with these
being in any order? Can't it re-organize its internal representation
of PCI host controllers and PCI devices based on new data?



 2.3Helper Functions
 
 a) pci_hostbridge_dt_node(pdev->seg);
 Returns the device tree node pointer of the pci node from which the pdev got
 enumerated.
 
 3.SMMU programming
 ---
 
 3.1.Additions for PCI Passthrough
 ---
 3.1.1 - add_device in iommu_ops is implemented.
 
 This is called when PHYSDEVOP_pci_add_device is called from dom0.
 Or for PHYSDEVOP_manage_pci_add_ext ?
 Not sure but it seems logical for this also.
 .add_device = arm_smmu_add_dom0_dev,
 static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
 {
  if (dev_is_pci(dev)) {
  struct pci_dev *pdev = to_pci_dev(dev);
  return arm_smmu_assign_dev(pdev->domain, devfn, dev);
  }
  return -1;
 }
 
 What about removal?
 
 What if the device is removed (hot-unplugged)?
 .remove_device = arm_smmu_remove_device() would be called.
 Will update in Draft4

Also please mention what hypercall you would use.

 
 3.1.2 dev_get_dev_node is modified for pci devices.
 -
 The function is modified to return the dt_node of the pci hostbridge from
  the device tree. This is required as non-dt devices need a way to find
  which smmu they are attached to.
 
 static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
 {
  struct device_node *dev_node = dev_get_dev_node(dev);
 
 
 static struct device_node *dev_get_dev_node(struct device *dev)
 {
  if (dev_is_pci(dev)) {
  struct pci_dev *pdev = to_pci_dev(dev);
  return pci_hostbridge_dt_node(pdev->seg);
  }
 ...
 
 
 3.2.Mapping between streamID - deviceID - pci sbdf - requesterID
 -
 

Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Ian Campbell
On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
 On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
  On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
   
2.2PHYSDEVOP_pci_host_bridge_add hypercall
--
Xen code accesses PCI configuration space based on the sbdf 
received from
the
guest. The order in which the pci device tree node appear may not 
be the
same
order of device enumeration in dom0. Thus there needs to be a 
mechanism to
bind
the segment number assigned by dom0 to the pci host controller. The
hypercall
is introduced:
   
   Why can't we extend the existing hypercall to have the segment value?
   
   Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
   
   And have the hypercall (and Xen) be able to deal with introduction of 
   PCI
   devices that are out of sync?
   
   Maybe I am confused but aren't PCI host controllers also 'uploaded' 
   to
   Xen?
  
  The issue is that Dom0 and Xen need to agree on a common numbering 
  space
  for the PCI domain AKA segment, which is really just a software 
  concept
  i.e. on ARM Linux just makes them up (on x86 I believe they come from 
  some
  firmware table so Xen and Dom0 agree to both use that).
 
 Doesn't the PCI domain or segment have a notion of which PCI devices are
 underneath it? Or vice versa - do PCI devices know what their segment (or
 domain) is?

The PCI domain or segment does contain devices, but it is a purely OS-level
concept; it has no real meaning in the hardware. So both Xen and Linux are
free to fabricate whatever segment naming space they want, but obviously they
need to agree, hence this hypercall lets Linux tell Xen what segment it has
associated with a given PCI controller.

Perhaps an example will help.

Imagine we have two PCI host bridges, one with CFG space at 0xA000 and
a second with CFG space at 0xB000.

Xen discovers these and assigns segment 0=0xA000 and segment
1=0xB000.

Dom0 discovers them too but assigns segment 1=0xA000 and segment
0=0xB000 (i.e. the other way).

Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
the device with BDF behind the root bridge at 0xA000. (Perhaps this is
the PHYSDEVOP_manage_pci_add_ext call).

But Xen thinks it is talking about the device with BDF behind the root
bridge at 0xB000 because Dom0 and Xen do not agree on what the segments
mean. Now Xen will use the wrong device ID in the IOMMU (since that is
associated with the host bridge), or poke the wrong configuration space, or
whatever.

Or maybe Xen chose 42=0xB000 and 43=0xA000 so when Dom0 starts
talking about segment=0 and =1 it has no idea what is going on.

PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say Segment 0
is the host bridge at 0xB000 and Segment 1 is the host bridge at
0xA000. With this there is no confusion between Xen and Dom0 because
Xen isn't picking a segment ID, it is being told what it is by Dom0 which
has done the picking.
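
Purely as an illustration, and assuming Dom0 uses the usual HYPERVISOR_physdev_op()
wrapper, the Linux side would do something like the following once per host bridge,
before issuing PHYSDEVOP_pci_device_add for any device behind it (domain_nr and
cfg_res are made-up local names here):

    struct physdev_pci_host_bridge_add add = {
        .seg      = domain_nr,              /* segment Dom0 picked for this bridge */
        .cfg_base = cfg_res->start,         /* base of this bridge's config window */
        .cfg_size = resource_size(cfg_res), /* size of that window */
    };
    int err = HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);

    if (err)
        pr_warn("Failed to register host bridge with Xen: %d\n", err);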

Does that help?

Ian.




Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Ian Campbell
On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
 
  2.2PHYSDEVOP_pci_host_bridge_add hypercall
  --
  Xen code accesses PCI configuration space based on the sbdf received from
  the
  guest. The order in which the pci device tree node appear may not be the
  same
  order of device enumeration in dom0. Thus there needs to be a mechanism to
  bind
  the segment number assigned by dom0 to the pci host controller. The
  hypercall
  is introduced:
 
 Why can't we extend the existing hypercall to have the segment value?
 
 Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
 
 And have the hypercall (and Xen) be able to deal with introduction of PCI
 devices that are out of sync?
 
 Maybe I am confused but aren't PCI host controllers also 'uploaded' to
 Xen?

The issue is that Dom0 and Xen need to agree on a common numbering space
for the PCI domain AKA segment, which is really just a software concept
i.e. on ARM Linux just makes them up (on x86 I believe they come from some
firmware table so Xen and Dom0 agree to both use that).

Ian.




Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-12 Thread Manish Jaggi

Below are the comments. I will also send a Draft 4 taking account of the 
comments.


On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:

On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:

  -
 | PCI Pass-through in Xen ARM |
  -
 manish.ja...@caviumnetworks.com
 ---

  Draft-3
...
[snip]
2.2PHYSDEVOP_pci_host_bridge_add hypercall
--
Xen code accesses PCI configuration space based on the sbdf received from
the
guest. The order in which the pci device tree node appear may not be the
same
order of device enumeration in dom0. Thus there needs to be a mechanism to
bind
the segment number assigned by dom0 to the pci host controller. The
hypercall
is introduced:

Why can't we extend the existing hypercall to have the segment value?

Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!

It doesn’t pass the cfg_base and size to xen


And have the hypercall (and Xen) be able to deal with introduction of PCI
devices that are out of sync?

Maybe I am confused but aren't PCI host controllers also 'uploaded' to
Xen?

I need to add one more line here to be more descriptive. The binding is
between the segment number (domain number in linux)
used by dom0 and the pci config space address in the pci node of device
tree (reg property).
The hypercall was introduced to cater for the fact that dom0 may process
pci nodes in the device tree in any order.
With this binding the ABI is clear.

#define PHYSDEVOP_pci_host_bridge_add 44
struct physdev_pci_host_bridge_add {
 /* IN */
 uint16_t seg;
 uint64_t cfg_base;
 uint64_t cfg_size;
};

This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
hypercall. The handler code invokes the following to update the segment number
in the pci_hostbridge:

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
cfg_size);

Subsequent calls to pci_conf_read/write are completed by the
pci_hostbridge_ops
of the respective pci_hostbridge.
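
A rough sketch of how the handler might look on the Xen side; apart from
struct physdev_pci_host_bridge_add and pci_hostbridge_setup(), which are defined
above, the surrounding names (ret, arg, the switch case) simply mirror the
existing do_physdev_op() dispatch style:

    case PHYSDEVOP_pci_host_bridge_add: {
        struct physdev_pci_host_bridge_add add;

        ret = -EFAULT;
        if ( copy_from_guest(&add, arg, 1) != 0 )
            break;

        /* Bind the dom0-assigned segment to the bridge owning this CFG window. */
        ret = pci_hostbridge_setup(add.seg, add.cfg_base, add.cfg_size);
        break;
    }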

This design sounds like it is added to deal with having to pre-allocate the
host controller structures before the PCI devices are streaming in?

Instead of having the PCI devices and PCI host controllers be updated
as they are coming in?

Why can't the second option be done?

If you are referring to ACPI, we have to add the support.
PCI Host controllers are pci nodes in device tree.

2.3Helper Functions

a) pci_hostbridge_dt_node(pdev->seg);
Returns the device tree node pointer of the pci node from which the pdev got
enumerated.

3.SMMU programming
---

3.1.Additions for PCI Passthrough
---
3.1.1 - add_device in iommu_ops is implemented.

This is called when PHYSDEVOP_pci_add_device is called from dom0.

Or for PHYSDEVOP_manage_pci_add_ext ?

Not sure but it seems logical for this also.

.add_device = arm_smmu_add_dom0_dev,
static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
{
 if (dev_is_pci(dev)) {
 struct pci_dev *pdev = to_pci_dev(dev);
 return arm_smmu_assign_dev(pdev->domain, devfn, dev);
 }
 return -1;
}


What about removal?

What if the device is removed (hot-unplugged)?

.remove_device = arm_smmu_remove_device() would be called.
Will update in Draft4


3.1.2 dev_get_dev_node is modified for pci devices.
-
The function is modified to return the dt_node of the pci hostbridge from
the device tree. This is required as non-dt devices need a way to find
which smmu they are attached to.

static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
{
 struct device_node *dev_node = dev_get_dev_node(dev);


static struct device_node *dev_get_dev_node(struct device *dev)
{
 if (dev_is_pci(dev)) {
 struct pci_dev *pdev = to_pci_dev(dev);
 return pci_hostbridge_dt_node(pdev->seg);
 }
...


3.2.Mapping between streamID - deviceID - pci sbdf - requesterID
-
For a simpler case all should be equal to BDF. But there are some devices
that
use the wrong requester ID for DMA transactions. Linux kernel has pci quirks
for these. How the same can be implemented in Xen, or whether a different approach has to

s/pci/PCI/

be
taken, is a TODO here.
Till that time, for basic implementation it is assumed that all are equal to
BDF.
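
Under that simplifying assumption the lookup collapses to the identity. A minimal
sketch (the helper name is invented here purely for illustration):

    /* Assumes streamID == deviceID == requesterID == BDF; quirks are not handled. */
    static u32 pci_sbdf_to_streamid(u32 seg, u32 bus, u32 devfn)
    {
        /* The segment only selects the host bridge/SMMU; the stream ID is the BDF. */
        return (bus << 8) | devfn;
    }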


4.Assignment of PCI device
-

4.1Dom0

All PCI devices are assigned to dom0 unless hidden by pci-hide bootargs in
dom0.

'pci-hide' in dom0? Grepping in Documentation/kernel-parameters.txt I don't
see anything.

%s/pci-hide/pciback.hide/
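
For reference - assuming the standard Linux Xen PCI backend is what is meant
here - the parameter is normally written along these lines (xen-pciback.hide
when built in; older trees use pciback.hide when it is a module):

    xen-pciback.hide=(0000:03:00.0)(0000:04:00.0)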

Dom0 enumerates the PCI devices. For each device the 

Re: [Xen-devel] PCI Passthrough Design - Draft 3

2015-08-11 Thread Konrad Rzeszutek Wilk
On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
  -
 | PCI Pass-through in Xen ARM |
  -
 manish.ja...@caviumnetworks.com
 ---
 
  Draft-3
 
 
 ---
 Introduction
 ---
 This document describes the design for the PCI passthrough support in Xen
 ARM.
 The target system is an ARM 64bit Soc with GICv3 and SMMU v2 and PCIe
 devices.
 
 ---
 Revision History
 ---
 Changes from Draft-1:
 -
 a) map_mmio hypercall removed from earlier draft
 b) device bar mapping into guest not 1:1
 c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
 d) xenstore device's BAR info addition.
 
 Changes from Draft-2:
 -
 a) DomU boot information updated with boot-time device assignment and
 hotplug.
 b) SMMU description added
 c) Mapping between streamID - bdf - deviceID.
 d) assign_device hypercall to include virtual(guest) sbdf.
 Toolstack to generate guest sbdf rather than pciback.
 
 ---
 Index
 ---
   (1) Background
 
   (2) Basic PCI Support in Xen ARM
   (2.1)pci_hostbridge and pci_hostbridge_ops
   (2.2)PHYSDEVOP_HOSTBRIDGE_ADD hypercall
 
   (3) SMMU programming
   (3.1) Additions for PCI Passthrough
   (3.2)Mapping between streamID - deviceID - pci sbdf
 
   (4) Assignment of PCI device
 
   (4.1) Dom0
   (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (4k)
   (4.1.1.1) For Dom0
   (4.1.1.2) For DomU
   (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
 
   (4.2) DomU
   (4.2.1) Reserved Areas in guest memory space
   (4.2.2) New entries in xenstore for device BARs
   (4.2.4) Hypercall Modification for bdf mapping notification to xen
 
   (5) DomU FrontEnd Bus Changes
   (5.1)Change in Linux PCI FrontEnd - backend driver for MSI/X
 programming
   (5.2)Frontend bus and interrupt parent vITS
 
   (6) NUMA and PCI passthrough
 ---
 
 1.Background of PCI passthrough
 --
 Passthrough refers to assigning a pci device to a guest domain (domU) such
 that
 the guest has full control over the device. The MMIO space and interrupts
 are
 managed by the guest itself, close to how a bare kernel manages a device.

s/pci/PCI/
 
 Device's access to the guest address space needs to be isolated and protected.
 The SMMU (System MMU - the IOMMU in ARM) is programmed by the Xen hypervisor to
 allow the device to access guest memory for data transfer and to send MSI/X
 interrupts. The message signalled interrupt writes generated by PCI devices
 target guest addresses, which are also translated by the SMMU.
 For this reason the GITS (ITS address space) Interrupt Translation Register
 space is mapped in the guest address space.
 
 2.Basic PCI Support for ARM
 --
 The apis to read write from pci configuration space are based on

s/apis/APIs/
s/pci/PCI/
 segment:bdf.
 How the sbdf is mapped to a physical address is under the realm of the pci
s/pci/PCI/
 host controller.
 
 ARM PCI support in Xen introduces a pci host controller framework similar to what

s/pci/PCI/
 exists
 in Linux. Each driver registers callbacks, which are invoked on matching the
 compatible property in the pci device tree node.
 
 2.1pci_hostbridge and pci_hostbridge_ops
 --
 The init function in the pci host driver calls the following to register the
 hostbridge callbacks:
 int pci_hostbridge_register(pci_hostbridge_t *pcihb);
 
 struct pci_hostbridge_ops {
 u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
 u32 reg, u32 bytes);
 void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
 u32 reg, u32 bytes, u32 val);
 };
 
 struct pci_hostbridge{
 u32 segno;
 paddr_t cfg_base;
 paddr_t cfg_size;
 struct dt_device_node *dt_node;
 struct pci_hostbridge_ops ops;
 struct list_head list;
 };
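
 A host bridge driver would then typically register itself along these lines
 (sketch only; the generic_* names are made up for illustration):

 static u32 generic_conf_read(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
                              u32 reg, u32 bytes)
 {
     /* A real driver would compute the config-space offset from pcihb->cfg_base. */
     return ~0U;
 }

 static void generic_conf_write(struct pci_hostbridge *pcihb, u32 bus, u32 devfn,
                                u32 reg, u32 bytes, u32 val)
 {
     /* Config-space write elided in this sketch. */
 }

 static pci_hostbridge_t generic_hostbridge = {
     .ops = {
         .pci_conf_read  = generic_conf_read,
         .pci_conf_write = generic_conf_write,
     },
 };

 /* Probe callback, matched on the compatible property of the pci DT node. */
 static int __init generic_hostbridge_init(struct dt_device_node *node)
 {
     generic_hostbridge.dt_node = node;
     /* cfg_base/cfg_size would be filled in from the node's reg property here. */
     return pci_hostbridge_register(&generic_hostbridge);
 }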
 
 A pci conf read function would internally be as follows:
 u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
 {
 pci_hostbridge_t *pcihb;
 list_for_each_entry(pcihb, pci_hostbridge_list, list)
 {
 if(pcihb->segno == seg)
 return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
 }
 return -1;
 }
 
 2.2PHYSDEVOP_pci_host_bridge_add hypercall