Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > > --
> > > Xen code accesses PCI configuration space based on the sbdf
> > > received from the guest. The order in which the pci device tree
> > > nodes appear may not be the same as the order of device
> > > enumeration in dom0. Thus there needs to be a mechanism to bind
> > > the segment number assigned by dom0 to the pci host controller.
> > > The hypercall is introduced:
> >
> > Why can't we extend the existing hypercall to have the segment
> > value? Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> >
> > And have the hypercall (and Xen) be able to deal with introduction
> > of PCI devices that are out of sync? Maybe I am confused, but
> > aren't PCI host controllers also 'uploaded' to Xen?
>
> The issue is that Dom0 and Xen need to agree on a common numbering
> space for the PCI domain AKA segment, which is really just a
> software concept, i.e. on ARM Linux just makes them up (on x86 I
> believe they come from some firmware table, so Xen and Dom0 agree to
> both use that).

Doesn't the PCI domain or segment have a notion of which PCI devices
are underneath it? Or vice versa - do PCI devices know what their
segment (or domain) is?

> Ian.
Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Wed, Aug 12, 2015 at 03:42:10PM +0100, Ian Campbell wrote:
> On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> > > On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > > > > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > > > > --
> > > > > Xen code accesses PCI configuration space based on the sbdf
> > > > > received from the guest. The order in which the pci device
> > > > > tree nodes appear may not be the same as the order of device
> > > > > enumeration in dom0. Thus there needs to be a mechanism to
> > > > > bind the segment number assigned by dom0 to the pci host
> > > > > controller. The hypercall is introduced:
> > > >
> > > > Why can't we extend the existing hypercall to have the segment
> > > > value? Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> > > >
> > > > And have the hypercall (and Xen) be able to deal with
> > > > introduction of PCI devices that are out of sync? Maybe I am
> > > > confused, but aren't PCI host controllers also 'uploaded' to
> > > > Xen?
> > >
> > > The issue is that Dom0 and Xen need to agree on a common
> > > numbering space for the PCI domain AKA segment, which is really
> > > just a software concept, i.e. on ARM Linux just makes them up (on
> > > x86 I believe they come from some firmware table, so Xen and Dom0
> > > agree to both use that).
> >
> > Doesn't the PCI domain or segment have a notion of which PCI
> > devices are underneath it? Or vice versa - do PCI devices know what
> > their segment (or domain) is?
>
> The PCI domain or segment does contain a device, but it is a purely
> OS-level concept; it has no real meaning in the hardware. So both
> Xen and Linux are free to fabricate whatever segment naming space
> they want, but obviously they need to agree, hence this hypercall
> lets Linux tell Xen what segment it has associated with a given PCI
> controller.
>
> Perhaps an example will help.
>
> Imagine we have two PCI host bridges, one with CFG space at 0xA000
> and a second with CFG space at 0xB000.
>
> Xen discovers these and assigns segment 0=0xA000 and segment
> 1=0xB000.
>
> Dom0 discovers them too but assigns segment 1=0xA000 and segment
> 0=0xB000 (i.e. the other way around).
>
> Now Dom0 makes a hypercall referring to a device as (segment=1,BDF),
> i.e. the device with that BDF behind the root bridge at 0xA000.
> (Perhaps this is the PHYSDEVOP_manage_pci_add_ext call). But Xen
> thinks it is talking about the device with that BDF behind the root
> bridge at 0xB000, because Dom0 and Xen do not agree on what the
> segments mean.
>
> Now Xen will use the wrong device ID in the IOMMU (since that is
> associated with the host bridge), or poke the wrong configuration
> space, or whatever.
>
> Or maybe Xen chose 42=0xB000 and 43=0xA000, so when Dom0 starts
> talking about segment=0 and =1 it has no idea what is going on.
>
> PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say
> "Segment 0 is the host bridge at 0xB000 and Segment 1 is the host
> bridge at 0xA000". With this there is no confusion between Xen and
> Dom0, because Xen isn't picking a segment ID; it is being told what
> it is by Dom0, which has done the picking.
>
> Does that help?

Yes, thank you!

Manish, please include this explanation in the design as it will
surely help other folks in understanding it.

> Ian.
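[Editor's note: to make the ordering concrete, a minimal sketch of the
Dom0 side is shown below. The struct layout and hypercall number are
the ones quoted elsewhere in this thread; the helper name and the
Linux-side plumbing around HYPERVISOR_physdev_op() are illustrative,
not part of the draft.]

    #include <linux/types.h>
    #include <asm/xen/hypercall.h>   /* HYPERVISOR_physdev_op() */

    /* From the draft; not (yet) in xen/interface/physdev.h. */
    #define PHYSDEVOP_pci_host_bridge_add 44
    struct physdev_pci_host_bridge_add {
        /* IN */
        uint16_t seg;       /* segment Dom0 assigned to this bridge */
        uint64_t cfg_base;  /* physical base of the bridge CFG space */
        uint64_t cfg_size;  /* size of the CFG space */
    };

    /*
     * Hypothetical helper: called once per host bridge, before any
     * PHYSDEVOP_pci_device_add for devices behind it, so that Xen
     * never sees a (segment, BDF) it cannot resolve to a CFG space.
     */
    static int xen_add_host_bridge(u16 seg, u64 cfg_base, u64 cfg_size)
    {
        struct physdev_pci_host_bridge_add add = {
            .seg      = seg,
            .cfg_base = cfg_base,
            .cfg_size = cfg_size,
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
    }

[In Ian's example above, Dom0 would issue xen_add_host_bridge(1,
0xA000, ...) and xen_add_host_bridge(0, 0xB000, ...), and Xen would
adopt that numbering instead of inventing its own.]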
Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Wed, Aug 12, 2015 at 01:03:07PM +0530, Manish Jaggi wrote:
> Below are the comments. I will also send a Draft 4 taking account of
> the comments.
>
> On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
> > > -
> > > | PCI Pass-through in Xen ARM |
> > > -
> > > manish.ja...@caviumnetworks.com
> > > ---
> > > Draft-3
> > > ...

[snip]

> > > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > > --
> > > Xen code accesses PCI configuration space based on the sbdf
> > > received from the guest. The order in which the pci device tree
> > > nodes appear may not be the same as the order of device
> > > enumeration in dom0. Thus there needs to be a mechanism to bind
> > > the segment number assigned by dom0 to the pci host controller.
> > > The hypercall is introduced:
> >
> > Why can't we extend the existing hypercall to have the segment
> > value? Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
>
> It doesn't pass the cfg_base and size to xen

cfg_base is the BAR? Or the MMIO?

> > And have the hypercall (and Xen) be able to deal with introduction
> > of PCI devices that are out of sync? Maybe I am confused, but
> > aren't PCI host controllers also 'uploaded' to Xen?
>
> I need to add one more line here to be more descriptive. The binding
> is between the segment number (domain number in linux) used by dom0
> and the pci config space address in the pci node of the device tree
> (reg property). The hypercall was introduced to cater for the fact
> that dom0 may process pci nodes in the device tree in any order.

I still don't follow - sorry. Why would it matter that the PCI nodes
are processed in any order?

> By this binding it is a clear ABI.
>
> > > #define PHYSDEVOP_pci_host_bridge_add 44
> > > struct physdev_pci_host_bridge_add {
> > >     /* IN */
> > >     uint16_t seg;
> > >     uint64_t cfg_base;
> > >     uint64_t cfg_size;
> > > };
> > >
> > > This hypercall is invoked before dom0 invokes the
> > > PHYSDEVOP_pci_device_add hypercall. The handler code invokes the
> > > following to update the segment number in the pci_hostbridge:
> > >
> > > int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
> > >                          uint64_t cfg_size);
> > >
> > > Subsequent calls to pci_conf_read/write are completed by the
> > > pci_hostbridge_ops of the respective pci_hostbridge.
> >
> > This design sounds like it is added to deal with having to
> > pre-allocate the host controller structures before the PCI devices
> > are streaming in? Instead of having the PCI devices and PCI host
> > controllers be updated as they are coming in? Why can't the second
> > option be done?
>
> If you are referring to ACPI, we have to add the support. PCI host
> controllers are pci nodes in the device tree.

I think what you are saying is that the PCI devices are being uploaded
during ACPI parsing, while the PCI host controllers are done via the
device tree. But what difference does that make? Why can't Xen deal
with these being in any order? Can't it re-organize its internal
representation of PCI host controllers and PCI devices based on new
data?

> > > 2.3 Helper Functions
> > > a) pci_hostbridge_dt_node(pdev->seg);
> > > Returns the device tree node pointer of the pci node from which
> > > the pdev got enumerated.
> > >
> > > 3. SMMU programming
> > > ---
> > > 3.1. Additions for PCI Passthrough
> > > ---
> > > 3.1.1 add_device in iommu_ops is implemented.
> > > This is called when PHYSDEVOP_pci_add_device is called from dom0.
> >
> > Or for PHYSDEVOP_manage_pci_add_ext?
>
> Not sure, but it seems logical for this also.
>
> > > .add_device = arm_smmu_add_dom0_dev,
> > >
> > > static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> > > {
> > >     if (dev_is_pci(dev)) {
> > >         struct pci_dev *pdev = to_pci_dev(dev);
> > >         return arm_smmu_assign_dev(pdev->domain, devfn, dev);
> > >     }
> > >     return -1;
> > > }
> >
> > What about removal? What if the device is removed (hot-unplugged)?
>
> .remove_device = arm_smmu_remove_device() would be called. Will
> update in Draft4.

Also please mention what hypercall you would use.

> > > 3.1.2 dev_get_dev_node is modified for pci devices.
> > > -
> > > The function is modified to return the dt_node of the pci
> > > hostbridge from the device tree. This is required as non-dt
> > > devices need a way to find on which smmu they are attached.
> > >
> > > static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> > > {
> > >     struct device_node *dev_node = dev_get_dev_node(dev);
> > > ...
> > >
> > > static struct device_node *dev_get_dev_node(struct device *dev)
> > > {
> > >     if (dev_is_pci(dev)) {
> > >         struct pci_dev *pdev = to_pci_dev(dev);
> > >         return pci_hostbridge_dt_node(pdev->seg);
> > >     }
> > > ...
> > >
> > > 3.2. Mapping between streamID <-> deviceID <-> pci sbdf <-> requesterID
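[Editor's note: for reference, a sketch of what the removal hook
discussed above might look like, mirroring arm_smmu_add_dom0_dev()
from the draft. arm_smmu_deassign_dev() is a hypothetical counterpart
to arm_smmu_assign_dev(), not code from the draft.]

    /*
     * Hypothetical removal path, symmetric to arm_smmu_add_dom0_dev():
     * detach the device's stream IDs from the domain they were
     * assigned to when the device is hot-unplugged.
     */
    static int arm_smmu_remove_device(u8 devfn, struct device *dev)
    {
        if (dev_is_pci(dev)) {
            struct pci_dev *pdev = to_pci_dev(dev);

            /* Assumed counterpart to arm_smmu_assign_dev(). */
            return arm_smmu_deassign_dev(pdev->domain, devfn, dev);
        }
        return -1;
    }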
Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Wed, 2015-08-12 at 10:25 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 12, 2015 at 09:56:41AM +0100, Ian Campbell wrote:
> > On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > > > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > > > --
> > > > Xen code accesses PCI configuration space based on the sbdf
> > > > received from the guest. The order in which the pci device tree
> > > > nodes appear may not be the same as the order of device
> > > > enumeration in dom0. Thus there needs to be a mechanism to bind
> > > > the segment number assigned by dom0 to the pci host controller.
> > > > The hypercall is introduced:
> > >
> > > Why can't we extend the existing hypercall to have the segment
> > > value? Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
> > >
> > > And have the hypercall (and Xen) be able to deal with
> > > introduction of PCI devices that are out of sync? Maybe I am
> > > confused, but aren't PCI host controllers also 'uploaded' to Xen?
> >
> > The issue is that Dom0 and Xen need to agree on a common numbering
> > space for the PCI domain AKA segment, which is really just a
> > software concept, i.e. on ARM Linux just makes them up (on x86 I
> > believe they come from some firmware table, so Xen and Dom0 agree
> > to both use that).
>
> Doesn't the PCI domain or segment have a notion of which PCI devices
> are underneath it? Or vice versa - do PCI devices know what their
> segment (or domain) is?

The PCI domain or segment does contain a device, but it is a purely
OS-level concept; it has no real meaning in the hardware. So both Xen
and Linux are free to fabricate whatever segment naming space they
want, but obviously they need to agree, hence this hypercall lets
Linux tell Xen what segment it has associated with a given PCI
controller.

Perhaps an example will help.

Imagine we have two PCI host bridges, one with CFG space at 0xA000 and
a second with CFG space at 0xB000.

Xen discovers these and assigns segment 0=0xA000 and segment 1=0xB000.

Dom0 discovers them too but assigns segment 1=0xA000 and segment
0=0xB000 (i.e. the other way around).

Now Dom0 makes a hypercall referring to a device as (segment=1,BDF),
i.e. the device with that BDF behind the root bridge at 0xA000.
(Perhaps this is the PHYSDEVOP_manage_pci_add_ext call). But Xen
thinks it is talking about the device with that BDF behind the root
bridge at 0xB000, because Dom0 and Xen do not agree on what the
segments mean.

Now Xen will use the wrong device ID in the IOMMU (since that is
associated with the host bridge), or poke the wrong configuration
space, or whatever.

Or maybe Xen chose 42=0xB000 and 43=0xA000, so when Dom0 starts
talking about segment=0 and =1 it has no idea what is going on.

PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say
"Segment 0 is the host bridge at 0xB000 and Segment 1 is the host
bridge at 0xA000". With this there is no confusion between Xen and
Dom0, because Xen isn't picking a segment ID; it is being told what it
is by Dom0, which has done the picking.

Does that help?

Ian.
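[Editor's note: a sketch of what the Xen-side handler could look like
under the draft's data structures: rather than picking a segment, Xen
looks the bridge up by the CFG space Dom0 named and adopts Dom0's
number. Only pci_hostbridge_setup()'s signature and the pci_hostbridge
fields appear in the draft; the lookup logic here is an assumption.]

    /*
     * Record Dom0's segment choice for the host bridge whose CFG
     * space matches (cfg_base, cfg_size). Sketch only: the draft
     * gives the signature, not the body.
     */
    int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
                             uint64_t cfg_size)
    {
        pci_hostbridge_t *pcihb;

        list_for_each_entry(pcihb, &pci_hostbridge_list, list)
        {
            if ( pcihb->cfg_base == cfg_base &&
                 pcihb->cfg_size == cfg_size )
            {
                pcihb->segno = segno;   /* adopt Dom0's numbering */
                return 0;
            }
        }

        return -ENODEV;   /* Dom0 named a bridge Xen has not discovered */
    }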
Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Tue, 2015-08-11 at 16:34 -0400, Konrad Rzeszutek Wilk wrote:
> > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > --
> > Xen code accesses PCI configuration space based on the sbdf
> > received from the guest. The order in which the pci device tree
> > nodes appear may not be the same as the order of device enumeration
> > in dom0. Thus there needs to be a mechanism to bind the segment
> > number assigned by dom0 to the pci host controller. The hypercall
> > is introduced:
>
> Why can't we extend the existing hypercall to have the segment value?
> Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!
>
> And have the hypercall (and Xen) be able to deal with introduction of
> PCI devices that are out of sync? Maybe I am confused, but aren't PCI
> host controllers also 'uploaded' to Xen?

The issue is that Dom0 and Xen need to agree on a common numbering
space for the PCI domain AKA segment, which is really just a software
concept, i.e. on ARM Linux just makes them up (on x86 I believe they
come from some firmware table, so Xen and Dom0 agree to both use
that).

Ian.
Re: [Xen-devel] PCI Passthrough Design - Draft 3
Below are the comments. I will also send a Draft 4 taking account of
the comments.

On Wednesday 12 August 2015 02:04 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
> > -
> > | PCI Pass-through in Xen ARM |
> > -
> > manish.ja...@caviumnetworks.com
> > ---
> > Draft-3
> > ...

[snip]

> > 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
> > --
> > Xen code accesses PCI configuration space based on the sbdf
> > received from the guest. The order in which the pci device tree
> > nodes appear may not be the same as the order of device enumeration
> > in dom0. Thus there needs to be a mechanism to bind the segment
> > number assigned by dom0 to the pci host controller. The hypercall
> > is introduced:
>
> Why can't we extend the existing hypercall to have the segment value?
> Oh wait, PHYSDEVOP_manage_pci_add_ext does it already!

It doesn't pass the cfg_base and size to xen

> And have the hypercall (and Xen) be able to deal with introduction of
> PCI devices that are out of sync? Maybe I am confused, but aren't PCI
> host controllers also 'uploaded' to Xen?

I need to add one more line here to be more descriptive. The binding
is between the segment number (domain number in linux) used by dom0
and the pci config space address in the pci node of the device tree
(reg property). The hypercall was introduced to cater for the fact
that dom0 may process pci nodes in the device tree in any order. By
this binding it is a clear ABI.

> > #define PHYSDEVOP_pci_host_bridge_add 44
> > struct physdev_pci_host_bridge_add {
> >     /* IN */
> >     uint16_t seg;
> >     uint64_t cfg_base;
> >     uint64_t cfg_size;
> > };
> >
> > This hypercall is invoked before dom0 invokes the
> > PHYSDEVOP_pci_device_add hypercall. The handler code invokes the
> > following to update the segment number in the pci_hostbridge:
> >
> > int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
> >                          uint64_t cfg_size);
> >
> > Subsequent calls to pci_conf_read/write are completed by the
> > pci_hostbridge_ops of the respective pci_hostbridge.
>
> This design sounds like it is added to deal with having to
> pre-allocate the host controller structures before the PCI devices
> are streaming in? Instead of having the PCI devices and PCI host
> controllers be updated as they are coming in? Why can't the second
> option be done?

If you are referring to ACPI, we have to add the support. PCI host
controllers are pci nodes in the device tree.

> > 2.3 Helper Functions
> > a) pci_hostbridge_dt_node(pdev->seg);
> > Returns the device tree node pointer of the pci node from which the
> > pdev got enumerated.
> >
> > 3. SMMU programming
> > ---
> > 3.1. Additions for PCI Passthrough
> > ---
> > 3.1.1 add_device in iommu_ops is implemented.
> > This is called when PHYSDEVOP_pci_add_device is called from dom0.
>
> Or for PHYSDEVOP_manage_pci_add_ext?

Not sure, but it seems logical for this also.

> > .add_device = arm_smmu_add_dom0_dev,
> >
> > static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
> > {
> >     if (dev_is_pci(dev)) {
> >         struct pci_dev *pdev = to_pci_dev(dev);
> >         return arm_smmu_assign_dev(pdev->domain, devfn, dev);
> >     }
> >     return -1;
> > }
>
> What about removal? What if the device is removed (hot-unplugged)?

.remove_device = arm_smmu_remove_device() would be called. Will update
in Draft4.

> > 3.1.2 dev_get_dev_node is modified for pci devices.
> > -
> > The function is modified to return the dt_node of the pci
> > hostbridge from the device tree. This is required as non-dt devices
> > need a way to find on which smmu they are attached.
> >
> > static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
> > {
> >     struct device_node *dev_node = dev_get_dev_node(dev);
> > ...
> >
> > static struct device_node *dev_get_dev_node(struct device *dev)
> > {
> >     if (dev_is_pci(dev)) {
> >         struct pci_dev *pdev = to_pci_dev(dev);
> >         return pci_hostbridge_dt_node(pdev->seg);
> >     }
> > ...
> >
> > 3.2. Mapping between streamID <-> deviceID <-> pci sbdf <-> requesterID
> > -
> > For a simpler case all should be equal to BDF. But there are some
> > devices that use the wrong requester ID for DMA transactions. The
> > Linux kernel has pci quirks for these. How the same can be
> > implemented in Xen, or whether a different approach has to be
> > taken, is a TODO here.
>
> s/pci/PCI/
>
> > Till that time, for the basic implementation it is assumed that all
> > are equal to BDF.
> >
> > 4. Assignment of PCI device
> > -
> > 4.1 Dom0
> > All PCI devices are assigned to dom0 unless hidden by the pci-hide
> > bootarg in dom0.
>
> 'pci-hide' in dom0? Grepping in Documentation/kernel-parameters.txt I
> don't see anything.
>
> s/pci-hide/pciback.hide/
>
> > Dom0 enumerates the PCI devices. For each device the
Re: [Xen-devel] PCI Passthrough Design - Draft 3
On Tue, Aug 04, 2015 at 05:57:24PM +0530, Manish Jaggi wrote:
> -
> | PCI Pass-through in Xen ARM |
> -
> manish.ja...@caviumnetworks.com
> ---
> Draft-3
> ---
>
> Introduction
> ---
> This document describes the design for the PCI passthrough support
> in Xen ARM. The target system is an ARM 64-bit SoC with GICv3, SMMU
> v2 and PCIe devices.
>
> Revision History
> ---
> Changes from Draft-1:
> -
> a) map_mmio hypercall removed from earlier draft
> b) device bar mapping into guest not 1:1
> c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
> d) xenstore device's BAR info addition.
>
> Changes from Draft-2:
> -
> a) DomU boot information updated with boot-time device assignment
>    and hotplug.
> b) SMMU description added
> c) Mapping between streamID <-> bdf <-> deviceID.
> d) assign_device hypercall to include virtual(guest) sbdf.
>    Toolstack to generate guest sbdf rather than pciback.
>
> Index
> ---
> (1) Background
> (2) Basic PCI Support in Xen ARM
> (2.1) pci_hostbridge and pci_hostbridge_ops
> (2.2) PHYSDEVOP_HOSTBRIDGE_ADD hypercall
> (3) SMMU programming
> (3.1) Additions for PCI Passthrough
> (3.2) Mapping between streamID <-> deviceID <-> pci sbdf
> (4) Assignment of PCI device
> (4.1) Dom0
> (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (4k)
> (4.1.1.1) For Dom0
> (4.1.1.2) For DomU
> (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
> (4.2) DomU
> (4.2.1) Reserved Areas in guest memory space
> (4.2.2) New entries in xenstore for device BARs
> (4.2.4) Hypercall Modification for bdf mapping notification to xen
> (5) DomU FrontEnd Bus Changes
> (5.1) Change in Linux PCI FrontEnd - backend driver for MSI/X
>       programming
> (5.2) Frontend bus and interrupt parent vITS
> (6) NUMA and PCI passthrough
>
> 1. Background of PCI passthrough
> --
> Passthrough refers to assigning a pci device to a guest domain (domU)
> such that the guest has full control over the device. The MMIO space
> and interrupts are managed by the guest itself, close to how a bare
> kernel manages a device.

s/pci/PCI/

> Device's access to guest address space needs to be isolated and
> protected. The SMMU (System MMU - the IOMMU in ARM) is programmed by
> the Xen hypervisor to allow the device to access guest memory for
> data transfer and for sending MSI/X interrupts. The message signalled
> interrupt writes generated by PCI devices target guest addresses,
> which are also translated using the SMMU. For this reason the GITS
> (ITS address space) Interrupt Translation Register space is mapped
> into the guest address space.
>
> 2. Basic PCI Support for ARM
> --
> The apis to read and write from pci configuration space are based on
> segment:bdf.

s/apis/APIs/
s/pci/PCI/

> How the sbdf is mapped to a physical address is under the realm of
> the pci host controller.

s/pci/PCI/

> ARM PCI support in Xen introduces pci host controllers similar to
> what exists in Linux. Each driver registers callbacks, which are
> invoked on matching the compatible property in the pci device tree
> node.
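[Editor's note: as an illustration of the registration flow described
here, a minimal sketch of a hypothetical host controller driver using
the API quoted in section 2.1 below. The driver name, the compatible
matching and the accessor bodies are invented for the example; only
pci_hostbridge_register() and struct pci_hostbridge come from the
draft.]

    /* Hypothetical ECAM-style driver; only the registration API is
     * from the draft. */
    static u32 ecam_conf_read(struct pci_hostbridge *pcihb, u32 bus,
                              u32 devfn, u32 reg, u32 bytes);
    static void ecam_conf_write(struct pci_hostbridge *pcihb, u32 bus,
                                u32 devfn, u32 reg, u32 bytes, u32 val);

    static struct pci_hostbridge ecam_hostbridge = {
        .ops = {
            .pci_conf_read  = ecam_conf_read,
            .pci_conf_write = ecam_conf_write,
        },
    };

    /* Called when the "compatible" property of a pci node matches. */
    static int ecam_hostbridge_init(struct dt_device_node *node)
    {
        ecam_hostbridge.dt_node = node;
        /* cfg_base/cfg_size would be parsed from the node's "reg"
         * property; segno is bound later, when Dom0 issues
         * PHYSDEVOP_pci_host_bridge_add. */
        return pci_hostbridge_register(&ecam_hostbridge);
    }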
> 2.1 pci_hostbridge and pci_hostbridge_ops
> --
> The init function in the pci host driver calls the following to
> register hostbridge callbacks:
>
> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>
> struct pci_hostbridge_ops {
>     u32 (*pci_conf_read)(struct pci_hostbridge *, u32 bus, u32 devfn,
>                          u32 reg, u32 bytes);
>     void (*pci_conf_write)(struct pci_hostbridge *, u32 bus,
>                            u32 devfn, u32 reg, u32 bytes, u32 val);
> };
>
> struct pci_hostbridge {
>     u32 segno;
>     paddr_t cfg_base;
>     paddr_t cfg_size;
>     struct dt_device_node *dt_node;
>     struct pci_hostbridge_ops ops;
>     struct list_head list;
> };
>
> A pci conf read function would internally be as follows:
>
> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
> {
>     pci_hostbridge_t *pcihb;
>
>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>     {
>         if ( pcihb->segno == seg )
>             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg,
>                                             bytes);
>     }
>     return -1;
> }
>
> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall