Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tuesday 17 March 2015 06:01 PM, Jan Beulich wrote:

On 17.03.15 at 13:06, mja...@caviumnetworks.com wrote:

On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:

On 17.03.15 at 06:26, mja...@caviumnetworks.com wrote:

In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a hypercall to inform Xen that a new PCI device has been added. If we were to inform Xen about a new PCI bus that is added, there are two ways:

a) Issue the hypercall from drivers/pci/probe.c.

b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that segment number (s_bdf), it will return an error, SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the PHYSDEVOP_pci_host_bridge_add hypercall.

I think (b) can be done with minimal code changes. What do you think?

I'm pretty sure (a) would even be refused by the maintainers, unless there already is a notification being sent. As to (b) - kernel code could keep track of which segment/bus pairs it informed Xen about, and hence wouldn't even need to wait for an error to be returned from the device-add request (which in your proposal would need to be re-issued after the host-bridge-add).

I have a query on the CFG space address to be passed as a hypercall parameter. of_pci_get_host_bridge_resources only parses the ranges property, not reg. The reg property has the CFG space address, which is usually stored in the PCI host controller driver's private structures, so a pci_dev's parent pci_bus would not have that info. One way is to add a method in struct pci_ops, but I am not sure whether that will be accepted.

I'm afraid I don't understand what you're trying to tell me.

Hi Jan,

I missed this during the initial discussion and found out while coding that the CFG space address of a PCI host is stored in the reg property, and the of_pci code does not store reg in the resources; only ranges are stored. So the pci_bus which is the root bus created in the probe function of the PCIe controller driver will have the ranges values in its resources, but the reg property value (the CFG space address) in the private data. So from drivers/xen/pci.c we can find out the root bus (pci_bus) from the pci_dev (via BUS_NOTIFY), but we cannot get the CFG space address.

Now there are two ways:

a) Add a pci_ops method to return the CFG space address.

b) Let the PCI host controller driver invoke a function xen_invoke_hypercall(), providing the bus number and cfg_space address. xen_invoke_hypercall would be implemented in drivers/xen/pci.c and would issue the PHYSDEVOP_pci_host_bridge_add hypercall.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
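Jan's refinement of option (b) - dom0 remembering which segment/bus pairs it has already reported, and registering the host bridge before the first device-add for that pair - can be sketched in plain C. The tracking table and the two stubbed "hypercalls" below are illustrative stand-ins, not the real Linux notifier or Xen PHYSDEVOP code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAIRS 64

struct seg_bus { uint16_t segment; uint8_t bus; };

static struct seg_bus known[MAX_PAIRS];
static int nr_known;

static int host_bridge_adds; /* count of host-bridge-add "hypercalls" issued */
static int device_adds;      /* count of device-add "hypercalls" issued */

static bool pair_known(uint16_t segment, uint8_t bus)
{
    for (int i = 0; i < nr_known; i++)
        if (known[i].segment == segment && known[i].bus == bus)
            return true;
    return false;
}

/* Stand-in for issuing PHYSDEVOP_pci_host_bridge_add for this pair. */
static void host_bridge_add(uint16_t segment, uint8_t bus)
{
    host_bridge_adds++;
    known[nr_known].segment = segment;
    known[nr_known].bus = bus;
    nr_known++;
}

/* Stand-in for the BUS_NOTIFY_ADD_DEVICE path in drivers/xen/pci.c:
 * register the bridge once per segment/bus pair, then add the device. */
void notify_add_device(uint16_t segment, uint8_t bus, uint8_t devfn)
{
    (void)devfn;
    if (!pair_known(segment, bus))
        host_bridge_add(segment, bus); /* tell Xen about the bridge first */
    device_adds++;                     /* then PHYSDEVOP_pci_device_add */
}
```

With this shape, no error round-trip (SEG_NO_NOT_FOUND followed by a re-issued device-add) is needed: the host-bridge-add always precedes the first device-add on a new pair.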
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tue, Mar 17, 2015 at 10:56:48AM +0530, Manish Jaggi wrote:

On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:

On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:

On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:

On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with PHYSDEVOP_pci_mmcfg_reserved, which is a Xen hypercall interface. I don't think that the way Linux (or FreeBSD) calls PHYSDEVOP_pci_mmcfg_reserved is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was intended to propagate the result of dom0 parsing some firmware table or other to the hypervisor.

That's not flawed at all.

I think that's a first in this thread ;-)

In Linux dom0 we call it walking pci_mmcfg_list, which, looking at arch/x86/pci/mmconfig-shared.c, pci_parse_mcfg populates by walking over a struct acpi_table_mcfg (there also appears to be a bunch of processor-family-derived entries, which I guess are quirks of some sort).

Right - this parses ACPI tables (plus applies some knowledge about certain specific systems/chipsets/CPUs) and verifies that the space needed for the MMCFG region is properly reserved either in E820 or in the ACPI specified resources (only if so does Linux decide to use MMCFG and consequently also tell Xen that it may use it).

Thanks. So I think what I wrote in 1424948710.14641.25.ca...@citrix.com applies as is to Device Tree based ARM devices, including the need for the PHYSDEVOP_pci_host_bridge_add call.

On ACPI based devices we will have the MCFG table, and things follow much as for x86:

* Xen should parse MCFG to discover the PCI host bridges.
* Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in the same way as Xen/x86 does.

The SBSA, an ARM standard for servers, mandates various things which we can rely on here, because ACPI on ARM requires an SBSA compliant system. So things like odd quirks in PCI controllers or magic setup are spec'd out of our zone of caring (into the firmware, I suppose), hence there is nothing like the DT_DEVICE_START stuff to register specific drivers etc. The PHYSDEVOP_pci_host_bridge_add call is not, AFAICT, needed on ACPI ARM systems (any more than it is on x86). We can decide whether to omit it from dom0 or ignore it in Xen later on. (Manish, this is FYI, I don't expect you to implement ACPI support!)

In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a hypercall to inform Xen that a new PCI device has been added. If we were to inform Xen about a new PCI bus that is added, there are two ways:

a) Issue the hypercall from drivers/pci/probe.c.

b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that segment number (s_bdf), it will return an error, SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the PHYSDEVOP_pci_host_bridge_add hypercall.

Couldn't the code figure out from 'struct pci_dev' whether the device is a bridge or a PCI device, and then do the proper hypercall?

An interesting thing you _might_ hit (which I did) is that if you use 'bus=reassign', which re-assigns the bus numbers during scan, Xen gets very, very confused - the bus devices that Xen sees differ from the ones Linux sees. Whether you encounter this depends on whether the bridge devices and PCI devices end up having a different bus number from what Xen scanned and from what Linux has determined. (As in, Linux has found a bridge device with more PCI devices, so it reprograms the bridge, which moves all of the other PCI devices below it by X numbers.) The reason I am bringing it up: it sounds like Xen will have no clue about some devices - and be told about them by Linux - and if for some reason one has the same bus number as some that Xen already scanned - gah!

I think (b) can be done with minimal code changes. What do you think?

Less code == better.

Ian.
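Konrad's suggestion - deciding which hypercall to issue based on what kind of device the notifier saw - can be sketched as a tiny dispatcher. In Linux this information lives in struct pci_dev (hdr_type, or the pci_is_bridge() helper); the stand-in struct and the enum below are illustrative only, not the real kernel types:

```c
#include <assert.h>
#include <stdint.h>

#define HDR_TYPE_NORMAL 0x00  /* ordinary PCI function */
#define HDR_TYPE_BRIDGE 0x01  /* PCI-to-PCI bridge header type */

/* Minimal stand-in for the fields of struct pci_dev this decision needs. */
struct fake_pci_dev {
    uint16_t segment;
    uint8_t bus;
    uint8_t hdr_type;
};

enum action { DEVICE_ADD, HOST_BRIDGE_ADD };

/* Pick which "hypercall" the BUS_NOTIFY_ADD_DEVICE handler should issue.
 * Taking the suggestion at face value: bridge header -> bridge-add path,
 * anything else -> plain device-add. A real implementation would also have
 * to distinguish host bridges from ordinary PCI-to-PCI bridges. */
enum action pick_hypercall(const struct fake_pci_dev *dev)
{
    if (dev->hdr_type == HDR_TYPE_BRIDGE)
        return HOST_BRIDGE_ADD;
    return DEVICE_ADD;
}
```

This sidesteps the error round-trip of option (b), though it does nothing about the bus-renumbering confusion Konrad describes, where Linux and Xen disagree on bus numbers after a rescan.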
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:

On Mon, 23 Feb 2015, Jan Beulich wrote:

On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:

On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:

That's the issue we are trying to resolve: with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0.

How is that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number.

It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC.

No - there can be multiple roots (i.e. host bridges) on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. In the PCI MCFG spec it is clearly stated: "The MCFG table format allows for more than one memory mapped base address entry provided each entry (memory mapped configuration space base address allocation structure) corresponds to a unique PCI Segment Group consisting of 256 PCI buses. Multiple entries corresponding to a single PCI Segment Group is not allowed."

For one, what you quote is in no contradiction to what I said. All it specifies is that there shouldn't be multiple MCFG table entries specifying the same segment. Whether on any such segment there is a single host bridge or multiple of them is of no interest here.

And then the present x86 Linux code suggests that there might be systems actually violating this (the fact that each entry names not only a segment, but also a bus range, also kind of suggests this despite the wording above); see commit 068258bc15 and its neighbors - even if it talks about the address ranges coming from other than ACPI tables, firmware wanting to express them by ACPI tables would have to violate that rule.

I think that it is reasonable to expect device tree systems to respect this too.

Not really - as soon as we leave ACPI land, we're free to arrange things however they suit us best (of course in agreement with other components involved, like Dom0 in this case), and for that case the cited Linux commit is a proper reference that it can be (and has been) done differently by system designers.

Jan
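The MCFG rule quoted above - each entry must name a unique PCI Segment Group - amounts to a simple uniqueness check over the table. The struct below mirrors the fields of an MCFG allocation structure (ECAM base, segment, start/end bus) but is a simplified stand-in, not the exact ACPI layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mcfg_entry {
    uint64_t base_address; /* ECAM base for this range */
    uint16_t segment;      /* PCI Segment Group number */
    uint8_t start_bus;
    uint8_t end_bus;
};

/* Return true if no two entries claim the same segment, i.e. the table
 * satisfies the rule as worded in the spec quote above. */
bool mcfg_segments_unique(const struct mcfg_entry *e, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (e[i].segment == e[j].segment)
                return false;
    return true;
}
```

A table that splits one segment across two entries by bus range - the kind of firmware Jan's Linux-commit reference suggests exists - would fail this check, which is exactly the tension being discussed.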
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Wed, 2015-03-11 at 18:26 +, Stefano Stabellini wrote:

In other words I think that we still need PHYSDEVOP_pci_host_bridge_add (http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but we can drop the bus field from the struct.

I think it makes sense for the struct to contain a similar set of entries to the MCFG ones, which would give us flexibility in the future if a) our interpretation of the specs is wrong or b) new specs come along which say something different (or Linux changes what it does internally). IOW I think segment + bus start + bus end is probably the way to go, even if we think bus will be unused today (which equates to it always being 0).

Ian.
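The argument Ian describes - segment plus a bus range, mirroring an MCFG entry - could look like the sketch below. The struct name, the cfg_base field, and the lookup helper are hypothetical illustrations, not the actual Xen interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hypercall argument: one entry per host bridge, shaped
 * like an MCFG allocation entry. */
struct physdev_pci_host_bridge_add_arg {
    uint16_t segment;
    uint8_t bus_start;
    uint8_t bus_end;   /* expected to equal bus_start (0) today */
    uint64_t cfg_base; /* CFG (ECAM) space base address */
};

/* Find which registered bridge, if any, covers (segment, bus). Returning
 * -1 is analogous to Xen failing a device-add for an unknown segment. */
int bridge_for(const struct physdev_pci_host_bridge_add_arg *b, int n,
               uint16_t segment, uint8_t bus)
{
    for (int i = 0; i < n; i++)
        if (b[i].segment == segment &&
            bus >= b[i].bus_start && bus <= b[i].bus_end)
            return i;
    return -1;
}
```

Carrying the bus range even while it is always 0 today is the flexibility point: if firmware or Linux ever splits a segment across bridges by bus range, the interface already expresses it.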
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Thu, 12 Mar 2015, Jan Beulich wrote:

On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:

On Mon, 23 Feb 2015, Jan Beulich wrote:

On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:

On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:

That's the issue we are trying to resolve: with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0.

How is that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number.

It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC.

No - there can be multiple roots (i.e. host bridges) on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. In the PCI MCFG spec it is clearly stated: "The MCFG table format allows for more than one memory mapped base address entry provided each entry (memory mapped configuration space base address allocation structure) corresponds to a unique PCI Segment Group consisting of 256 PCI buses. Multiple entries corresponding to a single PCI Segment Group is not allowed."

For one, what you quote is in no contradiction to what I said. All it specifies is that there shouldn't be multiple MCFG table entries specifying the same segment. Whether on any such segment there is a single host bridge or multiple of them is of no interest here.

I thought that we had already established that one host bridge corresponds to one PCI config memory region; see the last sentence in http://marc.info/?l=xen-devel&m=142529695117142. Did I misunderstand it? If a host bridge has a 1:1 relationship with CFG space, then each MCFG entry would correspond to one host bridge and one segment. But it looks like things are more complicated than that.

And then the present x86 Linux code suggests that there might be systems actually violating this (the fact that each entry names not only a segment, but also a bus range, also kind of suggests this despite the wording above); see commit 068258bc15 and its neighbors - even if it talks about the address ranges coming from other than ACPI tables, firmware wanting to express them by ACPI tables would have to violate that rule.

Interesting.

I think that it is reasonable to expect device tree systems to respect this too.

Not really - as soon as we leave ACPI land, we're free to arrange things however they suit us best (of course in agreement with other components involved, like Dom0 in this case), and for that case the cited Linux commit is a proper reference that it can be (and has been) done differently by system designers.

OK. It looks like everything should work OK with the hypercall proposed by Ian anyway.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 12.03.15 at 11:33, stefano.stabell...@eu.citrix.com wrote:

On Thu, 12 Mar 2015, Jan Beulich wrote:

On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:

On Mon, 23 Feb 2015, Jan Beulich wrote:

No - there can be multiple roots (i.e. host bridges) on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. In the PCI MCFG spec it is clearly stated: "The MCFG table format allows for more than one memory mapped base address entry provided each entry (memory mapped configuration space base address allocation structure) corresponds to a unique PCI Segment Group consisting of 256 PCI buses. Multiple entries corresponding to a single PCI Segment Group is not allowed."

For one, what you quote is in no contradiction to what I said. All it specifies is that there shouldn't be multiple MCFG table entries specifying the same segment. Whether on any such segment there is a single host bridge or multiple of them is of no interest here.

I thought that we had already established that one host bridge corresponds to one PCI config memory region; see the last sentence in http://marc.info/?l=xen-devel&m=142529695117142. Did I misunderstand it? If a host bridge has a 1:1 relationship with CFG space, then each MCFG entry would correspond to one host bridge and one segment.

No, that sentence doesn't imply what you appear to think. Within the same segment (and, for ACPI's sake, within the same MCFG region) you could have multiple host bridges. And then, what calls itself a host bridge (via class code) may or may not be one - often there are many devices calling themselves such, even on a single bus (and my prior sentence specifically means to exclude those). And finally there are systems with their PCI roots expressed only in ACPI, without any specific PCI device serving as the host bridge. There it is most obvious that firmware assigns both segment and bus numbers to its liking.

Jan
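Jan's caveat - that the class code alone does not prove a device really roots a bus - can be made concrete with a small scan. The device table below is invented for illustration; only the class code value (base class 0x06, sub-class 0x00 for "host bridge") follows the PCI class code convention:

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_BRIDGE_HOST 0x0600 /* base class 0x06, sub-class 0x00 */

struct scanned_dev {
    uint16_t segment;
    uint16_t class_code; /* (base class << 8) | sub-class */
};

/* Count devices in a segment whose class code claims "host bridge".
 * Deliberately shows that this count can exceed one: a class-code match
 * is not proof that the device actually roots a bus. */
int count_class_host_bridges(const struct scanned_dev *d, int n,
                             uint16_t segment)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (d[i].segment == segment && d[i].class_code == CLASS_BRIDGE_HOST)
            count++;
    return count;
}
```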
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Mon, 23 Feb 2015, Jan Beulich wrote:

On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:

On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:

That's the issue we are trying to resolve: with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0.

How is that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number.

It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC.

No - there can be multiple roots (i.e. host bridges) on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. In the PCI MCFG spec it is clearly stated: "The MCFG table format allows for more than one memory mapped base address entry provided each entry (memory mapped configuration space base address allocation structure) corresponds to a unique PCI Segment Group consisting of 256 PCI buses. Multiple entries corresponding to a single PCI Segment Group is not allowed."

I think that it is reasonable to expect device tree systems to respect this too. On ACPI systems the MCFG contains all the info we need; however, on device tree systems the segment number is missing from the pcie node, so we still need to find a way to agree with Dom0 on which host bridge corresponds to which segment.

In other words I think that we still need PHYSDEVOP_pci_host_bridge_add (http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but we can drop the bus field from the struct.
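The agreement problem just described can be sketched in code: since a DT pcie node carries no segment number, dom0 picks one per host bridge and registers it, and the Xen side only has to reject conflicting registrations. All names here are illustrative; the real mechanism under discussion is the proposed PHYSDEVOP_pci_host_bridge_add hypercall:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BRIDGES 16

/* "Xen side" state: segment number -> CFG space base, 0 meaning free.
 * (Assumes no real bridge has a CFG base of 0, for simplicity.) */
static uint64_t seg_to_cfg[MAX_BRIDGES];

/* Accept a (segment, cfg_base) registration from dom0 unless the segment
 * has already been claimed by a different host bridge. */
int register_host_bridge(uint16_t segment, uint64_t cfg_base)
{
    if (segment >= MAX_BRIDGES || seg_to_cfg[segment] != 0)
        return -1; /* conflict: segment already claimed */
    seg_to_cfg[segment] = cfg_base;
    return 0;
}
```

Once both sides use the same registration, a (segment, bus, devfn) triple names the same device in Xen and in dom0, which is the whole point of the hypercall.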
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Monday 02 March 2015 05:18 PM, Ian Campbell wrote:

On Fri, 2015-02-27 at 17:15 +, Stefano Stabellini wrote:

On Fri, 27 Feb 2015, Ian Campbell wrote:

On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:

On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:

On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with PHYSDEVOP_pci_mmcfg_reserved, which is a Xen hypercall interface. I don't think that the way Linux (or FreeBSD) calls PHYSDEVOP_pci_mmcfg_reserved is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was intended to propagate the result of dom0 parsing some firmware table or other to the hypervisor.

That's not flawed at all.

I think that's a first in this thread ;-)

In Linux dom0 we call it walking pci_mmcfg_list, which, looking at arch/x86/pci/mmconfig-shared.c, pci_parse_mcfg populates by walking over a struct acpi_table_mcfg (there also appears to be a bunch of processor-family-derived entries, which I guess are quirks of some sort).

Right - this parses ACPI tables (plus applies some knowledge about certain specific systems/chipsets/CPUs) and verifies that the space needed for the MMCFG region is properly reserved either in E820 or in the ACPI specified resources (only if so does Linux decide to use MMCFG and consequently also tell Xen that it may use it).

Thanks. So I think what I wrote in 1424948710.14641.25.ca...@citrix.com applies as is to Device Tree based ARM devices, including the need for the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was intended for passing down firmware information to Xen, as the information that we need is exactly the same, I think it would be acceptable to use the same hypercall on ARM too.

I strongly disagree; they have very different semantics, and overloading the existing interface would be both wrong and confusing. It'll also make things harder in the ACPI case, where we want to use the existing hypercall for its original purpose (to propagate MCFG information to Xen).

I am not hard set on this and the new hypercall is also a viable option. However, if we do introduce a new hypercall as Ian suggested, do we need to take into account the possibility that a host bridge might have multiple cfg memory ranges?

I don't believe so; a host bridge has a 1:1 relationship with CFG space.

Ian.

I agree with this flow. It fits into what we have implemented at our end.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 27 Feb 2015, Ian Campbell wrote: On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote: On 27.02.15 at 16:24, ian.campb...@citrix.com wrote: On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote: MMCFG is a Linux config option, not to be confused with PHYSDEVOP_pci_mmcfg_reserved, which is a Xen hypercall interface. I don't think that the way Linux (or FreeBSD) calls PHYSDEVOP_pci_mmcfg_reserved is relevant. My (possibly flawed) understanding was that pci_mmcfg_reserved was intended to propagate the result of dom0 parsing some firmware table or other to the hypervisor. That's not flawed at all. I think that's a first in this thread ;-) In Linux dom0 we call it walking pci_mmcfg_list, which, looking at arch/x86/pci/mmconfig-shared.c, pci_parse_mcfg populates by walking over a struct acpi_table_mcfg (there also appear to be a bunch of processor-family-derived entries, which I guess are quirks of some sort). Right - this parses ACPI tables (plus applies some knowledge about certain specific systems/chipsets/CPUs) and verifies that the space needed for the MMCFG region is properly reserved either in E820 or in the ACPI specified resources (only if so does Linux decide to use MMCFG and consequently also tell Xen that it may use it). Thanks. So I think what I wrote in 1424948710.14641.25.ca...@citrix.com applies as is to Device Tree based ARM devices, including the need for the PHYSDEVOP_pci_host_bridge_add call. Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was intended for passing down firmware information to Xen, as the information that we need is exactly the same, I think it would be acceptable to use the same hypercall on ARM too. I am not hard set on this and the new hypercall is also a viable option. However, if we do introduce a new hypercall as Ian suggested, do we need to take into account the possibility that a host bridge might have multiple cfg memory ranges?
On ACPI based devices we will have the MCFG table, and things follow much as for x86:
* Xen should parse MCFG to discover the PCI host-bridges
* Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in the same way as Xen/x86 does.
The SBSA, an ARM standard for servers, mandates various things which we can rely on here, because ACPI on ARM requires an SBSA compliant system. So things like odd quirks in PCI controllers or magic setup are spec'd out of our zone of caring (into the firmware, I suppose), hence there is nothing like the DT_DEVICE_START stuff to register specific drivers etc. The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM systems (any more than it is on x86). We can decide whether to omit it from dom0 or ignore it from Xen later on. (Manish, this is FYI, I don't expect you to implement ACPI support!)
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-27 at 14:33 +0000, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Ian Campbell wrote: On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote: Have you reached a conclusion? My current thinking on how PCI for Xen on ARM should look is thus:

xen/arch/arm/pci.c: New file, containing core PCI infrastructure for ARM. Includes:
* pci_hostbridge_register(), which registers a host bridge. Registration includes: the DT node pointer, the CFG space address, and a pci_hostbridge_ops function table (which contains e.g. cfg space read/write ops, perhaps other stuff).
* A function for setting the (segment,bus) for a given host bridge, let's say pci_hostbridge_setup(); the host bridge must have been previously registered. Looks up the host bridge via its CFG space address and maps that to (segment,bus).
* Functions for looking up host bridges by various keys as needed (cfg base address, DT node, etc).
* A pci_init() function, called from somewhere appropriate in setup.c, which calls device_init(node, DEVICE_PCIHOST, NULL) (see gic_init() for the shape of this).
* Any other common helper functions for managing PCI devices, e.g. for implementing PHYSDEVOP_*, which cannot be made properly common (i.e. shared with x86).

xen/drivers/pci/host-*.c (or pci/host/*.c): New files, one per supported PCI controller IP block. Each should use the normal DT_DEVICE infrastructure for probing, i.e. DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST). The probe function should call pci_hostbridge_register() for each host bridge which the controller exposes.

xen/arch/arm/physdev.c: Implements do_physdev_op handling PHYSDEVOP_*. Includes:
* New hypercall subop PHYSDEVOP_pci_host_bridge_add: as per 1424703761.27930.140.ca...@citrix.com, which calls pci_hostbridge_setup() to map the (segment,bus) to a specific pci_hostbridge_ops (i.e. it must have previously been registered with pci_hostbridge_register(), else error).

I think that the new hypercall is unnecessary. We know the MMCFG address ranges belonging to a given host bridge from DT, and PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for a specific MMCFG.

My understanding from discussion with Jan was that this is not what this hypercall does, or at least that this would be an abuse of the existing interface. See: 54e75d87027800062...@mail.emea.novell.com Anyway, what happens when there is no MMCFG table to drive dom0's calls to pci_mmcfg_reserved? Or when a given host bridge doesn't have special flags and so isn't mentioned there? I think a dedicated hypercall is better. Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Thu, 26 Feb 2015, Ian Campbell wrote: On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote: Have you reached a conclusion? My current thinking on how PCI for Xen on ARM should look is thus:

xen/arch/arm/pci.c: New file, containing core PCI infrastructure for ARM. Includes:
* pci_hostbridge_register(), which registers a host bridge. Registration includes: the DT node pointer, the CFG space address, and a pci_hostbridge_ops function table (which contains e.g. cfg space read/write ops, perhaps other stuff).
* A function for setting the (segment,bus) for a given host bridge, let's say pci_hostbridge_setup(); the host bridge must have been previously registered. Looks up the host bridge via its CFG space address and maps that to (segment,bus).
* Functions for looking up host bridges by various keys as needed (cfg base address, DT node, etc).
* A pci_init() function, called from somewhere appropriate in setup.c, which calls device_init(node, DEVICE_PCIHOST, NULL) (see gic_init() for the shape of this).
* Any other common helper functions for managing PCI devices, e.g. for implementing PHYSDEVOP_*, which cannot be made properly common (i.e. shared with x86).

xen/drivers/pci/host-*.c (or pci/host/*.c): New files, one per supported PCI controller IP block. Each should use the normal DT_DEVICE infrastructure for probing, i.e. DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST). The probe function should call pci_hostbridge_register() for each host bridge which the controller exposes.

xen/arch/arm/physdev.c: Implements do_physdev_op handling PHYSDEVOP_*. Includes:
* New hypercall subop PHYSDEVOP_pci_host_bridge_add: as per 1424703761.27930.140.ca...@citrix.com, which calls pci_hostbridge_setup() to map the (segment,bus) to a specific pci_hostbridge_ops (i.e. it must have previously been registered with pci_hostbridge_register(), else error).

I think that the new hypercall is unnecessary. We know the MMCFG address ranges belonging to a given host bridge from DT, and PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for a specific MMCFG. We don't need anything else: we can simply match the host bridge based on the MMCFG address that dom0 tells us via PHYSDEVOP_pci_mmcfg_reserved with the addresses in DT. But we do need to support PHYSDEVOP_pci_mmcfg_reserved on ARM.

* PHYSDEVOP_pci_device_add/remove: Implement the existing hypercall interface used by x86 for ARM. This requires that PHYSDEVOP_pci_host_bridge_add has been called for the (segment,bus) which it refers to, otherwise error. Looks up the host bridge and does whatever setup is required plus e.g. calling of pci_add_device().

No doubt various other existing interfaces will need wiring up, e.g. pci_conf_{read,write}* should look up the host bridge ops struct and call the associated method. I'm sure the above must be incomplete, but I hope the general shape makes sense? I think it makes sense and it is along the lines of what I was thinking too.
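The registry flow sketched in the design above could be modelled as follows. This is not Xen code: the function names come from the proposal (pci_hostbridge_register() keyed by CFG address, pci_hostbridge_setup() binding (segment,bus)), but every signature and data structure here is an assumed, illustrative sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BRIDGES 8

/* Hypothetical per-bridge ops table: cfg space accessors that the host
 * controller driver would supply at probe time. */
struct pci_hostbridge_ops {
    uint32_t (*cfg_read)(void *priv, uint32_t sbdf, int reg, int size);
    void (*cfg_write)(void *priv, uint32_t sbdf, int reg, int size,
                      uint32_t val);
};

struct pci_hostbridge {
    uint64_t cfg_base;                    /* key used at registration  */
    int segment, bus;                     /* set later via _setup()    */
    int mapped;                           /* (segment,bus) known yet?  */
    const struct pci_hostbridge_ops *ops;
};

static struct pci_hostbridge bridges[MAX_BRIDGES];
static int nr_bridges;

/* Called by the controller driver's probe function, once per bridge. */
int pci_hostbridge_register(uint64_t cfg_base,
                            const struct pci_hostbridge_ops *ops)
{
    if (nr_bridges == MAX_BRIDGES)
        return -1;
    bridges[nr_bridges].cfg_base = cfg_base;
    bridges[nr_bridges].ops = ops;
    bridges[nr_bridges].mapped = 0;
    nr_bridges++;
    return 0;
}

/* Called from the PHYSDEVOP_pci_host_bridge_add handler: binds a
 * previously registered bridge (looked up by CFG address) to
 * (segment,bus); errors if no such bridge was registered. */
int pci_hostbridge_setup(uint64_t cfg_base, int segment, int bus)
{
    for (int i = 0; i < nr_bridges; i++) {
        if (bridges[i].cfg_base == cfg_base) {
            bridges[i].segment = segment;
            bridges[i].bus = bus;
            bridges[i].mapped = 1;
            return 0;
        }
    }
    return -1; /* not registered */
}

/* Lookup used by e.g. PHYSDEVOP_pci_device_add handling. */
struct pci_hostbridge *pci_hostbridge_find(int segment, int bus)
{
    for (int i = 0; i < nr_bridges; i++)
        if (bridges[i].mapped && bridges[i].segment == segment &&
            bridges[i].bus == bus)
            return &bridges[i];
    return NULL;
}
```

The two-phase shape (register at probe, bind at hypercall) is the point of the design: Xen discovers bridges from DT on its own, but only dom0 knows which (segment,bus) firmware/OS policy assigns to each one.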
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote: Hi Julien, On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote: On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote: Hi Hi Pranavkumar, Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest. The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ. Yes, but in the case of GICv2m (I am not sure about ITS), the device has to write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to generate an interrupt. If you write a virq which is different than the pirq (associated with the actual GICv2m frame) then it will not trigger any interrupt. Now there is a case which I am not sure can be solved with one vITS/vGICv2m:
- Suppose we have two GICv2m frames, and say one has an address of 0x1000 for its MSI_SETSPI_NS register and the other has 0x2000 for its MSI_SETSPI_NS register. Assume the first frame has (physical) SPIs 0x64 - 0x72 associated and the second has 0x80 - 0x88 associated.
- Now there are two PCIe hosts, the first using the first GICv2m frame as an MSI parent and the other using the second frame.
- A device on the first host uses the MSI_SETSPI_NS (0x1000) address along with a data value (i.e. an interrupt number, say 0x64), and a device on the second host uses 0x2000 and data 0x80.
Now if we show one vGICv2m frame in the guest for both devices, then what address will I program in each device's config space for MSI, and what will the data value be? Secondly, the device's writes to these addresses are transparent to the cpu, so how can we trap them when the device wants to trigger an interrupt? Please correct me if I misunderstood anything. Is what you are suggesting a v2m specific issue?
I thought the whole point of the ITS stuff in GICv3 was that one could program such virt-phys mappings into the hardware ITS and it would do the translation (the T in ITS) such that the host got the pIRQ it was expecting when the guest wrote the virtualised vIRQ information to the device. Caveat: If I've read the ITS bits of that doc at any point it was long ago and I've forgotten everything I knew about it... And I've never read anything about v2m at all ;-) Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
Hi Julien, On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote: On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote: Hi Hi Pranavkumar, Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest. The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ. Yes, but in the case of GICv2m (I am not sure about ITS), the device has to write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to generate an interrupt. If you write a virq which is different than the pirq (associated with the actual GICv2m frame) then it will not trigger any interrupt. Now there is a case which I am not sure can be solved with one vITS/vGICv2m:
- Suppose we have two GICv2m frames, and say one has an address of 0x1000 for its MSI_SETSPI_NS register and the other has 0x2000 for its MSI_SETSPI_NS register. Assume the first frame has (physical) SPIs 0x64 - 0x72 associated and the second has 0x80 - 0x88 associated.
- Now there are two PCIe hosts, the first using the first GICv2m frame as an MSI parent and the other using the second frame.
- A device on the first host uses the MSI_SETSPI_NS (0x1000) address along with a data value (i.e. an interrupt number, say 0x64), and a device on the second host uses 0x2000 and data 0x80.
Now if we show one vGICv2m frame in the guest for both devices, then what address will I program in each device's config space for MSI, and what will the data value be? Secondly, the device's writes to these addresses are transparent to the cpu, so how can we trap them when the device wants to trigger an interrupt? Please correct me if I misunderstood anything. Thanks, Pranav I have a patch which allows virq != pirq: https://patches.linaro.org/43012/ Regards, -- Julien Grall
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote: Hi Hi Pranavkumar, Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest. The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ. I have a patch which allows virq != pirq: https://patches.linaro.org/43012/ Regards, -- Julien Grall
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
Hi On Thu, Feb 26, 2015 at 4:19 PM, Vijay Kilari vijay.kil...@gmail.com wrote: On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote: On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote: On 24/02/15 7:13 pm, Julien Grall wrote: On 24/02/15 00:23, Manish Jaggi wrote: Because you have to parse all the device tree to remove the reference to the second ITS. It's pointless and can be difficult to do it. Could you please describe the case where it is difficult? You have to parse every node in the device tree and replace the msi-parent properties with only one ITS. That's the idea. If you are able to emulate one ITS, you can do it for multiple ones, keeping it simple and similar across dom0/domUs. Consider a case where a domU is assigned two PCI devices which are attached to different nodes. (A node is an entity having its own cores and host controllers.) The DOM0 view and guest view of the hardware are different. In the case of DOM0, we want to expose the same hardware layout as the host. So if there are 2 ITS then we should expose the 2 ITS. AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are handled by Xen and a virtualized interface is provided to the guest. So if none of the SMMUs in the layout of the host is presented to dom0, the same can be valid for multiple ITS. SMMU is one of the things which Xen hides from dom0, for obvious reasons. Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no reason for ITS to differ here. Since dom0 needs to be able to cope with seeing all of the host I/O devices (in the default no-passthrough case), it is possible, if not likely, that it will need the same amount of ITS resources (i.e. numbers of LPIs) as the host provides. In the case of the Guest, we (Xen) control the memory layout. For Dom0 as well. Not true. For dom0 the memory layout is determined by the host memory layout.
The MMIO regions are mapped 1:1 and the RAM is a subset of the RAM regions of the host physical address space (often 1:1, but with sufficient h/w support this need not be the case). Therefore we can expose only one ITS. If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen GIC ITS emulation driver to work? It should check whether the request came from dom0 and handle it differently. I think this would be *difficult*. I don't think so. If the vITS is written to handle multiple instances (i.e. in a modular way, as it should be) then it shouldn't matter whether any given domain has 1 or many vITS. The fact that dom0 may have one or more and domU only (currently) has one then becomes largely uninteresting. I have a few queries: 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS command Q is mapped to which physical ITS command Q? In the case of Linux, the ITS node is added as an msi chip to pci using of_pci_msi_chip_add(), and from the pci_dev structure we can know which ITS to use. But in the case of Xen, when an ITS command is trapped we have only the dev_id info from the ITS command. 2) If DomU is always given one virtual ITS node, and DomU is assigned two different PCI devices connected to different physical ITSes, then the Xen vITS driver should know how to map a PCI device to its physical ITS. For the two issues above, Xen should have a mapping to the pci segment and physical ITS node to use, which can be queried by the vITS driver to send commands on to the correct physical ITS. Also if we just show only one vITS (or only one virtual v2m frame) instead of two vITS, then the actual hardware interrupt number and the virtual interrupt number which the guest will see will become different. This will hamper direct irq routing to the guest. - Pranav Vijay
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Thu, 2015-02-26 at 16:19 +0530, Vijay Kilari wrote: On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote: On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote: On 24/02/15 7:13 pm, Julien Grall wrote: On 24/02/15 00:23, Manish Jaggi wrote: Because you have to parse all the device tree to remove the reference to the second ITS. It's pointless and can be difficult to do it. Could you please describe the case where it is difficult? You have to parse every node in the device tree and replace the msi-parent properties with only one ITS. That's the idea. If you are able to emulate one ITS, you can do it for multiple ones, keeping it simple and similar across dom0/domUs. Consider a case where a domU is assigned two PCI devices which are attached to different nodes. (A node is an entity having its own cores and host controllers.) The DOM0 view and guest view of the hardware are different. In the case of DOM0, we want to expose the same hardware layout as the host. So if there are 2 ITS then we should expose the 2 ITS. AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are handled by Xen and a virtualized interface is provided to the guest. So if none of the SMMUs in the layout of the host is presented to dom0, the same can be valid for multiple ITS. SMMU is one of the things which Xen hides from dom0, for obvious reasons. Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no reason for ITS to differ here. Since dom0 needs to be able to cope with seeing all of the host I/O devices (in the default no-passthrough case), it is possible, if not likely, that it will need the same amount of ITS resources (i.e. numbers of LPIs) as the host provides. In the case of the Guest, we (Xen) control the memory layout. For Dom0 as well. Not true. For dom0 the memory layout is determined by the host memory layout.
The MMIO regions are mapped 1:1 and the RAM is a subset of the RAM regions of the host physical address space (often 1:1, but with sufficient h/w support this need not be the case). Therefore we can expose only one ITS. If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen GIC ITS emulation driver to work? It should check whether the request came from dom0 and handle it differently. I think this would be *difficult*. I don't think so. If the vITS is written to handle multiple instances (i.e. in a modular way, as it should be) then it shouldn't matter whether any given domain has 1 or many vITS. The fact that dom0 may have one or more and domU only (currently) has one then becomes largely uninteresting. I have a few queries: 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS command Q is mapped to which physical ITS command Q? In the case of Linux, the ITS node is added as an msi chip to pci using of_pci_msi_chip_add(), and from the pci_dev structure we can know which ITS to use. But in the case of Xen, when an ITS command is trapped we have only the dev_id info from the ITS command. With the proper PCI infrastructure in place we can map the vdev_id to a pdev_id, and from there to our own struct pci_dev. The mapping from pdev_id to pci_dev is based on the PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I described just now in my mail to Manish in this thread (specifically, pci_device_add creates and registers struct pci_dev, I think). 2) If DomU is always given one virtual ITS node, and DomU is assigned two different PCI devices connected to different physical ITSes, then the Xen vITS driver should know how to map a PCI device to its physical ITS. Correct, I think that all falls out from the proper tracking of the vdev_id to pdev_id and from vits to pits for a given domain, and the management/tracking of the struct pci_dev. Ian.
For the two issues above, Xen should have a mapping to the pci segment and physical ITS node to use, which can be queried by the vITS driver to send commands on to the correct physical ITS. Vijay
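The mapping Vijay asks for could be modelled as a simple per-domain translation table, sketched below with hypothetical names (Xen has no its_map_device() or its_translate(); this only illustrates the vdev_id -> (pdev_id, physical ITS) lookup the vITS driver would need on a trapped command):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per passed-through device: translates the guest-visible
 * device ID seen in a trapped vITS command into the physical requester
 * ID and the physical ITS that serves it. Illustrative sketch only. */
struct its_dev_map {
    uint32_t vdev_id;   /* e.g. virtual segment:bus:dev.fn */
    uint32_t pdev_id;   /* physical requester ID */
    int      pits;      /* index of the physical ITS for this device */
};

#define MAX_DEVS 16
static struct its_dev_map dev_map[MAX_DEVS];
static int nr_devs;

/* Populated when a device is assigned (e.g. alongside
 * PHYSDEVOP_pci_device_add processing). */
int its_map_device(uint32_t vdev_id, uint32_t pdev_id, int pits)
{
    if (nr_devs == MAX_DEVS)
        return -1;
    dev_map[nr_devs++] = (struct its_dev_map){ vdev_id, pdev_id, pits };
    return 0;
}

/* On a trapped command carrying a device ID, the vITS driver resolves
 * the guest's dev_id and forwards the rewritten command to the right
 * physical ITS command queue. */
const struct its_dev_map *its_translate(uint32_t vdev_id)
{
    for (int i = 0; i < nr_devs; i++)
        if (dev_map[i].vdev_id == vdev_id)
            return &dev_map[i];
    return NULL;
}
```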
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
Hi Ian, On 26/02/15 11:12, Ian Campbell wrote: I have a few queries: 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS command Q is mapped to which physical ITS command Q? In the case of Linux, the ITS node is added as an msi chip to pci using of_pci_msi_chip_add(), and from the pci_dev structure we can know which ITS to use. But in the case of Xen, when an ITS command is trapped we have only the dev_id info from the ITS command. With the proper PCI infrastructure in place we can map the vdev_id to a pdev_id, and from there to our own struct pci_dev. The mapping from pdev_id to pci_dev is based on the PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I described just now in my mail to Manish in this thread (specifically, pci_device_add creates and registers struct pci_dev, I think). We may need a hypercall to map the dev_id to a vdev_id. IIRC, Vijay and Manish were already planning to add one. 2) If DomU is always given one virtual ITS node, and DomU is assigned two different PCI devices connected to different physical ITSes, then the Xen vITS driver should know how to map a PCI device to its physical ITS. Correct, I think that all falls out from the proper tracking of the vdev_id to pdev_id and from vits to pits for a given domain, and the management/tracking of the struct pci_dev. I think this is the right way to go, though I haven't read the ITS spec closely. Regards, -- Julien Grall
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote: On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote: On 24/02/15 7:13 pm, Julien Grall wrote: On 24/02/15 00:23, Manish Jaggi wrote: Because you have to parse all the device tree to remove the reference to the second ITS. It's pointless and can be difficult to do it. Could you please describe the case where it is difficult? You have to parse every node in the device tree and replace the msi-parent properties with only one ITS. That's the idea. If you are able to emulate one ITS, you can do it for multiple ones, keeping it simple and similar across dom0/domUs. Consider a case where a domU is assigned two PCI devices which are attached to different nodes. (A node is an entity having its own cores and host controllers.) The DOM0 view and guest view of the hardware are different. In the case of DOM0, we want to expose the same hardware layout as the host. So if there are 2 ITS then we should expose the 2 ITS. AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are handled by Xen and a virtualized interface is provided to the guest. So if none of the SMMUs in the layout of the host is presented to dom0, the same can be valid for multiple ITS. SMMU is one of the things which Xen hides from dom0, for obvious reasons. Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no reason for ITS to differ here. Since dom0 needs to be able to cope with seeing all of the host I/O devices (in the default no-passthrough case), it is possible, if not likely, that it will need the same amount of ITS resources (i.e. numbers of LPIs) as the host provides. In the case of the Guest, we (Xen) control the memory layout. For Dom0 as well. Not true. For dom0 the memory layout is determined by the host memory layout.
The MMIO regions are mapped 1:1 and the RAM is a subset of the RAM regions of the host physical address space (often 1:1, but with sufficient h/w support this need not be the case). Therefore we can expose only one ITS. If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen GIC ITS emulation driver to work? It should check whether the request came from dom0 and handle it differently. I think this would be *difficult*. I don't think so. If the vITS is written to handle multiple instances (i.e. in a modular way, as it should be) then it shouldn't matter whether any given domain has 1 or many vITS. The fact that dom0 may have one or more and domU only (currently) has one then becomes largely uninteresting. I have a few queries: 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS command Q is mapped to which physical ITS command Q? In the case of Linux, the ITS node is added as an msi chip to pci using of_pci_msi_chip_add(), and from the pci_dev structure we can know which ITS to use. But in the case of Xen, when an ITS command is trapped we have only the dev_id info from the ITS command. 2) If DomU is always given one virtual ITS node, and DomU is assigned two different PCI devices connected to different physical ITSes, then the Xen vITS driver should know how to map a PCI device to its physical ITS. For the two issues above, Xen should have a mapping to the pci segment and physical ITS node to use, which can be queried by the vITS driver to send commands on to the correct physical ITS. Vijay
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Monday 23 February 2015 09:50 PM, Jan Beulich wrote: On 23.02.15 at 16:46, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 15:27 +0000, Jan Beulich wrote: On 23.02.15 at 16:02, ian.campb...@citrix.com wrote: Is the reason for the scan being of segment 0 only that it is the one which lives at the legacy PCI CFG addresses (or those magic I/O ports)? Right - ideally we would scan all segments, but we need Dom0 to tell us which MMCFG regions are safe to access (Is this done via PHYSDEVOP_pci_mmcfg_reserved? Yes.), and hence can't do that scan at boot time. But we also won't get away without scanning, as we need to set up the IOMMU(s) to at least cover the devices used for booting the system. Which hopefully are all segment 0, or aren't needed until after dom0 tells Xen about them, I suppose. Right. With EFI one may be able to overcome this one day, but the legacy BIOS doesn't even surface mechanisms (software interrupts) to access devices outside of segment 0. (All devices on segment zero are supposed to be accessible via config space access method 1.) Is that the legacy or magic ... again? Yes (just that there are two of them). Ian/Jan, Have you reached a conclusion? Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote: Have you reached a conclusion? My current thinking on how PCI for Xen on ARM should look is thus:

xen/arch/arm/pci.c: New file, containing core PCI infrastructure for ARM. Includes:
* pci_hostbridge_register(), which registers a host bridge. Registration includes: the DT node pointer, the CFG space address, and a pci_hostbridge_ops function table (which contains e.g. cfg space read/write ops, perhaps other stuff).
* A function for setting the (segment,bus) for a given host bridge, let's say pci_hostbridge_setup(); the host bridge must have been previously registered. Looks up the host bridge via its CFG space address and maps that to (segment,bus).
* Functions for looking up host bridges by various keys as needed (cfg base address, DT node, etc).
* A pci_init() function, called from somewhere appropriate in setup.c, which calls device_init(node, DEVICE_PCIHOST, NULL) (see gic_init() for the shape of this).
* Any other common helper functions for managing PCI devices, e.g. for implementing PHYSDEVOP_*, which cannot be made properly common (i.e. shared with x86).

xen/drivers/pci/host-*.c (or pci/host/*.c): New files, one per supported PCI controller IP block. Each should use the normal DT_DEVICE infrastructure for probing, i.e. DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST). The probe function should call pci_hostbridge_register() for each host bridge which the controller exposes.

xen/arch/arm/physdev.c: Implements do_physdev_op handling PHYSDEVOP_*. Includes:
* New hypercall subop PHYSDEVOP_pci_host_bridge_add: as per 1424703761.27930.140.ca...@citrix.com, which calls pci_hostbridge_setup() to map the (segment,bus) to a specific pci_hostbridge_ops (i.e. it must have previously been registered with pci_hostbridge_register(), else error).
* PHYSDEVOP_pci_device_add/remove: Implement the existing hypercall interface used by x86 for ARM. This requires that PHYSDEVOP_pci_host_bridge_add has been called for the (segment,bus) which it refers to, otherwise error. Looks up the host bridge and does whatever setup is required plus e.g. calling of pci_add_device().

No doubt various other existing interfaces will need wiring up, e.g. pci_conf_{read,write}* should look up the host bridge ops struct and call the associated method. I'm sure the above must be incomplete, but I hope the general shape makes sense? Ian.
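The pci_conf_{read,write}* dispatch suggested above might look like the following sketch. All names are illustrative, and a stub accessor stands in for a real ECAM mapping; the point is only the shape of the lookup, replacing the fixed 0xcf8/0xcfc mechanism x86 uses with a per-(segment,bus) ops table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatch: pci_conf_read() resolves (segment,bus) to the
 * registered bridge's accessor and calls it. */
typedef uint32_t (*cfg_read_fn)(uint16_t seg, uint8_t bus, uint8_t devfn,
                                int reg, int size);

struct bridge { uint16_t seg; uint8_t bus; cfg_read_fn read; };

/* Stub standing in for a controller driver's ECAM read: returns a fake
 * vendor/device ID at register 0 and all-ones elsewhere. */
static uint32_t ecam_stub_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                               int reg, int size)
{
    (void)seg; (void)bus; (void)devfn; (void)size;
    return reg == 0 ? 0x813914e4u : 0xffffffffu;
}

static struct bridge conf_bridges[] = { { 0, 0, ecam_stub_read } };

uint32_t pci_conf_read(uint16_t seg, uint8_t bus, uint8_t devfn,
                       int reg, int size)
{
    for (size_t i = 0;
         i < sizeof(conf_bridges) / sizeof(conf_bridges[0]); i++)
        if (conf_bridges[i].seg == seg && conf_bridges[i].bus == bus)
            return conf_bridges[i].read(seg, bus, devfn, reg, size);
    return 0xffffffffu; /* no such bridge: behave like a master abort */
}
```

Returning all-ones for an unknown (segment,bus) mirrors what a config read to an absent device yields on real hardware, so callers need no special error path.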
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote: On 24/02/15 7:13 pm, Julien Grall wrote: On 24/02/15 00:23, Manish Jaggi wrote: Because you have to parse all the device tree to remove the reference to the second ITS. It's pointless and can be difficult to do it. Could you please describe the case where it is difficult You have to parse every node in the device tree and replace the msi-parent properties with only one ITS. Thats the idea. If you are able to emulate on ITS, you can do it for multiple one. keeping it simple and similar across dom0/domUs Consider a case where a domU is assigned two PCI devices which are attached to different nodes. (Node is an entity having its own cores are host controllers). The DOM0 view and guest view of the hardware are different. In the case of DOM0, we want to expose the same hardware layout as the host. So if there is 2 ITS then we should expose the 2 ITS. AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are handled by xen and a virtualized interface is provided to the guest. So if none of SMMU in the layout of host is presented to dom0 the same can be valid for multiple ITS. SMMU is one of the things which Xen hides from dom0, for obvious reasons. Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no reason for ITS to differ here. Since dom0 needs to be able to cope with being able to see all of the host host I/O devices (in the default no-passthrough case), it is possible, if not likely, that it will need the same amount of ITS resources (i.e. numbers of LPIs) as the host provides. In the case of the Guest, we (Xen) controls the memory layout. For Dom0 as well. Not true. For dom0 the memory layout is determined by the host memory layout. The MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM regions of the host physical address space (often in 1:1, but with sufficient h/w support this need not be the case). Therefore we can expose only one ITS. 
If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen GIC ITS emulation driver to work? It would have to check that the request came from dom0 and handle it differently. I think this would be *difficult*. I don't think so. If the vITS is written to handle multiple instances (i.e. in a modular way, as it should be) then it shouldn't matter whether any given domain has 1 or many vITS. The fact that dom0 may have one or more and domU only (currently) has one then becomes largely uninteresting. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
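The modular vITS dispatch Ian describes can be sketched as below. This is a hypothetical illustration, not Xen code: the names (`vits`, `vits_domain`, `vits_find`) are invented, and a real implementation would hang the instances off Xen's `struct domain`. The point is simply that trap handling selects an instance by doorbell address, so the same code path serves a domain with one vITS or several:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one vits instance per emulated ITS register frame. */
struct vits {
    uint64_t base;   /* guest physical base of this vITS's register frame */
    uint64_t size;   /* size of the emulated frame */
};

struct vits_domain {
    unsigned int nr_vits;   /* dom0 may have many, domU typically one */
    struct vits *vits;
};

/* Dispatch a trapped access to whichever vITS instance covers 'addr'.
 * The same lookup serves dom0 (N instances) and domU (1 instance). */
static struct vits *vits_find(struct vits_domain *d, uint64_t addr)
{
    for (unsigned int i = 0; i < d->nr_vits; i++)
        if (addr >= d->vits[i].base && addr < d->vits[i].base + d->vits[i].size)
            return &d->vits[i];
    return NULL;
}
```

With this shape, whether a given domain has 1 or N instances is just the length of the array.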
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 24/02/15 7:13 pm, Julien Grall wrote: On 24/02/15 00:23, Manish Jaggi wrote: Because you have to parse the whole device tree to remove the reference to the second ITS. It's pointless and can be difficult to do. Could you please describe the case where it is difficult? You have to parse every node in the device tree and replace the msi-parent properties with only one ITS. That's the idea. If you are able to emulate one ITS, you can do it for multiple ones. Keeping it simple and similar across dom0/domUs. Consider a case where a domU is assigned two PCI devices which are attached to different nodes. (A node is an entity having its own cores and host controllers.) The DOM0 view and guest view of the hardware are different. In the case of DOM0, we want to expose the same hardware layout as the host. So if there are 2 ITS then we should expose the 2 ITS. AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are handled by Xen and a virtualized interface is provided to the guest. So if no SMMU in the layout of the host is presented to dom0, the same can be valid for multiple ITS. In the case of the Guest, we (Xen) control the memory layout. For Dom0 as well. Therefore we can expose only one ITS. If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen GIC ITS emulation driver to work? It would have to check that the request came from dom0 and handle it differently. I think this would be *difficult*. IMHO, any ITS trap before this is wrong. AFAIK the guest always sees a virtual ITS; could you please explain what is wrong in trapping? I never said the trapping is wrong in all cases. The "before" meant: any trap before the PCI device has been added to Xen is, IMHO, wrong. There is no trap before. So I still don't understand why you need to parse the device tree nodes for PCI devices at boot time... If it doesn't trap before, you should not need to know the PCI. Regards,
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20/02/15 8:09 pm, Ian Campbell wrote: On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote: Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). This seems OK, I will try it out. I recommend you let this subthread (e.g. the conversation with Jan) settle upon a preferred course of action before implementing any one suggestion. Ian, we also have to consider NUMA / multi-node systems where there are two or more ITS nodes: pci0 { msi-parent = <&its0>; }; pci1 { msi-parent = <&its1>; }; This requires parsing PCI nodes in Xen and creating a mapping between PCI nodes and ITSes. Xen would need to be aware of the PCI nodes in the device tree prior to dom0 sending a hypercall. Adding a property to the PCI node in the device tree should be a good approach. Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23/02/2015 10:59, Manish Jaggi wrote: On 20/02/15 8:09 pm, Ian Campbell wrote: On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote: Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). This seems OK, I will try it out. I recommend you let this subthread (e.g. the conversation with Jan) settle upon a preferred course of action before implementing any one suggestion. Ian, we also have to consider NUMA / multi-node systems where there are two or more ITS nodes: pci0 { msi-parent = <&its0>; }; pci1 { msi-parent = <&its1>; }; This requires parsing PCI nodes in Xen and creating a mapping between PCI nodes and ITSes. Xen would need to be aware of the PCI nodes in the device tree prior to dom0 sending a hypercall. Adding a property to the PCI node in the device tree should be a good approach. Why do you need it early? Wouldn't it be sufficient to retrieve that information when the hypercall pci_device_add is called? What about the ACPI case? Does everything necessary live in static tables? Regards, -- Julien Grall
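The proposal above (parse the PCI host nodes, record each node's msi-parent) amounts to a small lookup table inside Xen. A minimal sketch, with invented names (`record_msi_parent`, `its_for_bridge`) and a fixed-size table purely for illustration, keyed by the bridge's CFG space base since that is what the later hypercalls in this thread would carry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: record which ITS each PCI host bridge node points
 * at via its msi-parent property, keyed by the bridge's CFG space base. */
struct pci_its_map {
    uint64_t cfg_base;        /* from the host bridge's "reg" property */
    uint32_t its_phandle;     /* phandle from "msi-parent" */
};

#define MAX_BRIDGES 8
static struct pci_its_map map[MAX_BRIDGES];
static unsigned int nr_map;

/* Called while parsing each PCI host node in the device tree. */
static int record_msi_parent(uint64_t cfg_base, uint32_t its_phandle)
{
    if (nr_map >= MAX_BRIDGES)
        return -1;
    map[nr_map].cfg_base = cfg_base;
    map[nr_map].its_phandle = its_phandle;
    nr_map++;
    return 0;
}

/* Later, when dom0 reports a device on this bridge, find its ITS. */
static uint32_t its_for_bridge(uint64_t cfg_base)
{
    for (unsigned int i = 0; i < nr_map; i++)
        if (map[i].cfg_base == cfg_base)
            return map[i].its_phandle;
    return 0;   /* not found */
}
```

Whether the table is filled at boot (Manish's position) or lazily at pci_device_add time (Julien's question) only changes when `record_msi_parent` runs, not the lookup itself.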
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23.02.15 at 13:45, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote: On 20.02.15 at 18:33, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote: That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number. It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC. No - there can be multiple roots (i.e. host bridges) Where a host bridge == what I've been calling PCI host controller? Some problems here may originate in this naming: I'm not aware of anything named host controller in PCI. The root of a PCI hierarchy (single or multiple buses) connects to the system bus via a host bridge. Whether one or more of them sit in a single chip is of no interest (on the x86 and ia64 sides at least). I suppose in principle a single controller might expose multiple host bridges, but I think we can logically treat such things as being multiple controllers (e.g. with multiple CFG spaces etc). Perhaps. on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit. Is the converse true -- i.e. can a single host bridge span multiple segments? Not that I'm aware of. IOW is the mapping from segment to host bridge many-one or many-many? Each segment may have multiple host bridges, each host bridge connects devices on multiple buses.
Any such hierarchy is entirely separate from any other such hierarchy (both physically and in terms of the SBDFs used to identify them). Maybe what I should read into what you are saying is that segments are purely a software and/or firmware concept with no real basis in the hardware? Right - they just represent separation, but they have no real equivalent in hardware. In which case might we be at liberty to specify that on ARM+Device Tree systems (i.e. those where the f/w tables don't give an enumeration) there is a 1:1 mapping from segments to host bridges? This again can only be answered knowing how bus number assignment gets done on ARM (see also below). If all bus numbers are distinct, there's no need for using segment numbers other than zero. In the end, if you used segment numbers now, you may end up closing the path to using them for much larger future setups. But if that's not seen as a problem then yes, I think you could go that route. So given a system with two PCI host controllers we end up with two segments (let's say A and B, but choosing those is the topic of this thread) and it is acceptable for both to contain a bus 0 with a device 1 on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist. It sounds like you are saying that this is not actually acceptable and that 0:0.0 must be unique in the system irrespective of the associated segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead? No, there can be multiple buses numbered zero. And at the same time a root bus doesn't need to be bus zero on its segment. 0:0.0 was just an example I pulled out of thin air, it wasn't supposed to imply some special property of bus 0 e.g. being the root or anything like that. If there are multiple buses numbered 0 then are they distinguished via segment or something else? Just by segment. What I don't get from this is where the BDF is being represented. It isn't, since this is representing the host controller not any given PCI devices which it contains.
I thought in general BDFs were probed (or even configured) by the firmware and/or OS by walking over the CFG space and so aren't necessarily described anywhere in the firmware tables. They're effectively getting assigned by firmware, yes. This mainly affects bridges, which get stored (in their config space) the secondary and subordinate bus numbers (bounding the bus range they're responsible for when it comes to routing requests). If on ARM firmware doesn't assign bus numbers, is bridge setup then a job of the OS? FWIW the first 4 bytes in each line of interrupt-map are actually matched, after masking via interrupt-map-mask, against an encoding of the BDF to give the INTx routing, but BDFs aren't represented in the sense I think you meant in the example above. There is a capability to have child nodes of this root controller node which describe individual devices, and there is an encoding for the BDF in there, but these are not required. For reference I've
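The point that the segment acts as a namespace above the BDF can be made concrete with the usual SBDF packing (16-bit segment, 8-bit bus, 5-bit device, 3-bit function): (A:0:0.0) and (B:0:0.0) collide on bus/device/function but yield distinct keys once the segment is included. A small sketch (the helper name is invented):

```c
#include <assert.h>
#include <stdint.h>

/* Pack segment:bus:device.function into one identifier.  Layout mirrors
 * the common SBDF encoding: 16-bit segment, 8-bit bus, 5-bit device,
 * 3-bit function.  The segment occupies the high bits, so two devices
 * with identical BDFs on different segments never compare equal. */
static uint32_t sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           (((uint32_t)dev & 0x1f) << 3) | (fn & 0x7);
}
```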
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote: On 23.02.15 at 13:45, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote: On 20.02.15 at 18:33, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote: That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number. It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC. No - there can be multiple roots (i.e. host bridges) Where a host bridge == what I've been calling PCI host controller? Some problems here may originate in this naming: I'm not aware of anything named host controller in PCI. The root of a PCI hierarchy (single or multiple buses) connects to the system bus via a host bridge. Whether one or more of them sit in a single chip is of no interest (on the x86 and ia64 sides at least). Yes, I think I've just been using the terminology wrongly, I mean host bridge throughout. There is generally one such bridge per controller (i.e. IP block) which is what I was trying to get at in the next paragraph. Let's just talk about host bridges from now on to avoid confusion. I suppose in principle a single controller might expose multiple host bridges, but I think we can logically treat such things as being multiple controllers (e.g. with multiple CFG spaces etc). Perhaps. IOW is the mapping from segment to host bridge many-one or many-many? Each segment may have multiple host bridges, each host bridge connects devices on multiple buses.
Any such hierarchy is entirely separate from any other such hierarchy (both physically and in terms of the SBDFs used to identify them). Maybe what I should read into what you are saying is that segments are purely a software and/or firmware concept with no real basis in the hardware? Right - they just represent separation, but they have no real equivalent in hardware. I think I now understand. In which case might we be at liberty to specify that on ARM+Device Tree systems (i.e. those where the f/w tables don't give an enumeration) there is a 1:1 mapping from segments to host bridges? This again can only be answered knowing how bus number assignment gets done on ARM (see also below). If all bus numbers are distinct, there's no need for using segment numbers other than zero. In the end, if you used segment numbers now, you may end up closing the path to using them for much larger future setups. But if that's not seen as a problem then yes, I think you could go that route. Ultimately we just need to be able to go from the set of input parameters to e.g. PHYSDEVOP_pci_device_add to the associated host bridge. It seems like the appropriate pair is (segment,bus), which uniquely corresponds to a single host bridge (and many such pairs may do so). So I think the original question just goes from having to determine a way to map a segment to a host bridge to how to map a (segment,bus) tuple to a host bridge. What I don't get from this is where the BDF is being represented. It isn't, since this is representing the host controller not any given PCI devices which it contains. I thought in general BDFs were probed (or even configured) by the firmware and/or OS by walking over the CFG space and so aren't necessarily described anywhere in the firmware tables. They're effectively getting assigned by firmware, yes.
This mainly affects bridges, which get stored (in their config space) the secondary and subordinate bus numbers (bounding the bus range they're responsible for when it comes to routing requests). If on ARM firmware doesn't assign bus numbers, is bridge setup then a job of the OS? I'm not completely sure; I think it depends on the particular firmware (u-boot, EFI etc) but AIUI it can be the case that the OS does the enumeration on at least some ARM platforms (quite how/when it knows to do so I'm not sure). FWIW the first 4 bytes in each line of interrupt-map are actually matched, after masking via interrupt-map-mask, against an encoding of the BDF to give the INTx routing, but BDFs aren't represented in the sense I think you meant in the example above. There is a capability to have child nodes of this root controller node which describe individual devices, and there is an encoding for the BDF in there, but these are not required. For reference I've
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23/02/15 4:44 pm, Julien Grall wrote: On 23/02/2015 10:59, Manish Jaggi wrote: On 20/02/15 8:09 pm, Ian Campbell wrote: On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote: Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). This seems OK, I will try it out. I recommend you let this subthread (e.g. the conversation with Jan) settle upon a preferred course of action before implementing any one suggestion. Ian, we also have to consider NUMA / multi-node systems where there are two or more ITS nodes: pci0 { msi-parent = <&its0>; }; pci1 { msi-parent = <&its1>; }; This requires parsing PCI nodes in Xen and creating a mapping between PCI nodes and ITSes. Xen would need to be aware of the PCI nodes in the device tree prior to dom0 sending a hypercall. Adding a property to the PCI node in the device tree should be a good approach. Why do you need it early? Wouldn't it be sufficient to retrieve that information when the hypercall pci_device_add is called? The dom0/domU device tree should have only 1 ITS node; Xen should map to the specific ITS when trapped. What about the ACPI case? Does everything necessary live in static tables? Regards,
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote: On 20.02.15 at 18:33, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote: That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number. It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC. No - there can be multiple roots (i.e. host bridges) Where a host bridge == what I've been calling PCI host controller? I suppose in principle a single controller might expose multiple host bridges, but I think we can logically treat such things as being multiple controllers (e.g. with multiple CFG spaces etc). on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit. Is the converse true -- i.e. can a single host bridge span multiple segments? IOW is the mapping from segment to host bridge many-one or many-many? Maybe what I should read into what you are saying is that segments are purely a software and/or firmware concept with no real basis in the hardware? In which case might we be at liberty to specify that on ARM+Device Tree systems (i.e. those where the f/w tables don't give an enumeration) there is a 1:1 mapping from segments to host bridges? A PCI host controller defines the root of a bus, within which the BDF need not be distinct due to the differing segments which are effectively a higher level namespace on the BDFs. The host controller really defines the root of a tree (often covering multiple buses, i.e.
as soon as bridges come into play). Right, I think that's the one thing I'd managed to understand correctly ;-) So given a system with two PCI host controllers we end up with two segments (let's say A and B, but choosing those is the topic of this thread) and it is acceptable for both to contain a bus 0 with a device 1 on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist. It sounds like you are saying that this is not actually acceptable and that 0:0.0 must be unique in the system irrespective of the associated segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead? No, there can be multiple buses numbered zero. And at the same time a root bus doesn't need to be bus zero on its segment. 0:0.0 was just an example I pulled out of thin air, it wasn't supposed to imply some special property of bus 0 e.g. being the root or anything like that. If there are multiple buses numbered 0 then are they distinguished via segment or something else? Just for reference a DT node describing a PCI host controller might look like (taking the APM Mustang one as an example):

pcie0: pcie@1f2b0000 {
        status = "disabled";
        device_type = "pci";
        compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
        #interrupt-cells = <1>;
        #size-cells = <2>;
        #address-cells = <3>;
        reg = <0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
               0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
        reg-names = "csr", "cfg";
        ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
                  0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
        dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
                      0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
        interrupt-map-mask = <0x0 0x0 0x0 0x7>;
        interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
                         0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
                         0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
                         0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
        dma-coherent;
        clocks = <&pcie0clk 0>;
};

I expect most of this is uninteresting but the key thing is that there is no segment number nor topology relative to e.g. pcie1: pcie@1f2c0000 (the nodes look identical except e.g.
all the base addresses and interrupt numbers differ). What I don't get from this is where the BDF is being represented. It isn't, since this
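The interrupt-map matching described above can be sketched with the Mustang values: per the standard DT PCI binding, the child unit address encodes the BDF as (bus << 16) | (dev << 11) | (fn << 8), and the mask <0 0 0 0x7> zeroes the address cells, so every device shares one map entry per INTx pin. Function and table names below are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a BDF the way the DT PCI binding puts it in the child unit
 * address cell: bus in bits 23:16, device in 15:11, function in 10:8. */
static uint32_t dt_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)bus << 16) | ((uint32_t)dev << 11) | ((uint32_t)fn << 8);
}

/* One row of interrupt-map: (masked unit address, masked pin) -> GIC SPI.
 * Values taken from the Mustang node quoted above (pins 1-4 -> 0xc2-0xc5). */
struct imap_entry { uint32_t addr, pin, spi; };

static const struct imap_entry mustang_map[] = {
    { 0, 1, 0xc2 }, { 0, 2, 0xc3 }, { 0, 3, 0xc4 }, { 0, 4, 0xc5 },
};

/* Apply interrupt-map-mask to (bdf, pin) and search the map.  With the
 * Mustang mask the address part is zeroed, so routing depends only on
 * the INTx pin, regardless of which device asserts it. */
static int route_intx(uint32_t bdf, uint32_t pin, uint32_t addr_mask,
                      uint32_t pin_mask)
{
    for (unsigned int i = 0; i < 4; i++)
        if ((bdf & addr_mask) == mustang_map[i].addr &&
            (pin & pin_mask) == mustang_map[i].pin)
            return (int)mustang_map[i].spi;
    return -1;
}
```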
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote: On 23.02.15 at 15:33, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote: On 23.02.15 at 13:45, ian.campb...@citrix.com wrote: In which case might we be at liberty to specify that on ARM+Device Tree systems (i.e. those where the f/w tables don't give an enumeration) there is a 1:1 mapping from segments to host bridges? This again can only be answered knowing how bus number assignment gets done on ARM (see also below). If all bus numbers are distinct, there's no need for using segment numbers other than zero. In the end, if you used segment numbers now, you may end up closing the path to using them for much larger future setups. But if that's not seen as a problem then yes, I think you could go that route. Ultimately we just need to be able to go from the set of input parameters to e.g. PHYSDEVOP_pci_device_add to the associated host bridge. It seems like the appropriate pair is (segment,bus), which uniquely corresponds to a single host bridge (and many such pairs may do so). So I think the original question just goes from having to determine a way to map a segment to a host bridge to how to map a (segment,bus) tuple to a host bridge. Right. Avoiding (at this point in time) non-zero segments if at all possible. I think it sounds like we are going to leave that up to the dom0 OS and whatever it does gets registered with Xen. So non-zero segments is no longer (directly) up to the Xen code. I think that's fine. What I don't get from this is where the BDF is being represented. It isn't, since this is representing the host controller not any given PCI devices which it contains. I thought in general BDFs were probed (or even configured) by the firmware and/or OS by walking over the CFG space and so aren't necessarily described anywhere in the firmware tables. They're effectively getting assigned by firmware, yes.
This mainly affects bridges, which get stored (in their config space) the secondary and subordinate bus numbers (bounding the bus range they're responsible for when it comes to routing requests). If on ARM firmware doesn't assign bus numbers, is bridge setup then a job of the OS? I'm not completely sure; I think it depends on the particular firmware (u-boot, EFI etc) but AIUI it can be the case that the OS does the enumeration on at least some ARM platforms (quite how/when it knows to do so I'm not sure). In which case the Dom0 OS doing so would need to communicate its decisions to the hypervisor, as you suggest further down. So more concretely something like:

#define PHYSDEVOP_pci_host_bridge_add XX
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint8_t bus;
    uint64_t address;
};
typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);

Where seg+bus are enumerated/assigned by dom0 and address is some unique property of the host bridge -- most likely its PCI CFG space base address (which is what physdev_pci_mmcfg_reserved also takes, I think?). Do you think we would need start_bus + end_bus here? Xen could enumerate this itself I think, and perhaps should even if dom0 tells us something? This basically replaces the bus scan (on segment 0) that Xen does on x86 (which topology information gets derived from). Is the reason for the scan being of segment 0 only that it is the one which lives at the legacy PCI CFG addresses (or those magic I/O ports)? What about other host bridges in segment 0 which aren't at that address? You could do the others based on MMCFG tables if you wanted, right? Ian.
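The dom0 side of this proposal could look roughly like the sketch below: for each host bridge it enumerates, dom0 fills the argument structure and issues the hypercall. The helper name is invented, the hypercall number ("XX") is still unassigned in the proposal, and the actual hypercall invocation is therefore left out:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Argument layout as proposed in the thread (fields verbatim). */
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint8_t  bus;
    uint64_t address;
};

/* Hypothetical dom0-side helper: seg/bus are dom0's own enumeration
 * choices, 'address' is the bridge's CFG space base taken from the DT
 * "reg" property (e.g. 0xe0d0000000 for Mustang's pcie0).  A real
 * kernel would pass the result to HYPERVISOR_physdev_op(). */
static struct physdev_pci_host_bridge_add
make_bridge_arg(uint16_t seg, uint8_t bus, uint64_t cfg_base)
{
    struct physdev_pci_host_bridge_add arg;
    memset(&arg, 0, sizeof(arg));   /* don't leak stack padding to Xen */
    arg.seg = seg;
    arg.bus = bus;
    arg.address = cfg_base;
    return arg;
}
```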
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23/02/15 11:50, Manish Jaggi wrote: On 23/02/15 4:44 pm, Julien Grall wrote: On 23/02/2015 10:59, Manish Jaggi wrote: On 20/02/15 8:09 pm, Ian Campbell wrote: On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote: Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). This seems OK, I will try it out. I recommend you let this subthread (e.g. the conversation with Jan) settle upon a preferred course of action before implementing any one suggestion. Ian, we also have to consider NUMA / multi-node systems where there are two or more ITS nodes: pci0 { msi-parent = <&its0>; }; pci1 { msi-parent = <&its1>; }; This requires parsing PCI nodes in Xen and creating a mapping between PCI nodes and ITSes. Xen would need to be aware of the PCI nodes in the device tree prior to dom0 sending a hypercall. Adding a property to the PCI node in the device tree should be a good approach. Why do you need it early? Wouldn't it be sufficient to retrieve that information when the hypercall pci_device_add is called? The dom0/domU device tree should have only 1 ITS node; Xen should map to the specific ITS when trapped. The DOM0 device tree should expose the same layout as the hardware. By exposing only one ITS you make your life more complicated. PHYSDEVOP_pci_device_add should be called before any initialization is done. Therefore the ITS should be configured for this PCI device after Xen is aware of it. IMHO, any ITS trap before this is wrong. Regards, -- Julien Grall
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23.02.15 at 15:33, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote: On 23.02.15 at 13:45, ian.campb...@citrix.com wrote: In which case might we be at liberty to specify that on ARM+Device Tree systems (i.e. those where the f/w tables don't give an enumeration) there is a 1:1 mapping from segments to host bridges? This again can only be answered knowing how bus number assignment gets done on ARM (see also below). If all bus numbers are distinct, there's no need for using segment numbers other than zero. In the end, if you used segment numbers now, you may end up closing the path to using them for much larger future setups. But if that's not seen as a problem then yes, I think you could go that route. Ultimately we just need to be able to go from the set of input parameters to e.g. PHYSDEVOP_pci_device_add to the associated host bridge. It seems like the appropriate pair is (segment,bus), which uniquely corresponds to a single host bridge (and many such pairs may do so). So I think the original question just goes from having to determine a way to map a segment to a host bridge to how to map a (segment,bus) tuple to a host bridge. Right. Avoiding (at this point in time) non-zero segments if at all possible. What I don't get from this is where the BDF is being represented. It isn't, since this is representing the host controller not any given PCI devices which it contains. I thought in general BDFs were probed (or even configured) by the firmware and/or OS by walking over the CFG space and so aren't necessarily described anywhere in the firmware tables. They're effectively getting assigned by firmware, yes. This mainly affects bridges, which get stored (in their config space) the secondary and subordinate bus numbers (bounding the bus range they're responsible for when it comes to routing requests). If on ARM firmware doesn't assign bus numbers, is bridge setup then a job of the OS?
I'm not completely sure; I think it depends on the particular firmware (u-boot, EFI etc) but AIUI it can be the case that the OS does the enumeration on at least some ARM platforms (quite how/when it knows to do so I'm not sure). In which case the Dom0 OS doing so would need to communicate its decisions to the hypervisor, as you suggest further down. This basically replaces the bus scan (on segment 0) that Xen does on x86 (which topology information gets derived from). Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23.02.15 at 16:02, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote: In which case the Dom0 OS doing so would need to communicate its decisions to the hypervisor, as you suggest further down. So more concretely something like:

#define PHYSDEVOP_pci_host_bridge_add XX
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint8_t bus;
    uint64_t address;
};
typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);

Where seg+bus are enumerated/assigned by dom0 and address is some unique property of the host bridge -- most likely its PCI CFG space base address (which is what physdev_pci_mmcfg_reserved also takes, I think?) Right. Do you think we would need start_bus + end_bus here? Xen could enumerate this itself I think, and perhaps should even if dom0 tells us something? That depends - if what you get presented here by Dom0 is a PCI device at seg:bus:00.0, and if all other setup was already done on it, then you could read the secondary and subordinate bus numbers from its config space. If that's not possible, then Dom0 handing you these values would seem to be necessary. As a result you may also need a hook from PCI device registration, allowing to associate it with the right host bridge (and refusing to add any for which there's none). As an alternative, extending PHYSDEVOP_manage_pci_add_ext in a suitable manner may be worth considering, provided (like on x86 and ia64) the host bridges get surfaced as distinct PCI devices. This basically replaces the bus scan (on segment 0) that Xen does on x86 (which topology information gets derived from). Is the reason for the scan being of segment 0 only that it is the one which lives at the legacy PCI CFG addresses (or those magic I/O ports)? Right - ideally we would scan all segments, but we need Dom0 to tell us which MMCFG regions are safe to access, and hence can't do that scan at boot time.
But we also won't get away without scanning, as we need to set up the IOMMU(s) to at least cover the devices used for booting the system. What about other host bridges in segment 0 which aren't at that address? At which address? (All devices on segment zero are supposed to be accessible via config space access method 1.) You could do the others based on MMCFG tables if you wanted, right? Yes, with the above mentioned caveat. Jan
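Jan's point about recovering the bus range from an already-set-up bridge can be sketched against the standard type-1 (bridge) configuration header, where the secondary bus number lives at offset 0x19 and the subordinate bus number at 0x1a. The byte array stands in for the bridge's config space, and the function name is invented:

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI-to-PCI bridge (type-1 header) register offsets. */
#define PCI_SECONDARY_BUS    0x19
#define PCI_SUBORDINATE_BUS  0x1a

struct bus_range { uint8_t start, end; };

/* Read the bus range a bridge routes, exactly as Xen could do once
 * dom0 has finished setting the bridge up.  'cfg' stands in for the
 * bridge's 256-byte config space. */
static struct bus_range bridge_bus_range(const uint8_t *cfg)
{
    struct bus_range r = {
        .start = cfg[PCI_SECONDARY_BUS],
        .end   = cfg[PCI_SUBORDINATE_BUS],
    };
    return r;
}
```

If the bridge has not been set up (both fields still zero), this yields nothing useful, which is exactly the case where dom0 would have to hand Xen the values explicitly.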
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Mon, 2015-02-23 at 15:27 +, Jan Beulich wrote: On 23.02.15 at 16:02, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote: In which case the Dom0 OS doing so would need to communicate its decisions to the hypervisor, as you suggest further down. So more concretely something like: #define PHYSDEVOP_pci_host_bridge_add XX struct physdev_pci_host_bridge_add { /* IN */ uint16_t seg; uint8_t bus; uint64_t address; }; typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t; DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t); Where seg+bus are enumerated/assigned by dom0 and address is some unique property of the host bridge -- most likely its PCI CFG space base address (which is what physdev_pci_mmcfg_reserved also takes, I think?) Right. Do you think we would need start_bus + end_bus here? Xen could enumerate this itself I think, and perhaps should even if dom0 tells us something? That depends - if what you get presented here by Dom0 is a PCI device at seg:bus:00.0, and if all other setup was already done on it, then you could read the secondary and subordinate bus numbers from its config space. If that's not possible, then Dom0 handing you these values would seem to be necessary. As a result you may also need a hook from PCI device registration, allowing to associate it with the right host bridge (and refusing to add any for which there's none). Right. My thinking was that PHYSDEVOP_pci_host_bridge_add would add an entry into some mapping data structure from (segment,bus) to a handle associated with the associated pci host bridge driver in Xen. PHYSDEVOP_manage_pci_add would have to look up the host bridge driver from the (segment,bus) I think to construct the necessary linkage for use later when we try to do things to the device, and it should indeed fail if it can't find one.
As an alternative, extending PHYSDEVOP_manage_pci_add_ext in a suitable manner may be worth considering, provided (like on x86 and ia64) the host bridges get surfaced as distinct PCI devices. This basically replaces the bus scan (on segment 0) that Xen does on x86 (which topology information gets derived from). Is the reason for the scan covering segment 0 only that it is the one which lives at the legacy PCI CFG addresses (or those magic I/O ports)? Right - ideally we would scan all segments, but we need Dom0 to tell us which MMCFG regions are safe to access, Is this done via PHYSDEVOP_pci_mmcfg_reserved? and hence can't do that scan at boot time. But we also won't get away without scanning, as we need to set up the IOMMU(s) to at least cover the devices used for booting the system. Which hopefully are all segment 0 or aren't needed until after dom0 tells Xen about them I suppose. What about other host bridges in segment 0 which aren't at that address? At which address? I meant this to be a back reference to the legacy PCI CFG addresses (or those magic I/O ports). (All devices on segment zero are supposed to be accessible via config space access method 1.) Is that the legacy or the magic ... again? You could do the others based on MMCFG tables if you wanted, right? Yes, with the above mentioned caveat. Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 23.02.15 at 16:46, ian.campb...@citrix.com wrote: On Mon, 2015-02-23 at 15:27 +, Jan Beulich wrote: On 23.02.15 at 16:02, ian.campb...@citrix.com wrote: Is the reason for the scan being of segment 0 only is that it is the one which lives at the legacy PCI CFG addresses (or those magic I/O ports)? Right - ideally we would scan all segments, but we need Dom0 to tell us which MMCFG regions are safe to access, Is this done via PHYSDEVOP_pci_mmcfg_reserved? Yes. and hence can't do that scan at boot time. But we also won't get away without scanning, as we need to set up the IOMMU(s) to at least cover the devices used for booting the system. Which hopefully are all segment 0 or aren't needed until after dom0 tells Xen about them I suppose. Right. With EFI one may be able to overcome this one day, but the legacy BIOS doesn't even surface mechanisms (software interrupts) to access devices outside of segment 0. (All devices on segment zero are supposed to be accessible via config space access method 1.) Is that the legacy or magic ... again? Yes (just that there are two of them). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20.02.15 at 18:33, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote: That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number. It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC. No - there can be multiple roots (i.e. host bridges) on a single segment. Segments are - afaict - purely a scalability extension allowing to overcome the 256 bus limit. I think that PCI specs etc perhaps call a segment a PCI domain, which we avoided in Xen due to the obvious potential for confusion. Right, the two terms are getting mixed depending on where you look. A PCI host controller defines the root of a bus, within which the BDF need not be distinct due to the differing segments which are effectively a higher level namespace on the BDFs. The host controller really defines the root of a tree (often covering multiple buses, i.e. as soon as bridges come into play). So given a system with two PCI host controllers we end up with two segments (lets say A and B, but choosing those is the topic of this thread) and it is acceptable for both to contain a bus 0 with a device 1 on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist. It sounds like you are saying that this is not actually acceptable and that 0:0.0 must be unique in the system irrespective of the associated segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead? No, there can be multiple buses numbered zero. 
And at the same time a root bus doesn't need to be bus zero on its segment. Just for reference a DT node describing a PCI host controller might look like (taking the APM Mustang one as an example):

pcie0: pcie@1f2b {
	status = "disabled";
	device_type = "pci";
	compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
	#interrupt-cells = <1>;
	#size-cells = <2>;
	#address-cells = <3>;
	reg = <0x00 0x1f2b 0x0 0x0001	/* Controller registers */
	       0xe0 0xd000 0x0 0x0004>;	/* PCI config space */
	reg-names = "csr", "cfg";
	ranges = <0x0100 0x00 0x 0xe0 0x1000 0x00 0x0001	/* io */
		  0x0200 0x00 0x8000 0xe1 0x8000 0x00 0x8000>;	/* mem */
	dma-ranges = <0x4200 0x80 0x 0x80 0x 0x00 0x8000
		      0x4200 0x00 0x 0x00 0x 0x80 0x>;
	interrupt-map-mask = <0x0 0x0 0x0 0x7>;
	interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
			 0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
			 0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
			 0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
	dma-coherent;
	clocks = <&pcie0clk 0>;
};

I expect most of this is uninteresting but the key thing is that there is no segment number nor topology relative to e.g. pcie1: pcie@1f2c (the nodes look identical except e.g. all the base addresses and interrupt numbers differ). What I don't get from this is where the BDF is being represented. Yet that arrangement is fundamental to understanding whether you really need segments to properly disambiguate devices. Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20/02/15 12:10, Manish Jaggi wrote: On 20/02/15 5:33 pm, Julien Grall wrote: Hello Manish, On 20/02/15 11:34, Manish Jaggi wrote: The platform APIs are enhanced to provide support for parsing pci device tree nodes and storing the config-space address which is later used for pci_read/pci_write config calls. Can you explain why you chose to add per-platform callbacks rather than a generic solution? The platform code is similar to what linux has in drivers/pci/host/pci-platform.c. I have used the same concept. Please explain it in the commit message; it helps us to understand why you did it. Anyway, based on what you said, your approach looks wrong. Firstly, the platform code is DT-centric and we don't expect to have such a thing for ACPI. Secondly, the PCI host code could be shared between multiple platforms. Overall, I would prefer to have a separate file and structure for handling the PCI host. Also, I think we could re-use the Linux code for this purpose. Regards, -- Julien Grall
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20/02/15 5:33 pm, Julien Grall wrote: Hello Manish, On 20/02/15 11:34, Manish Jaggi wrote: The platform APIs are enhanced to provide support for parsing pci device tree nodes and storing the config-space address which is later used for pci_read/pci_write config calls. Can you explain why you chose to add per-platform callbacks rather than a generic solution? The platform code is similar to what linux has in drivers/pci/host/pci-platform.c. I have used the same concept. Regards,
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20.02.15 at 14:45, ian.campb...@citrix.com wrote: (Jan, curious if you have any thoughts on this, hopefully I've left sufficient context for you to get what we are on about, the gist is we need some way for dom0 and Xen to agree on how a PCI segment ID maps to an actual PCI root controller, I think on x86 you either Just Know from PC legacy or ACPI tells you?) Yeah, without information from ACPI we'd have no way to know how to access the config space of segments other than 0. Both I/O port based access methods don't have room for specifying a segment number. Since the MMCFG addresses get set up by firmware, it is also firmware telling us the segment numbers. If you don't get them arranged in any particular order, ... On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote: I have added ABI that segment no maps to the position on pci node in xen device tree. We had partially discussed about this during Linaro connect. What is your teams view on this, should this be ok or we introduce a property in device tree pci node {xen_segment_id = 1} The DT node ordering cannot be relied on this way, so we certainly need something else. Ideally we should find a solution which doesn't require new properties. The best solution would be to find some existing property of the PCI host controller which is well defined and unique to each host controller. I had been thinking that the base address of the PCI CFG space might be a good enough handle. ... this approach would seem reasonable. The only question is whether the data type used for segment id in the hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm not sure if we are going to be able to make this pci_segment_t and have it differ for ARM. Are you expecting to have more than 64k segments? Otherwise, just sequentially assign segment numbers as you discover them or get them reported by Dom0. 
You could even have Dom0 tell you the segment numbers (just like we do on x86), thus eliminating the need for an extra mechanism for Dom0 to learn them. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-20 at 12:20 +, Julien Grall wrote: Overall, I would prefer to have a separate file and structure for handling PCI host. Also, I think we could re-use the Linux code for this purpose. (caveat; I've not looked at the code yet) I had expected that PCI host controllers would be discovered via the existing device model stuff and compatible string matching, e.g. DT_DEVICE_START(some_pcihost, SOME PCI HOST CONTROLLER, DEVICE_PCIBUS) which would reference a set of compatible strings and a probe function, the probe function would then call some pci bus registration function to hook that bus into the system. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
(Jan, curious if you have any thoughts on this, hopefully I've left sufficient context for you to get what we are on about, the gist is we need some way for dom0 and Xen to agree on how a PCI segment ID maps to an actual PCI root controller, I think on x86 you either Just Know from PC legacy or ACPI tells you?) On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote: I have added an ABI where the segment number maps to the position of the pci node in the xen device tree. We had partially discussed this during Linaro Connect. What is your team's view on this: should this be ok, or should we introduce a property in the device tree pci node {xen_segment_id = 1}? The DT node ordering cannot be relied on this way, so we certainly need something else. Ideally we should find a solution which doesn't require new properties. The best solution would be to find some existing property of the PCI host controller which is well defined and unique to each host controller. I had been thinking that the base address of the PCI CFG space might be a good enough handle. The only question is whether the data type used for segment id in the hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm not sure if we are going to be able to make this pci_segment_t and have it differ for ARM. Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). I'm not sure if this ends up being different on ACPI, where perhaps MMCONFIG or some other table actually gives us a segment ID for each PCI bus. Ideally whatever solution we end up with would fit into this model. Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote: On 20.02.15 at 14:45, ian.campb...@citrix.com wrote: (Jan, curious if you have any thoughts on this, hopefully I've left sufficient context for you to get what we are on about, the gist is we need some way for dom0 and Xen to agree on how a PCI segment ID maps to an actual PCI root controller, I think on x86 you either Just Know from PC legacy or ACPI tells you?) Yeah, without information from ACPI we'd have no way to know how to access the config space of segments other than 0. Both I/O port based access methods don't have room for specifying a segment number. Since the MMCFG addresses get set up by firmware, it is also firmware telling us the segment numbers. If you don't get them arranged in any particular order, ... On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote: I have added ABI that segment no maps to the position on pci node in xen device tree. We had partially discussed about this during Linaro connect. What is your teams view on this, should this be ok or we introduce a property in device tree pci node {xen_segment_id = 1} The DT node ordering cannot be relied on this way, so we certainly need something else. Ideally we should find a solution which doesn't require new properties. The best solution would be to find some existing property of the PCI host controller which is well defined and unique to each host controller. I had been thinking that the base address of the PCI CFG space might be a good enough handle. ... this approach would seem reasonable. The only question is whether the data type used for segment id in the hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm not sure if we are going to be able to make this pci_segment_t and have it differ for ARM. Are you expecting to have more than 64k segments? If we were to use the PCI CFG base address as the handle for a segment then we would need a 64 bit field is all, it would of course be very sparse ;-). 
Otherwise, just sequentially assign segment numbers as you discover them or get them reported by Dom0. You could even have Dom0 tell you the segment numbers (just like we do on x86), Aha, how does this work on x86 then? I've been looking for a hypercall which dom0 uses to tell Xen about PCI segments to no avail (I just find ones for registering actual devices). If there is an existing mechanism on x86 and it suits (or nearly so) then I am entirely in favour of using it. Ian. thus eliminating the need for an extra mechanism for Dom0 to learn them. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20.02.15 at 15:26, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote: Otherwise, just sequentially assign segment numbers as you discover them or get them reported by Dom0. You could even have Dom0 tell you the segment numbers (just like we do on x86), Aha, how does this work on x86 then? I've been looking for a hypercall which dom0 uses to tell Xen about PCI segments to no avail (I just find ones for registering actual devices). But that's the one, plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved). Without ACPI, how do you know on ARM how to access config space for a particular segment? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote: Another option might be a new hypercall (assuming one doesn't already exist) to register a PCI bus which would take e.g. the PCI CFG base address and return a new u16 segment id to be used for all subsequent PCI related calls. This would require the dom0 OS to hook its pci_bus_add function, which might be doable (more doable than handling xen_segment_id DT properties I think). This seems ok, I will try it out. I recommend you let this subthread (e.g. the conversation with Jan) settle upon a preferred course of action before implementing any one suggestion. Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote: On 20.02.15 at 15:26, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote: Otherwise, just sequentially assign segment numbers as you discover them or get them reported by Dom0. You could even have Dom0 tell you the segment numbers (just like we do on x86), Aha, how does this work on x86 then? I've been looking for a hypercall which dom0 uses to tell Xen about PCI segments to no avail (I just find ones for registering actual devices). But that's the one, that == physdev_pci_device_add? AFAICT that tells Xen about a given device existing on a particular segment, but doesn't tell Xen any of the properties of that segment. plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved). This looks promising, but rather under-documented. #define PHYSDEVOP_pci_mmcfg_reserved 24 struct physdev_pci_mmcfg_reserved { uint64_t address; uint16_t segment; uint8_t start_bus; uint8_t end_bus; uint32_t flags; }; I suppose the first 4 fields correspond to entries in the MMCFG table? Which x86 Xen can parse and so can dom0, so dom0 can then make this hypercall, passing (address,segment,start_bus,end_bus) to set the flags? What is address the address of? The CFG space I think? On ARM with DT I think we only get given address, and something has to make up segment, start/end_bus I'm not sure where we would get them from. So although I think we could perhaps bend this interface to ARMs needs it would have rather different semantics to x86, i.e. instead of the key being (address,segment,start_bus,end_bus) and the value being flags it would be something like key = (address) and value = (segment,start_bus,end_bus,flags). I don't think reusing like that would be wise. Without ACPI, how do you know on ARM how to access config space for a particular segment?
That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. So something somewhere needs to make up a segment ID for each PCI bus and Xen and dom0 need to somehow agree on what the mapping is e.g. by the one which made up the segment ID telling the other or some other TBD means. On x86 you solve this because both Xen and dom0 can parse the same table and reach the same answer, sadly DT doesn't have everything needed in it. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20.02.15 at 16:01, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote: plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved). This looks promising, but rather under-documented. #define PHYSDEVOP_pci_mmcfg_reserved 24 struct physdev_pci_mmcfg_reserved { uint64_t address; uint16_t segment; uint8_t start_bus; uint8_t end_bus; uint32_t flags; }; I suppose the first 4 fields correspond to entries in the MMCFG table? Yes. Which x86 Xen can parse and so can dom0, so dom0 can then make this hypercall, passing (address,segment,start_bus,end_bus) to set the flags? No, the flags are IN too - since Xen can parse the table itself, there wouldn't be any need for the hypercall if there weren't many systems which don't reserve the MMCFG address range(s) in E820 and/or ACPI resources. Xen can check E820, but obtaining ACPI resource info requires AML parsing. What is address the address of? The CFG space I think? Yes, the base address of the MMCFG range (maybe suitably offset by the bus number). On ARM with DT I think we only get given address, and something has to make up segment, start/end_bus I'm not sure where we would get them from. So although I think we could perhaps bend this interface to ARMs needs it would have rather different semantics to x86, i.e. instead of the key being (address,segment,start_bus,end_bus) and the value being flags it would be something like key = (address) and value = (segment,start_bus,end_bus,flags). I don't think reusing like that would be wise. Without ACPI, how do you know on ARM how to access config space for a particular segment? That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information.
Or if all buses are distinct, then you don't need a segment number. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20/02/15 15:13, Manish Jaggi wrote: On x86 you solve this because both Xen and dom0 can parse the same table and reach the same answer, sadly DT doesn't have everything needed in it. In fact xen and dom0 use the same device tree nodes and in the same order. xen creates the device tree for dom0. I think xen can enforce the ABI while creating device tree While it's true for Linux, you can't assume that another OS (such as FreeBSD) will parse the device tree in the same order. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On 20/02/15 8:31 pm, Ian Campbell wrote: On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote: On 20.02.15 at 15:26, ian.campb...@citrix.com wrote: On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote: Otherwise, just sequentially assign segment numbers as you discover them or get them reported by Dom0. You could even have Dom0 tell you the segment numbers (just like we do on x86), Aha, how does this work on x86 then? I've been looking for a hypercall which dom0 uses to tell Xen about PCI segments to no avail (I just find ones for registering actual devices). But that's the one, that == physdev_pci_device_add? AFAICT that tells Xen about a given device existing on a particular segment, but doesn't tell Xen any of the properties of that segment. plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved). This looks promising, but rather under-documented. #define PHYSDEVOP_pci_mmcfg_reserved24 struct physdev_pci_mmcfg_reserved { uint64_t address; uint16_t segment; uint8_t start_bus; uint8_t end_bus; uint32_t flags; }; I suppose the first 4 fields correspond to entries in the MMCFG table? Which x86 Xen can parse and so can dom0, so dom0 can then make this hypercall, passing (address,segment,start_bus,end_bus) to set the flags? What is address the address of? The CFG space I think? On ARM with DT I think we only get given address, and something has to make up segment, start/end_bus I'm not sure where we would get them from. So although I think we could perhaps bend this interface to ARMs needs it would have rather different semantics to x86, i.e. instead of the key being (address,segment,start_bus,end_bus) and the value being flags it would be something like key = (address) and value = (segment,start_bus,end_bus,flags). I don't think reusing like that would be wise. Without ACPI, how do you know on ARM how to access config space for a particular segment? 
That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. So something somewhere needs to make up a segment ID for each PCI bus and Xen and dom0 need to somehow agree on what the mapping is e.g. by the one which made up the segment ID telling the other or some other TBD means. On x86 you solve this because both Xen and dom0 can parse the same table and reach the same answer, sadly DT doesn't have everything needed in it. In fact xen and dom0 use the same device tree nodes and in the same order. xen creates the device tree for dom0. I think xen can enforce the ABI while creating device tree Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote: That's the issue we are trying to resolve, with device tree there is no explicit segment ID, so we have an essentially unindexed set of PCI buses in both Xen and dom0. How that? What if two bus numbers are equal? There ought to be some kind of topology information. Or if all buses are distinct, then you don't need a segment number. It's very possible that I simply don't have the PCI terminology straight in my head, leading to me talking nonsense. I'll explain how I'm using it and perhaps you can put me straight... My understanding was that a PCI segment equates to a PCI host controller, i.e. a specific instance of some PCI host IP on an SoC. I think that PCI specs etc perhaps call a segment a PCI domain, which we avoided in Xen due to the obvious potential for confusion. A PCI host controller defines the root of a bus, within which the BDF need not be distinct due to the differing segments which are effectively a higher level namespace on the BDFs. So given a system with two PCI host controllers we end up with two segments (let's say A and B, but choosing those is the topic of this thread) and it is acceptable for both to contain a bus 0 with a device 1 on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist. It sounds like you are saying that this is not actually acceptable and that 0:0.0 must be unique in the system irrespective of the associated segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
Just for reference a DT node describing a PCI host controller might look like (taking the APM Mustang one as an example):

pcie0: pcie@1f2b {
	status = "disabled";
	device_type = "pci";
	compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
	#interrupt-cells = <1>;
	#size-cells = <2>;
	#address-cells = <3>;
	reg = <0x00 0x1f2b 0x0 0x0001	/* Controller registers */
	       0xe0 0xd000 0x0 0x0004>;	/* PCI config space */
	reg-names = "csr", "cfg";
	ranges = <0x0100 0x00 0x 0xe0 0x1000 0x00 0x0001	/* io */
		  0x0200 0x00 0x8000 0xe1 0x8000 0x00 0x8000>;	/* mem */
	dma-ranges = <0x4200 0x80 0x 0x80 0x 0x00 0x8000
		      0x4200 0x00 0x 0x00 0x 0x80 0x>;
	interrupt-map-mask = <0x0 0x0 0x0 0x7>;
	interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
			 0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
			 0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
			 0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
	dma-coherent;
	clocks = <&pcie0clk 0>;
};

I expect most of this is uninteresting but the key thing is that there is no segment number nor topology relative to e.g. pcie1: pcie@1f2c (the nodes look identical except e.g. all the base addresses and interrupt numbers differ). (FWIW reg here shows that the PCI cfg space is at 0xe0d000, interrupt-map shows that SPI (AKA GSI) 0xc2 is INTA and 0xc3 is INTB (I think, a bit fuzzy...), ranges is the space where BARs live, I think you can safely ignore everything else for the purposes of this conversation). Ian.