Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-17 Thread Manish Jaggi


On Tuesday 17 March 2015 06:01 PM, Jan Beulich wrote:

On 17.03.15 at 13:06, mja...@caviumnetworks.com wrote:

On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:

On 17.03.15 at 06:26, mja...@caviumnetworks.com wrote:

In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
hypercall to inform Xen that a new PCI device has been added.
If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
a) Issue the hypercall from drivers/pci/probe.c
b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
segment number (s_bdf), it will return an error
SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
PHYSDEVOP_pci_host_bridge_add hypercall.

I think (b) can be done with minimal code changes. What do you think?

I'm pretty sure (a) would even be refused by the maintainers, unless
there already is a notification being sent. As to (b) - kernel code could
keep track of which segment/bus pairs it informed Xen about, and
hence wouldn't even need to wait for an error to be returned from
the device-add request (which in your proposal would need to be re-
issued after the host-bridge-add).

I have a query on the CFG space address to be passed as a hypercall parameter.
of_pci_get_host_bridge_resources only parses the ranges property and
not reg.
The reg property has the CFG space address, which is usually stored in
private PCI host controller driver structures.

So a pci_dev's parent pci_bus would not have that info.
One way is to add a method to struct pci_ops, but I am not sure whether it
would be accepted.

I'm afraid I don't understand what you're trying to tell me.

Hi Jan,
I missed this during the initial discussion and found out while coding that
the CFG space address of a PCI host is stored in the reg property, and the
of_pci code does not store reg in the resources; only ranges are stored.
So the pci_bus, which is the root bus created in the probe
function of the
PCIe controller driver, will have the ranges values in its resources, but the
reg property value (the CFG space address) only in the private data.
So from drivers/xen/pci.c we can find the root bus (pci_bus) from
the pci_dev (via BUS_NOTIFY_ADD_DEVICE) but cannot get the CFG space address.


Now there are 2 ways:
a) Add a pci_ops method to return the CFG space address.
b) Let the PCI host controller driver invoke a function,
xen_invoke_hypercall(), providing the bus number and CFG space address.
xen_invoke_hypercall() would be implemented in drivers/xen/pci.c and
would issue the PHYSDEVOP_pci_host_bridge_add hypercall (sketched below).
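A minimal sketch of option (b), assuming the proposed (not yet existing)
PHYSDEVOP_pci_host_bridge_add subop; the struct layout, the subop number and
the helper name are all placeholders for illustration:

/* Hypothetical helper in drivers/xen/pci.c -- a sketch, not a final API. */
#include <linux/types.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

/* Assumed argument layout for the proposed subop (still under discussion). */
struct physdev_pci_host_bridge_add {
	uint16_t seg;       /* segment chosen by dom0 */
	uint8_t  bus;       /* root bus number */
	uint64_t cfg_base;  /* CFG space address taken from the "reg" property */
};
#define PHYSDEVOP_pci_host_bridge_add 44   /* placeholder subop number */

int xen_invoke_hypercall(uint16_t seg, uint8_t bus, uint64_t cfg_base)
{
	struct physdev_pci_host_bridge_add add = {
		.seg = seg,
		.bus = bus,
		.cfg_base = cfg_base,
	};

	/* Called by the host controller driver once per root bus it probes. */
	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
}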

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-17 Thread Jan Beulich
 On 17.03.15 at 06:26, mja...@caviumnetworks.com wrote:
 In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
 hypercall to inform Xen that a new PCI device has been added.
 If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
 a) Issue the hypercall from drivers/pci/probe.c
 b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
 PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
 segment number (s_bdf), it will return an error
 SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
 PHYSDEVOP_pci_host_bridge_add hypercall.
 
 I think (b) can be done with minimal code changes. What do you think?

I'm pretty sure (a) would even be refused by the maintainers, unless
there already is a notification being sent. As to (b) - kernel code could
keep track of which segment/bus pairs it informed Xen about, and
hence wouldn't even need to wait for an error to be returned from
the device-add request (which in your proposal would need to be re-
issued after the host-bridge-add).

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-17 Thread Manish Jaggi


On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:

On 17.03.15 at 06:26, mja...@caviumnetworks.com wrote:

In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
hypercall to inform Xen that a new PCI device has been added.
If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
a) Issue the hypercall from drivers/pci/probe.c
b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
segment number (s_bdf), it will return an error
SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
PHYSDEVOP_pci_host_bridge_add hypercall.

I think (b) can be done with minimal code changes. What do you think?

I'm pretty sure (a) would even be refused by the maintainers, unless
there already is a notification being sent. As to (b) - kernel code could
keep track of which segment/bus pairs it informed Xen about, and
hence wouldn't even need to wait for an error to be returned from
the device-add request (which in your proposal would need to be re-
issued after the host-bridge-add).

I have a query on the CFG space address to be passed as a hypercall parameter.
of_pci_get_host_bridge_resources only parses the ranges property and
not reg.
The reg property has the CFG space address, which is usually stored in
private PCI host controller driver structures.


So a pci_dev's parent pci_bus would not have that info.
One way is to add a method to struct pci_ops, but I am not sure whether it
would be accepted.
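For illustration, this is roughly how a DT-based host controller driver ends
up with the CFG base only in its private data (a sketch built on the generic
OF helpers and the circa-2015 of_pci_get_host_bridge_resources() signature;
the gen_pci_priv struct and probe function are made up for this example):

#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/of_pci.h>
#include <linux/platform_device.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical private state of a PCIe host controller driver. */
struct gen_pci_priv {
	struct resource cfg_res;     /* from "reg": ECAM/CFG space window */
	struct list_head resources;  /* from "ranges": MEM/IO windows */
};

static int example_host_probe(struct platform_device *pdev)
{
	struct device_node *np = pdev->dev.of_node;
	struct gen_pci_priv *priv;
	resource_size_t iobase = 0;
	int err;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;
	INIT_LIST_HEAD(&priv->resources);

	/* "reg" -> CFG space base; it stays in the driver's private struct. */
	err = of_address_to_resource(np, 0, &priv->cfg_res);
	if (err)
		return err;

	/*
	 * "ranges" -> MEM/IO windows; only these end up as the root pci_bus
	 * resources, so drivers/xen/pci.c never gets to see cfg_res.
	 */
	err = of_pci_get_host_bridge_resources(np, 0, 0xff,
					       &priv->resources, &iobase);
	if (err)
		return err;

	/* ... create the root bus with those resources, etc. ... */
	return 0;
}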

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-17 Thread Jan Beulich
 On 17.03.15 at 13:06, mja...@caviumnetworks.com wrote:

 On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
 On 17.03.15 at 06:26, mja...@caviumnetworks.com wrote:
 In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
 hypercall to inform Xen that a new PCI device has been added.
 If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
 a) Issue the hypercall from drivers/pci/probe.c
 b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
 PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
 segment number (s_bdf), it will return an error
 SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
 PHYSDEVOP_pci_host_bridge_add hypercall.
 
 I think (b) can be done with minimal code changes. What do you think?
 I'm pretty sure (a) would even be refused by the maintainers, unless
 there already is a notification being sent. As to (b) - kernel code could
 keep track of which segment/bus pairs it informed Xen about, and
 hence wouldn't even need to wait for an error to be returned from
 the device-add request (which in your proposal would need to be re-
 issued after the host-bridge-add).
 I have a query on the CFG space address to be passed as a hypercall parameter.
 of_pci_get_host_bridge_resources only parses the ranges property and
 not reg.
 The reg property has the CFG space address, which is usually stored in
 private PCI host controller driver structures.
 
 So a pci_dev's parent pci_bus would not have that info.
 One way is to add a method to struct pci_ops, but I am not sure whether it
 would be accepted.

I'm afraid I don't understand what you're trying to tell me.

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-17 Thread Konrad Rzeszutek Wilk
On Tue, Mar 17, 2015 at 10:56:48AM +0530, Manish Jaggi wrote:
 
 On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:
 On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
 On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
 MMCFG is a Linux config option, not to be confused with
 PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
 think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
 is relevant.
 My (possibly flawed) understanding was that pci_mmcfg_reserved was
 intended to propagate the result of dom0 parsing some firmware table or
 other to the hypervisor.
 That's not flawed at all.
 I think that's a first in this thread ;-)
 
 In Linux dom0 we call it walking pci_mmcfg_list, which looking at
 arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
 over a struct acpi_table_mcfg (there also appears to be a bunch of
 processor family derived entries, which I guess are quirks of some
 sort).
 Right - this parses ACPI tables (plus applies some knowledge about
 certain specific systems/chipsets/CPUs) and verifies that the space
 needed for the MMCFG region is properly reserved either in E820 or
 in the ACPI specified resources (only if so Linux decides to use
 MMCFG and consequently also tells Xen that it may use it).
 Thanks.
 
 So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
 applies as is to Device Tree based ARM devices, including the need for
 the PHYSDEVOP_pci_host_bridge_add call.
 
 On ACPI based devices we will have the MCFG table, and things follow
 much as for x86:
 
* Xen should parse MCFG to discover the PCI host-bridges
* Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
  the same way as Xen/x86 does.
 
 The SBSA, an ARM standard for servers, mandates various things which
 we can rely on here because ACPI on ARM requires an SBSA compliant
 system. So things like odd quirks in PCI controllers or magic setup are
 spec'd out of our zone of caring (into the firmware I suppose), hence
 there is nothing like the DT_DEVICE_START stuff to register specific
 drivers etc.
 
 The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
 systems (any more than it is on x86). We can decide whether to omit it
 from dom0 or ignore it from Xen later on.
 
 (Manish, this is FYI, I don't expect you to implement ACPI support!)
 
 In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
 hypercall to inform Xen that a new PCI device has been added.
 If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
 a) Issue the hypercall from drivers/pci/probe.c
 b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
 PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
 segment number (s_bdf), it will return an error
 SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
 PHYSDEVOP_pci_host_bridge_add hypercall.

Couldn't the code figure out from 'struct pci_dev' whether the device
is a bridge or a PCI device? And then do the proper hypercall?
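For reference, a minimal sketch of the check Konrad suggests, usable from the
existing BUS_NOTIFY_ADD_DEVICE notifier; pci_is_bridge() and pci_is_root_bus()
are stock Linux helpers, while the host-bridge hypercall itself is still the
proposed new subop and is not shown here:

#include <linux/pci.h>

static bool xen_dev_is_bridge(struct pci_dev *pci_dev)
{
	/* True for PCI-PCI and CardBus bridges (decided from hdr_type). */
	return pci_is_bridge(pci_dev);
}

static bool xen_dev_on_root_bus(struct pci_dev *pci_dev)
{
	/* Devices sitting directly below the host bridge are on the root bus. */
	return pci_is_root_bus(pci_dev->bus);
}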

An interesting thing you _might_ hit (I did) is that if you use
'bus=reassign', which re-assigns the bus numbers during scan, Xen
gets very, very confused. As in, the bus devices that Xen sees vs the
ones Linux sees are different.

Whether you will encounter this depends on whether the bridge
devices and PCI devices end up having a different bus number
from what Xen scanned, and from what Linux has determined.

(As in, Linux has found a bridge device with more PCI devices - so
it reprograms the bridge, which moves all of the other PCI devices
below it by X numbers.)

The reason I am bringing it up: it sounds like Xen will have no clue
about some devices and will only be told about them by Linux - and if for
some reason one has the same bus number as one Xen already scanned - gah!
 
 I think (b) can be done with minimal code changes. What do you think ?

Less code == better.
 
 Ian.
 
 


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-16 Thread Manish Jaggi


On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:

On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:

On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:

On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with
PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was
intended to propagate the result of dom0 parsing some firmware table or
other to the hypervisor.

That's not flawed at all.

I think that's a first in this thread ;-)


In Linux dom0 we call it walking pci_mmcfg_list, which looking at
arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
over a struct acpi_table_mcfg (there also appears to be a bunch of
processor family derived entries, which I guess are quirks of some
sort).

Right - this parses ACPI tables (plus applies some knowledge about
certain specific systems/chipsets/CPUs) and verifies that the space
needed for the MMCFG region is properly reserved either in E820 or
in the ACPI specified resources (only if so Linux decides to use
MMCFG and consequently also tells Xen that it may use it).

Thanks.

So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
applies as is to Device Tree based ARM devices, including the need for
the PHYSDEVOP_pci_host_bridge_add call.

On ACPI based devices we will have the MCFG table, and things follow
much as for x86:

   * Xen should parse MCFG to discover the PCI host-bridges
   * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
 the same way as Xen/x86 does.

The SBSA, an ARM standard for servers, mandates various things which
we can rely on here because ACPI on ARM requires an SBSA compliant
system. So things like odd quirks in PCI controllers or magic setup are
spec'd out of our zone of caring (into the firmware I suppose), hence
there is nothing like the DT_DEVICE_START stuff to register specific
drivers etc.

The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
systems (any more than it is on x86). We can decide whether to omit it
from dom0 or ignore it from Xen later on.

(Manish, this is FYI, I don't expect you to implement ACPI support!)


In drivers/xen/pci.c, on the BUS_NOTIFY_ADD_DEVICE notification, dom0 issues a
hypercall to inform Xen that a new PCI device has been added.

If we were to inform Xen about a new PCI bus that is added, there are 2 ways:
a) Issue the hypercall from drivers/pci/probe.c
b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
segment number (s_bdf), it will return an error
SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
PHYSDEVOP_pci_host_bridge_add hypercall (see the sketch after this paragraph).


I think (b) can be done with minimal code changes. What do you think?
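A sketch of option (b) as it could look in the notifier path. The error code
for an unknown segment (SEG_NO_NOT_FOUND) and the PHYSDEVOP_pci_host_bridge_add
subop with its argument layout are placeholders, since neither is defined
upstream yet; PHYSDEVOP_pci_device_add and its seg/bus/devfn fields are the
existing interface:

#include <linux/pci.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

#define SEG_NO_NOT_FOUND	(-ENODEV)	/* placeholder error code */
#define PHYSDEVOP_pci_host_bridge_add	44	/* placeholder subop number */

struct physdev_pci_host_bridge_add {		/* assumed layout */
	uint16_t seg;
	uint8_t  bus;
	uint64_t cfg_base;
};

static int xen_report_pci_device(struct pci_dev *pci_dev, uint64_t cfg_base)
{
	struct physdev_pci_device_add add = {
		.seg   = pci_domain_nr(pci_dev->bus),
		.bus   = pci_dev->bus->number,
		.devfn = pci_dev->devfn,
	};
	struct physdev_pci_host_bridge_add hb = {
		.seg      = add.seg,
		.bus      = add.bus,
		.cfg_base = cfg_base,
	};
	int err;

	/* First try the existing device-add; Xen knows nothing about this
	 * segment yet, so it may fail for the first device on a bus. */
	err = HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
	if (err != SEG_NO_NOT_FOUND)
		return err;

	/* Unknown segment: register the host bridge ... */
	err = HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &hb);
	if (err)
		return err;

	/* ... and re-issue the device-add, as proposed above. */
	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
}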


Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-12 Thread Jan Beulich
 On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 23 Feb 2015, Jan Beulich wrote:
  On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
  On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
   That's the issue we are trying to resolve, with device tree there is no
   explicit segment ID, so we have an essentially unindexed set of PCI
   buses in both Xen and dom0.
  
  How that? What if two bus numbers are equal? There ought to be
  some kind of topology information. Or if all buses are distinct, then
  you don't need a segment number.
  
  It's very possible that I simply don't have the PCI terminology straight
  in my head, leading to me talking nonsense.
  
  I'll explain how I'm using it and perhaps you can put me straight...
  
  My understanding was that a PCI segment equates to a PCI host
  controller, i.e. a specific instance of some PCI host IP on an SoC.
 
 No - there can be multiple roots (i.e. host bridges) on a single
 segment. Segments are - afaict - purely a scalability extension
 allowing to overcome the 256 bus limit.
 
 Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
 stated:
 
 The MCFG table format allows for more than one memory mapped base
 address entry provided each entry (memory mapped configuration space
 base address allocation structure) corresponds to a unique PCI Segment
 Group consisting of 256 PCI buses. Multiple entries corresponding to a
 single PCI Segment Group is not allowed.

For one, what you quote is in no contradiction to what I said. All it
specifies is that there shouldn't be multiple MCFG table entries
specifying the same segment. Whether on any such segment there
is a single host bridge or multiple of them is of no interest here.

And then the present x86 Linux code suggests that there might be
systems actually violating this (the fact that each entry names not
only a segment, but also a bus range also kind of suggests this
despite the wording above); see commit 068258bc15 and its
neighbors - even if it talks about the address ranges coming from
other than ACPI tables, firmware wanting to express them by ACPI
tables would have to violate that rule.

 I think that it is reasonable to expect device tree systems to respect this
 too. 

Not really - as soon as we leave ACPI land, we're free to arrange
things however they suit us best (of course in agreement with other
components involved, like Dom0 in this case), and for that case the
cited Linux commit is a proper reference that it can (and has been)
done differently by system designers.

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-12 Thread Ian Campbell
On Wed, 2015-03-11 at 18:26 +, Stefano Stabellini wrote:
 In other words I think that we still need PHYSDEVOP_pci_host_bridge_add
(http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but
 we can drop the bus field from the struct.

I think it makes sense for the struct to contain a similar set of
entries to the MCFG ones, which would give us flexibility in the future
if a) our interpretation of the specs is wrong or b) new specs come
along which say something different (or Linux changes what it does
internally).

IOW I think segment+bus start+bus end is probably the way to go, even if
we think bus will be unused today (which equates to it always being 0).
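Purely for illustration, such an argument could mirror an MCFG entry plus the
CFG base dom0 learns from the device tree; neither the subop nor this struct
exists yet, so treat every field here as a proposal:

/* Possible argument for the proposed PHYSDEVOP_pci_host_bridge_add subop. */
struct physdev_pci_host_bridge_add {
    /* IN */
    uint64_t cfg_base;   /* physical base of the ECAM/CFG window */
    uint64_t cfg_size;   /* size of that window */
    uint16_t seg;        /* segment chosen by dom0 */
    uint8_t  bus_start;  /* first bus number behind this host bridge */
    uint8_t  bus_end;    /* last bus number */
    uint32_t flags;      /* none defined yet */
};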

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-12 Thread Stefano Stabellini
On Thu, 12 Mar 2015, Jan Beulich wrote:
  On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 23 Feb 2015, Jan Beulich wrote:
   On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
   On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
That's the issue we are trying to resolve, with device tree there is 
no
explicit segment ID, so we have an essentially unindexed set of PCI
buses in both Xen and dom0.
   
   How that? What if two bus numbers are equal? There ought to be
   some kind of topology information. Or if all buses are distinct, then
   you don't need a segment number.
   
   It's very possible that I simply don't have the PCI terminology straight
   in my head, leading to me talking nonsense.
   
   I'll explain how I'm using it and perhaps you can put me straight...
   
   My understanding was that a PCI segment equates to a PCI host
   controller, i.e. a specific instance of some PCI host IP on an SoC.
  
  No - there can be multiple roots (i.e. host bridges) on a single
  segment. Segments are - afaict - purely a scalability extension
  allowing to overcome the 256 bus limit.
  
  Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
  stated:
  
  The MCFG table format allows for more than one memory mapped base
  address entry provided each entry (memory mapped configuration space
  base address allocation structure) corresponds to a unique PCI Segment
  Group consisting of 256 PCI buses. Multiple entries corresponding to a
  single PCI Segment Group is not allowed.
 
 For one, what you quote is in no contradiction to what I said. All it
 specifies is that there shouldn't be multiple MCFG table entries
 specifying the same segment. Whether on any such segment there
 is a single host bridge or multiple of them is of no interest here.

I thought that we had already established that one host bridge
corresponds to one PCI config memory region, see the last sentence in
http://marc.info/?l=xen-devel&m=142529695117142.  Did I misunderstand
it?  If a host bridge has a 1:1 relationship with CFG space, then each
MCFG entry would correspond to one host bridge and one segment.

But it looks like things are more complicated than that.


 And then the present x86 Linux code suggests that there might be
 systems actually violating this (the fact that each entry names not
 only a segment, but also a bus range also kind of suggests this
 despite the wording above); see commit 068258bc15 and its
 neighbors - even if it talks about the address ranges coming from
 other than ACPI tables, firmware wanting to express them by ACPI
 tables would have to violate that rule.

Interesting.


  I think that it is reasonable to expect device tree systems to respect this
  too. 
 
 Not really - as soon as we leave ACPI land, we're free to arrange
 things however they suit us best (of course in agreement with other
 components involved, like Dom0 in this case), and for that case the
 cited Linux commit is a proper reference that it can (and has been)
 done differently by system designers.

OK. It looks like everything should work OK with the hypercall proposed
by Ian anyway.



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-12 Thread Jan Beulich
 On 12.03.15 at 11:33, stefano.stabell...@eu.citrix.com wrote:
 On Thu, 12 Mar 2015, Jan Beulich wrote:
  On 11.03.15 at 19:26, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 23 Feb 2015, Jan Beulich wrote:
  No - there can be multiple roots (i.e. host bridges) on a single
  segment. Segments are - afaict - purely a scalability extension
  allowing to overcome the 256 bus limit.
  
  Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
  stated:
  
  The MCFG table format allows for more than one memory mapped base
  address entry provided each entry (memory mapped configuration space
  base address allocation structure) corresponds to a unique PCI Segment
  Group consisting of 256 PCI buses. Multiple entries corresponding to a
  single PCI Segment Group is not allowed.
 
 For one, what you quote is in no contradiction to what I said. All it
 specifies is that there shouldn't be multiple MCFG table entries
 specifying the same segment. Whether on any such segment there
 is a single host bridge or multiple of them is of no interest here.
 
 I thought that we had already established that one host bridge
 corresponds to one PCI config memory region, see the last sentence in
 http://marc.info/?l=xen-develm=142529695117142.  Did I misunderstand
 it?  If a host bridge has a 1:1 relationship with CFG space, then each
 MCFG entry would correspond to one host bridge and one segment.

No, that sentence doesn't imply what you appear to think. Within
the same segment (and, for ACPI's sake within the same MCFG
region) you could have multiple host bridges. And then what calls
itself a host bridge (via class code) may or may not be one - often
there are many devices calling themselves such even on a single
bus (and my prior sentence specifically means to exclude those).
And finally there are systems with their PCI roots expressed only
in ACPI, without any specific PCI device serving as the host bridge.
There it is most obvious that firmware assigns both segment and
bus numbers to its liking.
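For context, the class-code check referred to above is just the standard
Linux idiom below; as noted, a match only means the device claims to be a
host bridge, not that it really is one:

#include <linux/pci.h>
#include <linux/pci_ids.h>

/* True if the device advertises itself as a host bridge via its class code. */
static bool claims_to_be_host_bridge(const struct pci_dev *dev)
{
	return (dev->class >> 8) == PCI_CLASS_BRIDGE_HOST;
}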

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-11 Thread Stefano Stabellini
On Mon, 23 Feb 2015, Jan Beulich wrote:
  On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
  On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
   That's the issue we are trying to resolve, with device tree there is no
   explicit segment ID, so we have an essentially unindexed set of PCI
   buses in both Xen and dom0.
  
  How that? What if two bus numbers are equal? There ought to be
  some kind of topology information. Or if all buses are distinct, then
  you don't need a segment number.
  
  It's very possible that I simply don't have the PCI terminology straight
  in my head, leading to me talking nonsense.
  
  I'll explain how I'm using it and perhaps you can put me straight...
  
  My understanding was that a PCI segment equates to a PCI host
  controller, i.e. a specific instance of some PCI host IP on an SoC.
 
 No - there can be multiple roots (i.e. host bridges) on a single
 segment. Segments are - afaict - purely a scalability extension
 allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
stated:

The MCFG table format allows for more than one memory mapped base
address entry provided each entry (memory mapped configuration space
base address allocation structure) corresponds to a unique PCI Segment
Group consisting of 256 PCI buses. Multiple entries corresponding to a
single PCI Segment Group is not allowed.

I think that it is reasonable to expect device tree systems to respect this
too. 

On ACPI systems the MCFG contains all the info we need, however on
device tree systems the segment number is missing from the pcie node, so
we still need to find a way to agree with Dom0 on which host bridge
corresponds to which segment.

In other words I think that we still need PHYSDEVOP_pci_host_bridge_add
(http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but
we can drop the bus field from the struct.



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-03 Thread Manish Jaggi


On Monday 02 March 2015 05:18 PM, Ian Campbell wrote:

On Fri, 2015-02-27 at 17:15 +, Stefano Stabellini wrote:

On Fri, 27 Feb 2015, Ian Campbell wrote:

On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:

On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:

On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:

MMCFG is a Linux config option, not to be confused with
PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was
intended to propagate the result of dom0 parsing some firmware table or
other to the hypervisor.

That's not flawed at all.

I think that's a first in this thread ;-)


In Linux dom0 we call it walking pci_mmcfg_list, which looking at
arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
over a struct acpi_table_mcfg (there also appears to be a bunch of
processor family derived entries, which I guess are quirks of some
sort).

Right - this parses ACPI tables (plus applies some knowledge about
certain specific systems/chipsets/CPUs) and verifies that the space
needed for the MMCFG region is properly reserved either in E820 or
in the ACPI specified resources (only if so Linux decides to use
MMCFG and consequently also tells Xen that it may use it).

Thanks.

So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
applies as is to Device Tree based ARM devices, including the need for
the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
intended for passing down firmware information to Xen, as the
information that we need is exactly the same, I think it would be
acceptable to use the same hypercall on ARM too.

I strongly disagree, they have very different semantics and overloading
the existing interface would be both wrong and confusing.

It'll also make things harder in the ACPI case where we want to use the
existing hypercall for its original purpose (to propagate MCFG
information to Xen).


I am not hard set on this and the new hypercall is also a viable option.
However, if we do introduce a new hypercall as Ian suggested, do we need
to take into account the possibility that a host bridge might have
multiple cfg memory ranges?

I don't believe so, a host bridge has a 1:1 relationship with CFG space.

Ian.


I agree with this flow. It fits into what we have implemented at our end.



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-03-02 Thread Ian Campbell
On Fri, 2015-02-27 at 17:15 +, Stefano Stabellini wrote:
 On Fri, 27 Feb 2015, Ian Campbell wrote:
  On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
MMCFG is a Linux config option, not to be confused with
PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I 
don't
think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was
intended to propagate the result of dom0 parsing some firmware table or
other to the hypervisor.
   
   That's not flawed at all.
  
  I think that's a first in this thread ;-)
  
In Linux dom0 we call it walking pci_mmcfg_list, which looking at
arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
over a struct acpi_table_mcfg (there also appears to be a bunch of
processor family derived entries, which I guess are quirks of some
sort).
   
   Right - this parses ACPI tables (plus applies some knowledge about
   certain specific systems/chipsets/CPUs) and verifies that the space
   needed for the MMCFG region is properly reserved either in E820 or
   in the ACPI specified resources (only if so Linux decides to use
   MMCFG and consequently also tells Xen that it may use it).
  
  Thanks.
  
  So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
  applies as is to Device Tree based ARM devices, including the need for
  the PHYSDEVOP_pci_host_bridge_add call.
 
 Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
 intended for passing down firmware information to Xen, as the
 information that we need is exactly the same, I think it would be
 acceptable to use the same hypercall on ARM too.

I strongly disagree, they have very different semantics and overloading
the existing interface would be both wrong and confusing.

It'll also make things harder in the ACPI case where we want to use the
existing hypercall for its original purpose (to propagate MCFG
information to Xen).

 I am not hard set on this and the new hypercall is also a viable option.
 However, if we do introduce a new hypercall as Ian suggested, do we need
 to take into account the possibility that a host bridge might have
 multiple cfg memory ranges?

I don't believe so, a host bridge has a 1:1 relationship with CFG space.

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Jan Beulich
 On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
 MMCFG is a Linux config option, not to be confused with
 PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
 think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
 is relevant.
 
 My (possibly flawed) understanding was that pci_mmcfg_reserved was
 intended to propagate the result of dom0 parsing some firmware table or
 other to the hypervisor.

That's not flawed at all.

 In Linux dom0 we call it walking pci_mmcfg_list, which looking at
 arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
 over a struct acpi_table_mcfg (there also appears to be a bunch of
 processor family derived entries, which I guess are quirks of some
 sort).

Right - this parses ACPI tables (plus applies some knowledge about
certain specific systems/chipsets/CPUs) and verifies that the space
needed for the MMCFG region is properly reserved either in E820 or
in the ACPI specified resources (only if so Linux decides to use
MMCFG and consequently also tells Xen that it may use it).
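For reference, the ACPI side of this is a plain walk over the MCFG allocation
entries (struct acpi_table_mcfg and struct acpi_mcfg_allocation are the
standard ACPICA definitions); the sketch below mirrors what pci_parse_mcfg()
does before Linux checks the E820/ACPI reservation and decides to use MMCFG:

#include <linux/acpi.h>

static int example_walk_mcfg(struct acpi_table_header *header)
{
	struct acpi_table_mcfg *mcfg = (struct acpi_table_mcfg *)header;
	struct acpi_mcfg_allocation *cfg;
	unsigned long entries;
	unsigned long i;

	/* One allocation entry per (segment, bus range, ECAM base) triple. */
	entries = (header->length - sizeof(*mcfg)) / sizeof(*cfg);
	cfg = (struct acpi_mcfg_allocation *)&mcfg[1];

	for (i = 0; i < entries; i++, cfg++)
		pr_info("MCFG: seg %u buses %u-%u at %#llx\n",
			cfg->pci_segment, cfg->start_bus_number,
			cfg->end_bus_number,
			(unsigned long long)cfg->address);

	return 0;
}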

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Stefano Stabellini
On Fri, 27 Feb 2015, Ian Campbell wrote:
 On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
   On 27.02.15 at 16:24, ian.campb...@citrix.com wrote:
   On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
   MMCFG is a Linux config option, not to be confused with
   PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
   think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
   is relevant.
   
   My (possibly flawed) understanding was that pci_mmcfg_reserved was
   intended to propagate the result of dom0 parsing some firmware table or
   other to the hypervisor.
  
  That's not flawed at all.
 
 I think that's a first in this thread ;-)
 
   In Linux dom0 we call it walking pci_mmcfg_list, which looking at
   arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
   over a struct acpi_table_mcfg (there also appears to be a bunch of
   processor family derived entries, which I guess are quirks of some
   sort).
  
  Right - this parses ACPI tables (plus applies some knowledge about
  certain specific systems/chipsets/CPUs) and verifies that the space
  needed for the MMCFG region is properly reserved either in E820 or
  in the ACPI specified resources (only if so Linux decides to use
  MMCFG and consequently also tells Xen that it may use it).
 
 Thanks.
 
 So I think what I wrote in 1424948710.14641.25.ca...@citrix.com
 applies as is to Device Tree based ARM devices, including the need for
 the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
intended for passing down firmware information to Xen, as the
information that we need is exactly the same, I think it would be
acceptable to use the same hypercall on ARM too.

I am not hard set on this and the new hypercall is also a viable option.
However, if we do introduce a new hypercall as Ian suggested, do we need
to take into account the possibility that a host bridge might have
multiple cfg memory ranges?


 On ACPI based devices we will have the MCFG table, and things follow
 much as for x86:
 
   * Xen should parse MCFG to discover the PCI host-bridges
   * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
 the same way as Xen/x86 does.
 
 The SBSA, an ARM standard for servers, mandates various things which
 we can rely on here because ACPI on ARM requires an SBSA compliant
 system. So things like odd quirks in PCI controllers or magic setup are
 spec'd out of our zone of caring (into the firmware I suppose), hence
 there is nothing like the DT_DEVICE_START stuff to register specific
 drivers etc.
 
 The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
 systems (any more than it is on x86). We can decide whether to omit it
 from dom0 or ignore it from Xen later on.
 
 (Manish, this is FYI, I don't expect you to implement ACPI support!)



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Ian Campbell
On Fri, 2015-02-27 at 14:33 +, Stefano Stabellini wrote:
 On Thu, 26 Feb 2015, Ian Campbell wrote:
  On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
   Have you reached a conclusion?
  
  My current thinking on how PCI for Xen on ARM should look is thus:
  
  xen/arch/arm/pci.c:
  
  New file, containing core PCI infrastructure for ARM. Includes:
  
  pci_hostbridge_register(), which registers a host bridge:
  
  Registration includes:
  DT node pointer
  
  CFG space address
  
  pci_hostbridge_ops function table, which
  contains e.g. cfg space read/write ops, perhaps
  other stuff.
  
  Function for setting the (segment,bus) for a given host bridge.
  Lets say pci_hostbridge_setup(), the host bridge must have been
  previously registered. Looks up the host bridge via CFG space
  address and maps that to (segment,bus).
  
  Functions for looking up host bridges by various keys as needed
  (cfg base address, DT node, etc)
  
  pci_init() function, called from somewhere appropriate in
  setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
  gic_init() for the shape of this)
  
  Any other common helper functions for managing PCI devices, e.g.
  for implementing PHYSDEVOP_*, which cannot be made properly
  common (i.e. shared with x86).
  
  xen/drivers/pci/host-*.c (or pci/host/*.c):
  
  New files, one per supported PCI controller IP block. Each
  should use the normal DT_DEVICE infrastructure for probing,
  i.e.:
  DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST)
  
  Probe function should call pci_hostbridge_register() for each
  host bridge which the controller exposes.
  
  xen/arch/arm/physdev.c:
  
  Implements do_physdev_op handling PHYSDEVOP_*. Includes:
  
  New hypercall subop PHYSDEVOP_pci_host_bridge_add:
  
  As per 1424703761.27930.140.ca...@citrix.com which
  calls pci_hostbridge_setup() to map the (segment,bus) to
  a specific pci_hostbridge_ops (i.e. must have previously
  been registered with pci_hostbridge_register(), else
  error).
 
 I think that the new hypercall is unnecessary. We know the MMCFG address
 ranges belonging to a given host bridge from DT and
 PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
 a specific MMCFG.

My understanding from discussion with Jan was that this is not what this
hypercall does, or at least that this would be an abuse of the existing
interface. See: 54e75d87027800062...@mail.emea.novell.com

Anyway, what happens when there is no MMCFG table to drive dom0's
calls to pci_mmcfg_reserved? Or when a given host bridge doesn't have special
flags and so isn't mentioned there?

I think a dedicated hypercall is better.

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Ian Campbell wrote:
 On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
  Have you reached a conclusion?
 
 My current thinking on how PCI for Xen on ARM should look is thus:
 
 xen/arch/arm/pci.c:
 
 New file, containing core PCI infrastructure for ARM. Includes:
 
 pci_hostbridge_register(), which registers a host bridge:
 
 Registration includes:
 DT node pointer
 
 CFG space address
 
 pci_hostbridge_ops function table, which
 contains e.g. cfg space read/write ops, perhaps
 other stuff.
 
 Function for setting the (segment,bus) for a given host bridge.
 Lets say pci_hostbridge_setup(), the host bridge must have been
 previously registered. Looks up the host bridge via CFG space
 address and maps that to (segment,bus).
 
 Functions for looking up host bridges by various keys as needed
 (cfg base address, DT node, etc)
 
 pci_init() function, called from somewhere appropriate in
 setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
 gic_init() for the shape of this)
 
 Any other common helper functions for managing PCI devices, e.g.
 for implementing PHYSDEVOP_*, which cannot be made properly
 common (i.e. shared with x86).
 
 xen/drivers/pci/host-*.c (or pci/host/*.c):
 
 New files, one per supported PCI controller IP block. Each
 should use the normal DT_DEVICE infrastructure for probing,
 i.e.:
 DT_DEVICE_START(foo, FOO, DEVICE_PCIHOST)
 
 Probe function should call pci_hostbridge_register() for each
 host bridge which the controller exposes.
 
 xen/arch/arm/physdev.c:
 
 Implements do_physdev_op handling PHYSDEVOP_*. Includes:
 
 New hypercall subop PHYSDEVOP_pci_host_bridge_add:
 
 As per 1424703761.27930.140.ca...@citrix.com which
 calls pci_hostbridge_setup() to map the (segment,bus) to
 a specific pci_hostbridge_ops (i.e. must have previously
 been registered with pci_hostbridge_register(), else
 error).

I think that the new hypercall is unnecessary. We know the MMCFG address
ranges belonging to a given host bridge from DT and
PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
a specific MMCFG. We don't need anything else: we can simply match the
host bridge based on the MMCFG address that dom0 tells us via
PHYSDEVOP_pci_mmcfg_reserved with the addresses on DT.

But we do need to support PHYSDEVOP_pci_mmcfg_reserved on ARM.
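For reference, the information PHYSDEVOP_pci_mmcfg_reserved already carries is
exactly a CFG base plus segment and bus range (quoted from memory from Xen's
public physdev.h, so double-check the header), which is what the DT-matching
idea above relies on:

#define PHYSDEVOP_pci_mmcfg_reserved    24
struct physdev_pci_mmcfg_reserved {
    uint64_t address;    /* MMCFG/ECAM base: on ARM this could be matched
                            against the host bridge "reg" address in the DT */
    uint16_t segment;
    uint8_t start_bus;
    uint8_t end_bus;
    uint32_t flags;
};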


 PHYSDEVOP_pci_device_add/remove: Implement existing hypercall
 interface used by x86 for ARM.
 
 This requires that PHYSDEVOP_pci_host_bridge_add has
 been called for the (segment,bus) which it refers to,
 otherwise error.
 
 Looks up the host bridge and does whatever setup is
 required plus e.g. calling of pci_add_device().
 
 No doubt various other existing interfaces will need wiring up, e.g.
 pci_conf_{read,write}* should lookup the host bridge ops struct and call
 the associated method.
 
 I'm sure the above must be incomplete, but I hope the general shape
 makes sense?
 
I think it makes sense and it is along the lines of what I was thinking
too.
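To make the shape a bit more concrete, here is a rough Xen-side sketch of the
registration interface proposed above; the names follow the outline, and none
of this exists in the tree yet:

/* Sketch for a future xen/arch/arm/pci.c -- illustrative only. */
#include <xen/types.h>
#include <xen/list.h>
#include <xen/device_tree.h>

struct pci_hostbridge;

struct pci_hostbridge_ops {
    /* CFG space accessors supplied by the per-IP host-*.c driver. */
    int (*cfg_read)(struct pci_hostbridge *hb, uint32_t sbdf,
                    unsigned int reg, unsigned int len, uint32_t *val);
    int (*cfg_write)(struct pci_hostbridge *hb, uint32_t sbdf,
                     unsigned int reg, unsigned int len, uint32_t val);
};

struct pci_hostbridge {
    struct list_head list;
    const struct dt_device_node *node;  /* DT node of the controller */
    paddr_t cfg_base;                   /* CFG space address (from "reg") */
    uint16_t segment;                   /* filled in by _setup() */
    uint8_t bus_start, bus_end;
    const struct pci_hostbridge_ops *ops;
};

/* Called by host-*.c drivers at DT_DEVICE probe time. */
int pci_hostbridge_register(struct pci_hostbridge *hb);

/* Called from the PHYSDEVOP_pci_host_bridge_add handler: looks up the bridge
 * by cfg_base and records the (segment,bus) that dom0 decided on. */
int pci_hostbridge_setup(paddr_t cfg_base, uint16_t segment,
                         uint8_t bus_start, uint8_t bus_end);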



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Ian Campbell
On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
 Hi Julien,
 
 On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
  On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
  Hi
 
  Hi Pranavkumar,
 
  Also if we just show only one vITS (or only one virtual v2m frame)
  instead of two vITS,
  then the actual hardware interrupt number and the virtual interrupt number
  which the guest will see will become different.
  This will hamper direct IRQ routing to the guest.
 
  The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
 
 Yes, but in the case of GICv2m (I am not sure about ITS), the device has to
 write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to
 generate an interrupt.
 If you write a virq which is different from the pirq (associated with the
 actual GICv2m frame), then it will not trigger any interrupt.
 
 Now there is a case which I am not sure can be solved with one
 vITS/vGICv2m:
 
 . Suppose we have two GICv2m frames, say one having the address
 0x1000 for its MSI_SETSPI_NS register and the other 0x2000 for its
 MSI_SETSPI_NS register.
 . Assume the first frame has (physical) SPIs 0x64 - 0x72 associated and
 the second has 0x80 - 0x88 associated.
 . Now there are two PCIe hosts, the first using the first GICv2m frame as its
 MSI parent and the other using the second frame.
 . A device on the first host uses the MSI_SETSPI_NS (0x1000) address along
 with a data value (i.e. an interrupt number, say 0x64), and a device on the
 second host uses 0x2000 and data 0x80.
 
 Now if we show one vGICv2m frame in the guest for both devices, then
 what address will I program in each device's config space for MSI, and
 what will the data value be?
 Secondly, the device's writes to these addresses are transparent to the CPU,
 so how can we trap them when the device wants to trigger an interrupt?

 Please correct me if I misunderstood anything.

Is what you are suggesting a v2m specific issue?

I thought the whole point of the ITS stuff in GICv3 was that one could
program such virt-phys mappings into the hardware ITS and it would do
the translation (the T in ITS) such that the host got the pIRQ it was
expecting when the guest wrote the virtualised vIRQ information to the
device.

Caveat: If I've read the ITS bits of that doc at any point it was long
ago and I've forgotten everything I knew about it... And I've never read
anything about v2m at all ;-)

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-27 Thread Pranavkumar Sawargaonkar
Hi Julien,

On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall julien.gr...@linaro.org wrote:
 On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
 Hi

 Hi Pranavkumar,

 Also if we just show only one vITS (or only one virtual v2m frame)
 instead of two vITS,
 then the actual hardware interrupt number and the virtual interrupt number
 which the guest will see will become different.
 This will hamper direct IRQ routing to the guest.

 The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

Yes, but in the case of GICv2m (I am not sure about ITS), the device has to
write the interrupt ID (which is the pirq) to the MSI_SETSPI_NS register to
generate an interrupt.
If you write a virq which is different from the pirq (associated with the
actual GICv2m frame), then it will not trigger any interrupt.

Now there is a case which I am not sure can be solved with one
vITS/vGICv2m:

. Suppose we have two GICv2m frames, say one having the address
0x1000 for its MSI_SETSPI_NS register and the other 0x2000 for its
MSI_SETSPI_NS register.
. Assume the first frame has (physical) SPIs 0x64 - 0x72 associated and
the second has 0x80 - 0x88 associated.
. Now there are two PCIe hosts, the first using the first GICv2m frame as its
MSI parent and the other using the second frame.
. A device on the first host uses the MSI_SETSPI_NS (0x1000) address along
with a data value (i.e. an interrupt number, say 0x64), and a device on the
second host uses 0x2000 and data 0x80.

Now if we show one vGICv2m frame in the guest for both devices, then
what address will I program in each device's config space for MSI, and
what will the data value be?
Secondly, the device's writes to these addresses are transparent to the CPU,
so how can we trap them when the device wants to trigger an interrupt?
Please correct me if I misunderstood anything.

Thanks,
Pranav




 I have a patch which allow virq != pirq:

 https://patches.linaro.org/43012/

 Regards,

 --
 Julien Grall



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Julien Grall
On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
 Hi

Hi Pranavkumar,

 Also if we just show only one vITS (or only one virtual v2m frame)
 instead of two vITS,
 then the actual hardware interrupt number and the virtual interrupt number
 which the guest will see will become different.
 This will hamper direct IRQ routing to the guest.

The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

I have a patch which allows virq != pirq:

https://patches.linaro.org/43012/

Regards,

-- 
Julien Grall



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Pranavkumar Sawargaonkar
Hi

On Thu, Feb 26, 2015 at 4:19 PM, Vijay Kilari vijay.kil...@gmail.com wrote:
 On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote:
 On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
 On 24/02/15 7:13 pm, Julien Grall wrote:
  On 24/02/15 00:23, Manish Jaggi wrote:
  Because you have to parse all the device tree to remove the reference
  to the second ITS. It's pointless and can be difficult to do it.
 
  Could you please describe the case where it is difficult
  You have to parse every node in the device tree and replace the
  msi-parent properties with only one ITS.
 Thats the idea.
 
  If you are able to emulate one ITS, you can do it for multiple ones.
  keeping it simple and similar across dom0/domUs
  Consider a case where a domU is assigned two PCI devices which are
  attached to different nodes. (Node is an entity having its own cores and
  host controllers).
  The DOM0 view and guest view of the hardware are different.
 
  In the case of DOM0, we want to expose the same hardware layout as the
  host. So if there is 2 ITS then we should expose the 2 ITS.
 AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
 handled by xen and a virtualized interface is provided to the guest. So
 if none of SMMU in the layout of host is presented to dom0 the same can
 be valid for multiple ITS.

 SMMU is one of the things which Xen hides from dom0, for obvious
 reasons.

 Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
 reason for ITS to differ here.

 Since dom0 needs to be able to cope with being able to see all of the
  host I/O devices (in the default no-passthrough case), it is
 possible, if not likely, that it will need the same amount of ITS
 resources (i.e. numbers of LPIs) as the host provides.

  In the case of the Guest, we (Xen) controls the memory layout.
 For Dom0 as well.

 Not true.

 For dom0 the memory layout is determined by the host memory layout. The
 MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
 regions of the host physical address space (often in 1:1, but with
 sufficient h/w support this need not be the case).

  Therefore
  we can expose only one ITS.
 If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen
 GIC ITS emulation driver to work?
 It should check whether the request came from dom0 and handle it differently. I
 think this would be *difficult*.

 I don't think so. If the vITS is written to handle multiple instances
 (i.e. in a modular way, as it should be) then it shouldn't matter
 whether any given domain has 1 or many vITS. The fact that dom0 may have
 one or more and domU only (currently) has one then becomes largely
 uninteresting.

 I have a few queries:

 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
 command Q is
 mapped to which Physical ITS command Q.
 In case of linux, the ITS node is added as msi chip to pci using
 of_pci_msi_chip_add()
 and from pci_dev structure we can know which ITS to use.

 But in case of Xen, when ITS command is trapped we have only
 dev_id info from ITS command.

 2) If DomU is always given one virtual ITS node, and DomU is assigned
 two different
 PCI devices connected to different physical ITSes, then the Xen vITS
 driver should know how to map a
 PCI device to a physical ITS.

 For the two issues above, Xen should have a mapping from PCI segment to the
 physical ITS node to use,
 which can be queried by the vITS driver to send the command on to the correct
 physical ITS.


Also if we just show only one vITS (or only one virtual v2m frame)
instead of two vITS,
then the actual hardware interrupt number and the virtual interrupt number
which the guest will see will become different.
This will hamper direct IRQ routing to the guest.

-
Pranav



 Vijay



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Ian Campbell
On Thu, 2015-02-26 at 16:19 +0530, Vijay Kilari wrote:
 On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote:
  On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
  On 24/02/15 7:13 pm, Julien Grall wrote:
   On 24/02/15 00:23, Manish Jaggi wrote:
   Because you have to parse all the device tree to remove the reference
   to the second ITS. It's pointless and can be difficult to do it.
  
   Could you please describe the case where it is difficult
   You have to parse every node in the device tree and replace the
   msi-parent properties with only one ITS.
  Thats the idea.
  
   If you are able to emulate one ITS, you can do it for multiple ones.
   keeping it simple and similar across dom0/domUs
   Consider a case where a domU is assigned two PCI devices which are
   attached to different nodes. (Node is an entity having its own cores and
   host controllers).
   The DOM0 view and guest view of the hardware are different.
  
   In the case of DOM0, we want to expose the same hardware layout as the
   host. So if there is 2 ITS then we should expose the 2 ITS.
  AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
  handled by xen and a virtualized interface is provided to the guest. So
  if none of SMMU in the layout of host is presented to dom0 the same can
  be valid for multiple ITS.
 
  SMMU is one of the things which Xen hides from dom0, for obvious
  reasons.
 
  Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
  reason for ITS to differ here.
 
  Since dom0 needs to be able to cope with being able to see all of the
   host I/O devices (in the default no-passthrough case), it is
  possible, if not likely, that it will need the same amount of ITS
  resources (i.e. numbers of LPIs) as the host provides.
 
   In the case of the Guest, we (Xen) controls the memory layout.
  For Dom0 as well.
 
  Not true.
 
  For dom0 the memory layout is determined by the host memory layout. The
  MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
  regions of the host physical address space (often in 1:1, but with
  sufficient h/w support this need not be the case).
 
   Therefore
   we can expose only one ITS.
  If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen
  GIC ITS emulation driver to work?
  It should check whether the request came from dom0 and handle it differently. I
  think this would be *difficult*.
 
  I don't think so. If the vITS is written to handle multiple instances
  (i.e. in a modular way, as it should be) then it shouldn't matter
  whether any given domain has 1 or many vITS. The fact that dom0 may have
  one or more and domU only (currently) has one then becomes largely
  uninteresting.
 
 I have a few queries:
 
 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
 command Q is
 mapped to which Physical ITS command Q.
 In case of linux, the ITS node is added as msi chip to pci using
 of_pci_msi_chip_add()
 and from pci_dev structure we can know which ITS to use.
 
 But in case of Xen, when ITS command is trapped we have only
 dev_id info from ITS command.

With the proper PCI infrastructure in place we can map the vdev_id to a
pdev_id, and from there to our own struct pci_dev 

The mapping from pdev_id to pci_dev is based on the
PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I
described just now in my mail to Manish in this thread (specifically
pci_device_add creates and registers struct pci_dev I think).

 
 2) If DomU is always given one virtual ITS node, and DomU is assigned
 two different
 PCI devices connected to different physical ITSes, then the Xen vITS
 driver should know how to map a
 PCI device to a physical ITS.

Correct, I think that all falls out from the proper tracking of the
vdev_id to pdev_id and from vits to pits for a given domain and the
management/tracking of the struct pci_dev.
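A rough sketch of what that per-domain tracking could look like on the Xen
side; every name here is hypothetical, since none of this exists yet:

/* Illustrative only: per-domain translation from guest-visible PCI device IDs
 * (vdev_id) to physical ones (pdev_id), and from a vITS to a physical ITS. */
#include <xen/types.h>
#include <xen/list.h>

struct its_node;   /* physical ITS, as discovered from the host DT */
struct vits;       /* one of the domain's emulated ITS instances */

struct vdev_map_entry {
    struct list_head list;
    uint32_t vdev_id;        /* device ID the guest uses in ITS commands */
    uint32_t pdev_id;        /* device ID on the physical ITS */
    struct its_node *pits;   /* which physical ITS serves this device */
    struct vits *vits;       /* which vITS instance the guest sees it on */
};

struct domain_its_state {
    struct list_head vdev_map;   /* list of struct vdev_map_entry */
};

/* Looked up when a trapped vITS command carries a device ID. */
struct vdev_map_entry *vits_lookup_vdev(struct domain_its_state *s,
                                        uint32_t vdev_id);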

Ian.

 For the two issues above, Xen should have a mapping from PCI segment to the
 physical ITS node to use,
 which can be queried by the vITS driver to send the command on to the correct
 physical ITS.
 
 Vijay





Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Julien Grall
Hi Ian,

On 26/02/15 11:12, Ian Campbell wrote:
 I have a few queries:

 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
 command Q is
 mapped to which Physical ITS command Q.
 In case of linux, the ITS node is added as msi chip to pci using
 of_pci_msi_chip_add()
 and from pci_dev structure we can know which ITS to use.

 But in case of Xen, when ITS command is trapped we have only
 dev_id info from ITS command.
 
 With the proper PCI infrastructure in place we can map the vdev_id to a
 pdev_id, and from there to our own struct pci_dev 
 
 The mapping from pdev_id to pci_dev is based on the
 PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I
 described just now in my mail to Manish in this thread (specifically
 pci_device_add creates and registers struct pci_dev I think).

We may need an hypercall to map the dev_id to a vdev_id.

IIRC, Vijay and Manish were already planning to add one.

 

 2) DomU is always given one virtual ITS node. If DomU is assigned
 two different PCI devices connected to different physical ITSes, then the
 Xen vITS driver should know how to map each PCI device to its physical
 ITS.
 
 Correct, I think that all falls out from the proper tracking of the
 vdev_id to pdev_id and from vits to pits for a given domain and the
 management/tracking of the struct pci_dev.

I think this is the right way to go. Though I haven't read the ITS spec
closely.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Vijay Kilari
On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell ian.campb...@citrix.com wrote:
 On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
 On 24/02/15 7:13 pm, Julien Grall wrote:
  On 24/02/15 00:23, Manish Jaggi wrote:
  Because you have to parse all the device tree to remove the reference
  to the second ITS. It's pointless and can be difficult to do it.
 
  Could you please describe the case where it is difficult
  You have to parse every node in the device tree and replace the
  msi-parent properties with only one ITS.
  That's the idea.
 
  If you are able to emulate one ITS, you can do it for multiple ones.
  keeping it simple and similar across dom0/domUs
  Consider a case where a domU is assigned two PCI devices which are
  attached to different nodes. (A node is an entity having its own cores and
  host controllers.)
  The DOM0 view and guest view of the hardware are different.
 
  In the case of DOM0, we want to expose the same hardware layout as the
  host. So if there is 2 ITS then we should expose the 2 ITS.
  AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
  handled by Xen, with a virtualized interface provided to the guest. So
  if none of the SMMUs in the host layout are presented to dom0, the same
  can be valid for multiple ITSes.

 SMMU is one of the things which Xen hides from dom0, for obvious
 reasons.

 Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
 reason for ITS to differ here.

 Since dom0 needs to be able to cope with being able to see all of the
  host I/O devices (in the default no-passthrough case), it is
 possible, if not likely, that it will need the same amount of ITS
 resources (i.e. numbers of LPIs) as the host provides.

  In the case of the Guest, we (Xen) controls the memory layout.
 For Dom0 as well.

 Not true.

 For dom0 the memory layout is determined by the host memory layout. The
 MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
 regions of the host physical address space (often in 1:1, but with
 sufficient h/w support this need not be the case).

  Therefore
  we can expose only one ITS.
  If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen
  GIC ITS emulation driver to work?
  It would have to check whether the request came from dom0 and handle it
  differently. I think this would be *difficult*.

 I don't think so. If the vITS is written to handle multiple instances
 (i.e. in a modular way, as it should be) then it shouldn't matter
 whether any given domain has 1 or many vITS. The fact that dom0 may have
 one or more and domU only (currently) has one then becomes largely
 uninteresting.

I have a few queries

1) If Dom0 has 'n' ITS nodes, how does Xen know which virtual ITS
command queue is mapped to which physical ITS command queue?
In the case of Linux, the ITS node is added as an MSI chip to PCI using
of_pci_msi_chip_add(), and from the pci_dev structure we can know which
ITS to use.

But in the case of Xen, when an ITS command is trapped we only have the
dev_id info from the ITS command.

2) DomU is always given one virtual ITS node. If DomU is assigned
two different PCI devices connected to different physical ITSes, then the
Xen vITS driver should know how to map each PCI device to its physical
ITS.

For the two issues above, Xen should have a mapping from PCI segment to
the physical ITS node to use, which can be queried by the vITS driver so
that commands are sent on to the correct physical ITS.
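
Purely as an illustration of the mapping I mean (names invented for the
example, not existing code):

    #include <xen/types.h>

    struct pits;                              /* hypothetical physical ITS handle */

    /* One entry per host bridge: which physical ITS (msi-parent) serves it. */
    static struct {
        uint16_t segment;
        uint8_t bus;
        struct pits *pits;
    } seg_to_its[8];                          /* arbitrary bound for the sketch */
    static unsigned int nr_seg_to_its;

    /* Used by the vITS code to pick the physical command queue for a device. */
    static struct pits *pits_for_device(uint16_t segment, uint8_t bus)
    {
        unsigned int i;

        for ( i = 0; i < nr_seg_to_its; i++ )
            if ( seg_to_its[i].segment == segment && seg_to_its[i].bus == bus )
                return seg_to_its[i].pits;

        return NULL;                          /* not behind a known host bridge */
    }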

Vijay

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Manish Jaggi


On Monday 23 February 2015 09:50 PM, Jan Beulich wrote:

On 23.02.15 at 16:46, ian.campb...@citrix.com wrote:

On Mon, 2015-02-23 at 15:27 +, Jan Beulich wrote:

On 23.02.15 at 16:02, ian.campb...@citrix.com wrote:

Is the reason for the scan being of segment 0 only that it is the one
which lives at the legacy PCI CFG addresses (or those magic I/O ports)?

Right - ideally we would scan all segments, but we need Dom0 to
tell us which MMCFG regions are safe to access,

Is this done via PHYSDEVOP_pci_mmcfg_reserved?

Yes.


  and hence can't
do that scan at boot time. But we also won't get away without
scanning, as we need to set up the IOMMU(s) to at least cover
the devices used for booting the system.

Which hopefully are all segment 0 or aren't needed until after dom0
tells Xen about them I suppose.

Right. With EFI one may be able to overcome this one day, but the
legacy BIOS doesn't even surface mechanisms (software interrupts)
to access devices outside of segment 0.


  (All devices on segment zero are supposed to
be accessible via config space access method 1.)

Is that the legacy addresses or the magic I/O ports again?

Yes (just that there are two of them).

Ian/Jan,
Have you reached a conclusion?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel





Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-26 Thread Ian Campbell
On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
 Have you reached a conclusion?

My current thinking on how PCI for Xen on ARM should look is thus:

xen/arch/arm/pci.c:

        New file, containing core PCI infrastructure for ARM. Includes:

        pci_hostbridge_register(), which registers a host bridge:

                Registration includes:
                        DT node pointer
                        CFG space address
                        pci_hostbridge_ops function table (which
                        contains e.g. cfg space read/write ops, perhaps
                        other stuff).

        Function for setting the (segment,bus) for a given host bridge.
        Let's say pci_hostbridge_setup(); the host bridge must have been
        previously registered. Looks up the host bridge via CFG space
        address and maps that to (segment,bus).

        Functions for looking up host bridges by various keys as needed
        (cfg base address, DT node, etc.)

        pci_init() function, called from somewhere appropriate in
        setup.c, which calls device_init(node, DEVICE_PCIHOST, NULL) (see
        gic_init() for the shape of this)

        Any other common helper functions for managing PCI devices, e.g.
        for implementing PHYSDEVOP_*, which cannot be made properly
        common (i.e. shared with x86).

xen/drivers/pci/host-*.c (or pci/host/*.c):

        New files, one per supported PCI controller IP block. Each
        should use the normal DT_DEVICE infrastructure for probing,
        i.e.:
                DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)

        Probe function should call pci_hostbridge_register() for each
        host bridge which the controller exposes.

xen/arch/arm/physdev.c:

        Implements do_physdev_op handling PHYSDEVOP_*. Includes:

        New hypercall subop PHYSDEVOP_pci_host_bridge_add:

                As per 1424703761.27930.140.ca...@citrix.com, which
                calls pci_hostbridge_setup() to map the (segment,bus) to
                a specific pci_hostbridge_ops (i.e. it must have previously
                been registered with pci_hostbridge_register(), else
                error).

        PHYSDEVOP_pci_device_add/remove: Implement the existing hypercall
        interface used by x86 for ARM.

                This requires that PHYSDEVOP_pci_host_bridge_add has
                been called for the (segment,bus) which it refers to,
                otherwise error.

                Looks up the host bridge and does whatever setup is
                required plus e.g. calling of pci_add_device().

No doubt various other existing interfaces will need wiring up, e.g.
pci_conf_{read,write}* should look up the host bridge ops struct and call
the associated method.

I'm sure the above must be incomplete, but I hope the general shape
makes sense?
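
To make the shape a little more concrete, a skeletal sketch of the core
registration bits could look roughly like this (illustrative only, not a
worked-out patch; names follow the proposal above):

    /* xen/arch/arm/pci.c, illustrative skeleton only. */
    #include <xen/types.h>
    #include <xen/list.h>
    #include <xen/errno.h>
    #include <xen/device_tree.h>

    struct pci_hostbridge;

    struct pci_hostbridge_ops {
        uint32_t (*cfg_read)(struct pci_hostbridge *pcihb, uint32_t sbdf,
                             unsigned int reg, unsigned int len);
        void     (*cfg_write)(struct pci_hostbridge *pcihb, uint32_t sbdf,
                              unsigned int reg, unsigned int len, uint32_t val);
    };

    struct pci_hostbridge {
        struct dt_device_node *node;    /* DT node of the host bridge */
        paddr_t cfg_base;               /* CFG space address, used as the key */
        uint16_t segment;               /* filled in by pci_hostbridge_setup() */
        uint8_t bus;
        const struct pci_hostbridge_ops *ops;
        struct list_head list;
    };

    static LIST_HEAD(pci_hostbridges);

    /* Called from each xen/drivers/pci/host-*.c probe function. */
    int pci_hostbridge_register(struct pci_hostbridge *pcihb)
    {
        list_add_tail(&pcihb->list, &pci_hostbridges);
        return 0;
    }

    /*
     * Called from PHYSDEVOP_pci_host_bridge_add: bind (segment,bus) to a
     * bridge previously registered with the given CFG space address.
     */
    int pci_hostbridge_setup(uint16_t segment, uint8_t bus, paddr_t cfg_base)
    {
        struct pci_hostbridge *pcihb;

        list_for_each_entry ( pcihb, &pci_hostbridges, list )
            if ( pcihb->cfg_base == cfg_base )
            {
                pcihb->segment = segment;
                pcihb->bus = bus;
                return 0;
            }

        return -ENOENT;
    }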

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-25 Thread Ian Campbell
On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
 On 24/02/15 7:13 pm, Julien Grall wrote:
  On 24/02/15 00:23, Manish Jaggi wrote:
  Because you have to parse all the device tree to remove the reference
  to the second ITS. It's pointless and can be difficult to do it.
 
  Could you please describe the case where it is difficult
  You have to parse every node in the device tree and replace the
  msi-parent properties with only one ITS.
 That's the idea.
 
  If you are able to emulate one ITS, you can do it for multiple ones.
  keeping it simple and similar across dom0/domUs
  Consider a case where a domU is assigned two PCI devices which are
  attached to different nodes. (A node is an entity having its own cores and
  host controllers.)
  The DOM0 view and guest view of the hardware are different.
 
  In the case of DOM0, we want to expose the same hardware layout as the
  host. So if there is 2 ITS then we should expose the 2 ITS.
 AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
 handled by Xen, with a virtualized interface provided to the guest. So
 if none of the SMMUs in the host layout are presented to dom0, the same
 can be valid for multiple ITSes.

SMMU is one of the things which Xen hides from dom0, for obvious
reasons.

Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
reason for ITS to differ here.

Since dom0 needs to be able to cope with being able to see all of the
host I/O devices (in the default no-passthrough case), it is
possible, if not likely, that it will need the same amount of ITS
resources (i.e. numbers of LPIs) as the host provides.

  In the case of the Guest, we (Xen) controls the memory layout.
 For Dom0 as well.

Not true.

For dom0 the memory layout is determined by the host memory layout. The
MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
regions of the host physical address space (often in 1:1, but with
sufficient h/w support this need not be the case).

  Therefore
  we can expose only one ITS.
 If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen
 GIC ITS emulation driver to work?
 It would have to check whether the request came from dom0 and handle it
 differently. I think this would be *difficult*.

I don't think so. If the vITS is written to handle multiple instances
(i.e. in a modular way, as it should be) then it shouldn't matter
whether any given domain has 1 or many vITS. The fact that dom0 may have
one or more and domU only (currently) has one then becomes largely
uninteresting.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-24 Thread Manish Jaggi


On 24/02/15 7:13 pm, Julien Grall wrote:

On 24/02/15 00:23, Manish Jaggi wrote:

Because you have to parse all the device tree to remove the reference
to the second ITS. It's pointless and can be difficult to do it.


Could you please describe the case where it is difficult

You have to parse every node in the device tree and replace the
msi-parent properties with only one ITS.

That's the idea.



If you are able to emulate one ITS, you can do it for multiple ones.

keeping it simple and similar across dom0/domUs
Consider a case where a domU is assigned two PCI devices which are
attached to different nodes. (A node is an entity having its own cores and
host controllers.)

The DOM0 view and guest view of the hardware are different.

In the case of DOM0, we want to expose the same hardware layout as the
host. So if there is 2 ITS then we should expose the 2 ITS.
AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are 
handled by Xen, with a virtualized interface provided to the guest. So 
if none of the SMMUs in the host layout are presented to dom0, the same 
can be valid for multiple ITSes.


In the case of the Guest, we (Xen) controls the memory layout.

For Dom0 as well.

Therefore
we can expose only one ITS.
If we follow 2 ITS in dom0 and 1 ITS in domU, how do you expect the Xen 
GIC ITS emulation driver to work?
It would have to check whether the request came from dom0 and handle it 
differently. I think this would be *difficult*.



IMHO, any ITS trap before this is wrong.

AFAIK the guest always sees a virtual ITS; could you please explain what is
wrong with trapping.

I never said the trapping is wrong in all cases. The "before" was
there because any trap before the PCI device has been added to Xen is, IMHO, wrong.


There is no trap before.

So I still don't understand why you need to parse the device tree node
for PCI device at boot time...

If it doesn't trap before, you should not need to know the PCI.

Regards,




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Manish Jaggi


On 20/02/15 8:09 pm, Ian Campbell wrote:

On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:

Another option might be a new hypercall (assuming one doesn't already
exist) to register a PCI bus which would take e.g. the PCI CFG base
address and return a new u16 segment id to be used for all subsequent
PCI related calls. This would require the dom0 OS to hook its
pci_bus_add function, which might be doable (more doable than handling
xen_segment_id DT properties I think).

This seems ok, i will try it out.

I recommend you let this subthread (e.g. the conversation with Jan)
settle upon a preferred course of action before implementing any one
suggestion.
Ian, we also have to consider NUMA / multi-node systems where there are two 
or more ITS nodes.

pci0 {
    msi-parent = <&its0>;
};

pci1 {
    msi-parent = <&its1>;
};

This requires parsing PCI nodes in Xen and creating a mapping between PCI 
nodes and ITSes. Xen would need to be aware of PCI nodes in the device tree 
prior to dom0 sending a hypercall. Adding a property to the PCI node in the 
device tree should be a good approach.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel





Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Julien Grall



On 23/02/2015 10:59, Manish Jaggi wrote:


On 20/02/15 8:09 pm, Ian Campbell wrote:

On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:

Another option might be a new hypercall (assuming one doesn't already
exist) to register a PCI bus which would take e.g. the PCI CFG base
address and return a new u16 segment id to be used for all subsequent
PCI related calls. This would require the dom0 OS to hook its
pci_bus_add function, which might be doable (more doable than handling
xen_segment_id DT properties I think).

This seems ok, i will try it out.

I recommend you let this subthread (e.g. the conversation with Jan)
settle upon a preferred course of action before implementing any one
suggestion.

 Ian, we also have to consider NUMA / multi-node systems where there are two
 or more ITS nodes.

 pci0 {
     msi-parent = <&its0>;
 };

 pci1 {
     msi-parent = <&its1>;
 };

 This requires parsing PCI nodes in Xen and creating a mapping between PCI
 nodes and ITSes. Xen would need to be aware of PCI nodes in the device tree
 prior to dom0 sending a hypercall. Adding a property to the PCI node in the
 device tree should be a good approach.


Why do you need it early? Wouldn't it be sufficient to retrieve that 
information when the hypercall pci_device_add is called?


What about the ACPI case? Does everything necessary live in static tables?

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Jan Beulich
 On 23.02.15 at 13:45, ian.campb...@citrix.com wrote:
 On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote:
  On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
  On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
   That's the issue we are trying to resolve, with device tree there is no
   explicit segment ID, so we have an essentially unindexed set of PCI
   buses in both Xen and dom0.
  
  How that? What if two bus numbers are equal? There ought to be
  some kind of topology information. Or if all buses are distinct, then
  you don't need a segment number.
  
  It's very possible that I simply don't have the PCI terminology straight
  in my head, leading to me talking nonsense.
  
  I'll explain how I'm using it and perhaps you can put me straight...
  
  My understanding was that a PCI segment equates to a PCI host
  controller, i.e. a specific instance of some PCI host IP on an SoC.
 
 No - there can be multiple roots (i.e. host bridges)
 
 Where a host bridge == what I've been calling PCI host controller?

Some problems here may originate in this naming: I'm not aware of
anything named host controller in PCI. The root of a PCI hierarchy
(single or multiple buses) connects to the system bus via a host
bridge. Whether one or more of them sit in a single chip is of no
interest (on the x86 and ia64 sides at least).

I suppose in principle a single controller might expose multiple host
 bridges, but I think we can logically treat such things as being
 multiple controllers (e.g. with multiple CFG spaces etc).

Perhaps.

  on a single
 segment. Segments are - afaict - purely a scalability extension
 allowing to overcome the 256 bus limit.
 
 Is the converse true -- i.e. can a single host bridge span multiple
 segments?

Not that I'm aware of.

 IOW is the mapping from segment to host bridge many-one or
 many-many?

Each segment may have multiple host bridges, each host bridge
connect devices on multiple buses. Any such hierarchy is entirely
separate from any other such hierarchy (both physically and in
terms of the SBDFs used to identify them).

 Maybe what I should read into what you are saying is that segments are
 purely a software and/or firmware concept with no real basis in the
 hardware?

Right - they just represent separation in hardware, but they have
no real equivalent there.

 In which case might we be at liberty to specify that on ARM+Device Tree
 systems (i.e. those where the f/w tables don't give an enumeration)
 there is a 1:1 mapping from segments to host bridges?

This again can only be answered knowing how bus number
assignment gets done on ARM (see also below). If all bus numbers
are distinct, there's no need for using segment numbers other
than zero. In the end, if you used segment numbers now, you may
end up closing the path to using them for much larger future setups.
But if that's not seen as a problem then yes, I think you could go
that route.

  So given a system with two PCI host controllers we end up with two
  segments (lets say A and B, but choosing those is the topic of this
  thread) and it is acceptable for both to contain a bus 0 with a device 1
  on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
  
  It sounds like you are saying that this is not actually acceptable and
  that 0:0.0 must be unique in the system irrespective of the associated
  segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
 
 No, there can be multiple buses numbered zero. And at the same
 time a root bus doesn't need to be bus zero on its segment.
 
 0:0.0 was just an example I pulled out of thin air, it wasn't supposed
 to imply some special property of bus 0 e.g. being the root or anything
 like that.
 
 If there are multiple buses numbered 0 then are they distinguished via
 segment or something else?

Just by segment.

 What I don't get from this is where the BDF is being represented.
 
 It isn't, since this is representing the host controller not any given
 PCI devices which it contains.
 
 I thought in general BDFs were probed (or even configured) by the
 firmware and/or OS by walking over the CFG space and so aren't
 necessarily described anywhere in the firmware tables.

They're effectively getting assigned by firmware, yes. This mainly
affects bridges, which get stored (in their config space) the
secondary and subordinate bus numbers (bounding the bus range
they're responsible for when it comes to routing requests). If on
ARM firmware doesn't assign bus numbers, is bridge setup then a
job of the OS?

 FWIW the first 4 bytes in each line of interrupt-map are actually
 somehow matched (after masking via interrupt-map-mask) against an
 encoding of the BDF to give the INTx routing, but BDFs aren't
 represented in the sense I think you meant in the example above.
 
 There is a capability to have child nodes of this root controller node
 which describe individual devices, and there is an encoding for the BDF
 in there, but these are not required. For reference I've 

Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Ian Campbell
On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote:
  On 23.02.15 at 13:45, ian.campb...@citrix.com wrote:
  On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote:
   On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
   On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
That's the issue we are trying to resolve, with device tree there is 
no
explicit segment ID, so we have an essentially unindexed set of PCI
buses in both Xen and dom0.
   
   How that? What if two bus numbers are equal? There ought to be
   some kind of topology information. Or if all buses are distinct, then
   you don't need a segment number.
   
   It's very possible that I simply don't have the PCI terminology straight
   in my head, leading to me talking nonsense.
   
   I'll explain how I'm using it and perhaps you can put me straight...
   
   My understanding was that a PCI segment equates to a PCI host
   controller, i.e. a specific instance of some PCI host IP on an SoC.
  
  No - there can be multiple roots (i.e. host bridges)
  
  Where a host bridge == what I've been calling PCI host controller?
 
 Some problems here may originate in this naming: I'm not aware of
 anything named host controller in PCI. The root of a PCI hierarchy
 (single or multiple buses) connects to the system bus via a host
 bridge. Whether one or more of them sit in a single chip is of no
 interest (on the x86 and ia64 sides at least).

Yes, I think I've just been using the terminology wrongly, I mean host
bridge throughout.

There is generally one such bridge per controller (i.e. IP block), which
is what I was trying to get at in the next paragraph. Let's just talk
about host bridges from now on to avoid confusion.

 I suppose in principle a single controller might expose multiple host
  bridges, but I think we can logically treat such things as being
  multiple controllers (e.g. with multiple CFG spaces etc).
 
 Perhaps.
 
  IOW is the mapping from segment to host bridge many-one or
  many-many?
 
 Each segment may have multiple host bridges, each host bridge
 connect devices on multiple buses. Any such hierarchy is entirely
 separate from any other such hierarchy (both physically and in
 terms of the SBDFs used to identify them).
 
  Maybe what I should read into what you are saying is that segments are
  purely a software and/or firmware concept with no real basis in the
  hardware?
 
 Right - they just represent separation in hardware, but they have
 no real equivalent there.

I think I now understand.

  In which case might we be at liberty to specify that on ARM+Device Tree
  systems (i.e. those where the f/w tables don't give an enumeration)
  there is a 1:1 mapping from segments to host bridges?
 
 This again can only be answered knowing how bus number
 assignment gets done on ARM (see also below). If all bus numbers
 are distinct, there's no need for using segment numbers other
 than zero. In the end, if you used segment numbers now, you may
 end up closing the path to using them for much larger future setups.
 But if that's not seen as a problem then yes, I think you could go
 that route.

Ultimately we just need to be able to go from the set of input
parameters to e.g. PHYSDEVOP_pci_device_add to the associate host
bridge.

It seems like the appropriate pair is (segment,bus), which uniquely
corresponds to a single host bridge (and many such pairs may do so).

So I think the original question just goes from having to determine a
way to map a segment to a host bridge to how to map a (segment,bus)
tuple to a host bridge.

  What I don't get from this is where the BDF is being represented.
  
  It isn't, since this is representing the host controller not any given
  PCI devices which it contains.
  
  I thought in general BDFs were probed (or even configured) by the
  firmware and/or OS by walking over the CFG space and so aren't
  necessarily described anywhere in the firmware tables.
 
 They're effectively getting assigned by firmware, yes. This mainly
 affects bridges, which get stored (in their config space) the
 secondary and subordinate bus numbers (bounding the bus range
 they're responsible for when it comes to routing requests). If on
 ARM firmware doesn't assign bus numbers, is bridge setup then a
 job of the OS?

I'm not completely sure; I think it depends on the particular firmware
(u-boot, EFI etc) but AIUI it can be the case that the OS does the
enumeration on at least some ARM platforms (quite how/when it knows to
do so I'm not sure).

  FWIW the first 4 bytes in each line of interrupt-map are actually
  somehow matched (after masking via interrupt-map-mask) against an
  encoding of the BDF to give the INTx routing, but BDFs aren't
  represented in the sense I think you meant in the example above.
  
  There is a capability to have child nodes of this root controller node
  which describe individual devices, and there is an encoding for the BDF
  in there, but these are not required. For reference I've 

Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Manish Jaggi


On 23/02/15 4:44 pm, Julien Grall wrote:



On 23/02/2015 10:59, Manish Jaggi wrote:


On 20/02/15 8:09 pm, Ian Campbell wrote:

On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:

Another option might be a new hypercall (assuming one doesn't already
exist) to register a PCI bus which would take e.g. the PCI CFG base
address and return a new u16 segment id to be used for all subsequent
PCI related calls. This would require the dom0 OS to hook its
pci_bus_add function, which might be doable (more doable than 
handling

xen_segment_id DT properties I think).

This seems ok, i will try it out.

I recommend you let this subthread (e.g. the conversation with Jan)
settle upon a preferred course of action before implementing any one
suggestion.

 Ian, we also have to consider NUMA / multi-node systems where there are two
 or more ITS nodes.

 pci0 {
     msi-parent = <&its0>;
 };

 pci1 {
     msi-parent = <&its1>;
 };

 This requires parsing PCI nodes in Xen and creating a mapping between PCI
 nodes and ITSes. Xen would need to be aware of PCI nodes in the device tree
 prior to dom0 sending a hypercall. Adding a property to the PCI node in the
 device tree should be a good approach.


 Why do you need it early? Wouldn't it be sufficient to retrieve that
 information when the hypercall pci_device_add is called?



The dom0/domU device tree should have only one ITS node; Xen should map to the 
specific ITS when trapped.


What about the ACPI case? Does everything necessary live in static tables?

Regards,




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Ian Campbell
On Mon, 2015-02-23 at 08:43 +, Jan Beulich wrote:
  On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
  On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
   That's the issue we are trying to resolve, with device tree there is no
   explicit segment ID, so we have an essentially unindexed set of PCI
   buses in both Xen and dom0.
  
  How that? What if two bus numbers are equal? There ought to be
  some kind of topology information. Or if all buses are distinct, then
  you don't need a segment number.
  
  It's very possible that I simply don't have the PCI terminology straight
  in my head, leading to me talking nonsense.
  
  I'll explain how I'm using it and perhaps you can put me straight...
  
  My understanding was that a PCI segment equates to a PCI host
  controller, i.e. a specific instance of some PCI host IP on an SoC.
 
 No - there can be multiple roots (i.e. host bridges)

Where a host bridge == what I've been calling PCI host controller?

I suppose in principle a single controller might expose multiple host
bridges, but I think we can logically treat such things as being
multiple controllers (e.g. with multiple CFG spaces etc).

  on a single
 segment. Segments are - afaict - purely a scalability extension
 allowing to overcome the 256 bus limit.

Is the converse true -- i.e. can a single host bridge span multiple
segments? IOW is the mapping from segment to host bridge many-one or
many-many?

Maybe what I should read into what you are saying is that segments are
purely a software and/or firmware concept with no real basis in the
hardware?

In which case might we be at liberty to specify that on ARM+Device Tree
systems (i.e. those where the f/w tables don't give an enumeration)
there is a 1:1 mapping from segments to host bridges?

  A PCI host controller defines the root of a bus, within which the BDF
  need not be distinct due to the differing segments which are effectively
  a higher level namespace on the BDFs.
 
 The host controller really defines the root of a tree (often covering
 multiple buses, i.e. as soon as bridges come into play).

Right, I think that's the one thing I'd managed to understand
correctly ;-)

  So given a system with two PCI host controllers we end up with two
  segments (lets say A and B, but choosing those is the topic of this
  thread) and it is acceptable for both to contain a bus 0 with a device 1
  on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
  
  It sounds like you are saying that this is not actually acceptable and
  that 0:0.0 must be unique in the system irrespective of the associated
  segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
 
 No, there can be multiple buses numbered zero. And at the same
 time a root bus doesn't need to be bus zero on its segment.

0:0.0 was just an example I pulled out of thin air, it wasn't supposed
to imply some special property of bus 0 e.g. being the root or anything
like that.

If there are multiple buses numbered 0 then are they distinguished via
segment or something else?

  Just for reference a DT node describing a PCI host controller might look
  like (taking the APM Mustang one as an example):
  
  pcie0: pcie@1f2b0000 {
          status = "disabled";
          device_type = "pci";
          compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
          #interrupt-cells = <1>;
          #size-cells = <2>;
          #address-cells = <3>;
          reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
                  0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
          reg-names = "csr", "cfg";
          ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
                    0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
          dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
                        0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
          interrupt-map-mask = <0x0 0x0 0x0 0x7>;
          interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
                           0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
                           0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
                           0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
          dma-coherent;
          clocks = <&pcie0clk 0>;
  };
  
  I expect most of this is uninteresting but the key thing is that there
  is no segment number nor topology relative to e.g. pcie1:
  pcie@1f2c0000 (the nodes look identical except e.g. all the base
  addresses and interrupt numbers differ).
 
 What I don't get from this is where the BDF is being represented.

It isn't, since this 

Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Ian Campbell
On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote:
  On 23.02.15 at 15:33, ian.campb...@citrix.com wrote:
  On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote:
   On 23.02.15 at 13:45, ian.campb...@citrix.com wrote:
   In which case might we be at liberty to specify that on ARM+Device Tree
   systems (i.e. those where the f/w tables don't give an enumeration)
   there is a 1:1 mapping from segments to host bridges?
  
  This again can only be answered knowing how bus number
  assignment gets done on ARM (see also below). If all bus numbers
  are distinct, there's no need for using segment numbers other
  than zero. In the end, if you used segment numbers now, you may
  end up closing the path to using them for much larger future setups.
  But if that's not seen as a problem then yes, I think you could go
  that route.
  
  Ultimately we just need to be able to go from the set of input
  parameters to e.g. PHYSDEVOP_pci_device_add to the associate host
  bridge.
  
  It seems like the appropriate pair is (segment,bus), which uniquely
  corresponds to a single host bridge (and many such pairs may do so).
  
  So I think the original question just goes from having to determine a
  way to map a segment to a host bridge to how to map a (segment,bus)
  tuple to a host bridge.
 
 Right. Avoiding (at this point in time) non-zero segments if at all
 possible.

I think it sounds like we are going to leave that up to the dom0 OS and
whatever it does gets registered with Xen. So non-zero segments is no
longer (directly) up to the Xen code. I think that's fine.

   What I don't get from this is where the BDF is being represented.
   
   It isn't, since this is representing the host controller not any given
   PCI devices which it contains.
   
   I thought in general BDFs were probed (or even configured) by the
   firmware and/or OS by walking over the CFG space and so aren't
   necessarily described anywhere in the firmware tables.
  
  They're effectively getting assigned by firmware, yes. This mainly
  affects bridges, which get stored (in their config space) the
  secondary and subordinate bus numbers (bounding the bus range
  they're responsible for when it comes to routing requests). If on
  ARM firmware doesn't assign bus numbers, is bridge setup then a
  job of the OS?
  
  I'm not completely sure; I think it depends on the particular firmware
  (u-boot, EFI etc) but AIUI it can be the case that the OS does the
  enumeration on at least some ARM platforms (quite how/when it knows to
  do so I'm not sure).
 
 In which case the Dom0 OS doing so would need to communicate
 its decisions to the hypervisor, as you suggest further down.

So more concretely something like:
#define PHYSDEVOP_pci_host_bridge_add XX
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint8_t bus;
    uint64_t address;
};
typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);

Where seg+bus are enumerated/assigned by dom0 and address is some unique
property of the host bridge -- most likely its pci cfg space base
address (which is what physdev_pci_mmcfg_reserved also takes, I think?)

Do you think we would need start_bus + end_bus here? Xen could enumerate
this itself I think, and perhaps should even if dom0 tells us something?
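
For illustration, on the dom0 (Linux) side I would imagine something along
these lines being called from the hooked pci_bus_add path. The helper name
is invented for the example and the subop is of course the proposed one
above, not an existing ABI:

    #include <linux/pci.h>
    #include <xen/interface/physdev.h>
    #include <asm/xen/hypercall.h>

    /* Tell Xen about a newly added root bus (sketch only). */
    static int xen_report_host_bridge(struct pci_bus *bus, u64 cfg_base)
    {
        struct physdev_pci_host_bridge_add add = {
            .seg = pci_domain_nr(bus),   /* segment chosen/assigned by dom0 */
            .bus = bus->number,
            .address = cfg_base,         /* CFG space base, the unique key */
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
    }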

 This
 basically replaces the bus scan (on segment 0) that Xen does on
 x86 (which topology information gets derived from).

Is the reason for the scan being of segment 0 only that it is the one
which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 

What about other host bridges in segment 0 which aren't at that address?

You could do the others based on MMCFG tables if you wanted, right?

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Julien Grall
On 23/02/15 11:50, Manish Jaggi wrote:
 
 On 23/02/15 4:44 pm, Julien Grall wrote:


 On 23/02/2015 10:59, Manish Jaggi wrote:

 On 20/02/15 8:09 pm, Ian Campbell wrote:
 On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
 Another option might be a new hypercall (assuming one doesn't already
 exist) to register a PCI bus which would take e.g. the PCI CFG base
 address and return a new u16 segment id to be used for all subsequent
 PCI related calls. This would require the dom0 OS to hook its
 pci_bus_add function, which might be doable (more doable than
 handling
 xen_segment_id DT properties I think).
 This seems ok, i will try it out.
 I recommend you let this subthread (e.g. the conversation with Jan)
 settle upon a preferred course of action before implementing any one
 suggestion.
 Ian, we also have to consider NUMA / multi-node systems where there are two
 or more ITS nodes.

 pci0 {
     msi-parent = <&its0>;
 };

 pci1 {
     msi-parent = <&its1>;
 };

 This requires parsing PCI nodes in Xen and creating a mapping between PCI
 nodes and ITSes. Xen would need to be aware of PCI nodes in the device tree
 prior to dom0 sending a hypercall. Adding a property to the PCI node in the
 device tree should be a good approach.

 Why do you need it early? Wouldn't it be sufficient to retrieve that
 information when the hypercall pci_device_add is called?

 The dom0/domU device tree should have only one ITS node; Xen should map to
 the specific ITS when trapped.

The DOM0 device tree should expose the same layout as the hardware. By
exposing only one ITS you make your life more complicated.

PHYSDEVOP_pci_device_add should be called before any initialization is
done. Therefore the ITS should be configured for this PCI device after Xen
is aware of it.

IMHO, any ITS trap before this is wrong.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Jan Beulich
 On 23.02.15 at 15:33, ian.campb...@citrix.com wrote:
 On Mon, 2015-02-23 at 14:07 +, Jan Beulich wrote:
  On 23.02.15 at 13:45, ian.campb...@citrix.com wrote:
  In which case might we be at liberty to specify that on ARM+Device Tree
  systems (i.e. those where the f/w tables don't give an enumeration)
  there is a 1:1 mapping from segments to host bridges?
 
 This again can only be answered knowing how bus number
 assignment gets done on ARM (see also below). If all bus numbers
 are distinct, there's no need for using segment numbers other
 than zero. In the end, if you used segment numbers now, you may
 end up closing the path to using them for much larger future setups.
 But if that's not seen as a problem then yes, I think you could go
 that route.
 
 Ultimately we just need to be able to go from the set of input
 parameters to e.g. PHYSDEVOP_pci_device_add to the associate host
 bridge.
 
 It seems like the appropriate pair is (segment,bus), which uniquely
 corresponds to a single host bridge (and many such pairs may do so).
 
 So I think the original question just goes from having to determine a
 way to map a segment to a host bridge to how to map a (segment,bus)
 tuple to a host bridge.

Right. Avoiding (at this point in time) non-zero segments if at all
possible.

  What I don't get from this is where the BDF is being represented.
  
  It isn't, since this is representing the host controller not any given
  PCI devices which it contains.
  
  I thought in general BDFs were probed (or even configured) by the
  firmware and/or OS by walking over the CFG space and so aren't
  necessarily described anywhere in the firmware tables.
 
 They're effectively getting assigned by firmware, yes. This mainly
 affects bridges, which get stored (in their config space) the
 secondary and subordinate bus numbers (bounding the bus range
 they're responsible for when it comes to routing requests). If on
 ARM firmware doesn't assign bus numbers, is bridge setup then a
 job of the OS?
 
 I'm not completely sure; I think it depends on the particular firmware
 (u-boot, EFI etc) but AIUI it can be the case that the OS does the
 enumeration on at least some ARM platforms (quite how/when it knows to
 do so I'm not sure).

In which case the Dom0 OS doing so would need to communicate
its decisions to the hypervisor, as you suggest further down. This
basically replaces the bus scan (on segment 0) that Xen does on
x86 (which topology information gets derived from).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Jan Beulich
 On 23.02.15 at 16:02, ian.campb...@citrix.com wrote:
 On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote:
 In which case the Dom0 OS doing so would need to communicate
 its decisions to the hypervisor, as you suggest further down.
 
 So more concretely something like:
 #define PHYSDEVOP_pci_host_bridge_add XX
 struct physdev_pci_host_bridge_add {
     /* IN */
     uint16_t seg;
     uint8_t bus;
     uint64_t address;
 };
 typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);
 
 Where seg+bus are enumerated/assigned by dom0 and address is some unique
 property of the host bridge -- most likely its pci cfg space base
 address (which is what physdev_pci_mmcfg_reserved also takes, I think?)

Right.

 Do you think we would need start_bus + end_bus here? Xen could enumerate
 this itself I think, and perhaps should even if dom0 tells us something?

That depends - if what you get presented here by Dom0 is a PCI
device at seg:bus:00.0, and if all other setup was already
done on it, then you could read the secondary and subordinate
bus numbers from its config space. If that's not possible, then
Dom0 handing you these values would seem to be necessary.

As a result you may also need a hook from PCI device registration,
allowing to associate it with the right host bridge (and refusing to
add any for which there's none).

As an alternative, extending PHYSDEVOP_manage_pci_add_ext in
a suitable manner may be worth considering, provided (like on x86
and ia64) the host bridges get surfaced as distinct PCI devices.

 This
 basically replaces the bus scan (on segment 0) that Xen does on
 x86 (which topology information gets derived from).
 
 Is the reason for the scan being of segment 0 only that it is the one
 which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 

Right - ideally we would scan all segments, but we need Dom0 to
tell us which MMCFG regions are safe to access, and hence can't
do that scan at boot time. But we also won't get away without
scanning, as we need to set up the IOMMU(s) to at least cover
the devices used for booting the system.

 What about other host bridges in segment 0 which aren't at that address?

At which address? (All devices on segment zero are supposed to
be accessible via config space access method 1.)

 You could do the others based on MMCFG tables if you wanted, right?

Yes, with the above mentioned caveat.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Ian Campbell
On Mon, 2015-02-23 at 15:27 +, Jan Beulich wrote:
  On 23.02.15 at 16:02, ian.campb...@citrix.com wrote:
  On Mon, 2015-02-23 at 14:45 +, Jan Beulich wrote:
  In which case the Dom0 OS doing so would need to communicate
  its decisions to the hypervisor, as you suggest further down.
  
  So more concretely something like:
  #define PHYSDEVOP_pci_host_bridge_add XX
  struct physdev_pci_host_bridge_add {
      /* IN */
      uint16_t seg;
      uint8_t bus;
      uint64_t address;
  };
  typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
  DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);
  
  Where seg+bus are enumerated/assigned by dom0 and address is some unique
  property of the host bridge -- most likely its pci cfg space base
  address (which is what physdev_pci_mmcfg_reserved also takes, I think?)
 
 Right.
 
  Do you think we would need start_bus + end_bus here? Xen could enumerate
  this itself I think, and perhaps should even if dom0 tells us something?
 
 That depends - if what you get presented here by Dom0 is a PCI
 device at seg:bus:00.0, and if all other setup was already
 done on it, then you could read the secondary and subordinate
 bus numbers from its config space. If that's not possible, then
 Dom0 handing you these values would seem to be necessary.
 
 As a result you may also need a hook from PCI device registration,
 allowing to associate it with the right host bridge (and refusing to
 add any for which there's none).

Right.

My thinking was that PHYSDEVOP_pci_host_bridge_add would add an entry
into some mapping data structure from (segment,bus) to a handle for the
associated PCI host bridge driver in Xen.

PHYSDEVOP_manage_pci_add would have to look up the host bridge driver
from the (segment,bus), I think, to construct the necessary linkage for
use later when we try to do things to the device, and it should indeed
fail if it can't find one.
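
As a purely illustrative sketch of that lookup (types and names are
hypothetical, mirroring the mapping structure described above):

    #include <xen/types.h>
    #include <xen/list.h>
    #include <xen/errno.h>

    /* Hypothetical registry entry created by PHYSDEVOP_pci_host_bridge_add. */
    struct pci_hostbridge {
        uint16_t segment;
        uint8_t bus;
        struct list_head list;
        /* ... CFG base, access ops, etc. ... */
    };

    static LIST_HEAD(pci_hostbridges);

    static struct pci_hostbridge *pci_hostbridge_find(uint16_t segment,
                                                      uint8_t bus)
    {
        struct pci_hostbridge *pcihb;

        list_for_each_entry ( pcihb, &pci_hostbridges, list )
            if ( pcihb->segment == segment && pcihb->bus == bus )
                return pcihb;

        return NULL;
    }

    /* Device add (manage_pci_add/pci_device_add) refuses unknown bridges. */
    int pci_device_add_checked(uint16_t seg, uint8_t bus, uint8_t devfn)
    {
        if ( pci_hostbridge_find(seg, bus) == NULL )
            return -ENODEV;

        /* ... otherwise fall through to the existing pci_add_device() path. */
        return 0;
    }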

 As an alternative, extending PHYSDEVOP_manage_pci_add_ext in
 a suitable manner may be worth considering, provided (like on x86
 and ia64) the host bridges get surfaced as distinct PCI devices.
 
  This
  basically replaces the bus scan (on segment 0) that Xen does on
  x86 (which topology information gets derived from).
  
  Is the reason for the scan being of segment 0 only that it is the one
  which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 
 
 Right - ideally we would scan all segments, but we need Dom0 to
 tell us which MMCFG regions are safe to access,

Is this done via PHYSDEVOP_pci_mmcfg_reserved?

  and hence can't
 do that scan at boot time. But we also won't get away without
 scanning, as we need to set up the IOMMU(s) to at least cover
 the devices used for booting the system.

Which hopefully are all segment 0 or aren't needed until after dom0
tells Xen about them I suppose.

  What about other host bridges in segment 0 which aren't at that address?
 
 At which address?

I meant this to be a back reference to the legacy PCI CFG addresses (or
those magic I/O ports).

  (All devices on segment zero are supposed to
 be accessible via config space access method 1.)

Is that the legacy addresses or the magic I/O ports again?

  You could do the others based on MMCFG tables if you wanted, right?
 
 Yes, with the above mentioned caveat.
 
 Jan
 
 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Jan Beulich
 On 23.02.15 at 16:46, ian.campb...@citrix.com wrote:
 On Mon, 2015-02-23 at 15:27 +, Jan Beulich wrote:
  On 23.02.15 at 16:02, ian.campb...@citrix.com wrote:
  Is the reason for the scan being of segment 0 only that it is the one
  which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 
 
 Right - ideally we would scan all segments, but we need Dom0 to
 tell us which MMCFG regions are safe to access,
 
 Is this done via PHYSDEVOP_pci_mmcfg_reserved?

Yes.

  and hence can't
 do that scan at boot time. But we also won't get away without
 scanning, as we need to set up the IOMMU(s) to at least cover
 the devices used for booting the system.
 
 Which hopefully are all segment 0 or aren't needed until after dom0
 tells Xen about them I suppose.

Right. With EFI one may be able to overcome this one day, but the
legacy BIOS doesn't even surface mechanisms (software interrupts)
to access devices outside of segment 0.

  (All devices on segment zero are supposed to
 be accessible via config space access method 1.)
 
 Is that the legacy addresses or the magic I/O ports again?

Yes (just that there are two of them).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-23 Thread Jan Beulich
 On 20.02.15 at 18:33, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
  That's the issue we are trying to resolve, with device tree there is no
  explicit segment ID, so we have an essentially unindexed set of PCI
  buses in both Xen and dom0.
 
 How that? What if two bus numbers are equal? There ought to be
 some kind of topology information. Or if all buses are distinct, then
 you don't need a segment number.
 
 It's very possible that I simply don't have the PCI terminology straight
 in my head, leading to me talking nonsense.
 
 I'll explain how I'm using it and perhaps you can put me straight...
 
 My understanding was that a PCI segment equates to a PCI host
 controller, i.e. a specific instance of some PCI host IP on an SoC.

No - there can be multiple roots (i.e. host bridges) on a single
segment. Segments are - afaict - purely a scalability extension
allowing to overcome the 256 bus limit.

 I
 think that PCI specs etc perhaps call a segment a PCI domain, which we
 avoided in Xen due to the obvious potential for confusion.

Right, the two terms are getting mixed depending on where you
look.

 A PCI host controller defines the root of a bus, within which the BDF
 need not be distinct due to the differing segments which are effectively
 a higher level namespace on the BDFs.

The host controller really defines the root of a tree (often covering
multiple buses, i.e. as soon as bridges come into play).

 So given a system with two PCI host controllers we end up with two
 segments (lets say A and B, but choosing those is the topic of this
 thread) and it is acceptable for both to contain a bus 0 with a device 1
 on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
 
 It sounds like you are saying that this is not actually acceptable and
 that 0:0.0 must be unique in the system irrespective of the associated
 segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?

No, there can be multiple buses numbered zero. And at the same
time a root bus doesn't need to be bus zero on its segment.

 Just for reference a DT node describing a PCI host controller might look
 like (taking the APM Mustang one as an example):
 
 pcie0: pcie@1f2b0000 {
         status = "disabled";
         device_type = "pci";
         compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
         #interrupt-cells = <1>;
         #size-cells = <2>;
         #address-cells = <3>;
         reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
                 0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
         reg-names = "csr", "cfg";
         ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
                   0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
         dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
                       0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
         interrupt-map-mask = <0x0 0x0 0x0 0x7>;
         interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
                          0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
                          0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
                          0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
         dma-coherent;
         clocks = <&pcie0clk 0>;
 };
 
 I expect most of this is uninteresting but the key thing is that there
 is no segment number nor topology relative to e.g. pcie1:
 pcie@1f2c0000 (the nodes look identical except e.g. all the base
 addresses and interrupt numbers differ).

What I don't get from this is where the BDF is being represented.
Yet that arrangement is fundamental to understand whether you
really need segments to properly disambiguate devices.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Julien Grall
On 20/02/15 12:10, Manish Jaggi wrote:
 
 On 20/02/15 5:33 pm, Julien Grall wrote:
 Hello Manish,

 On 20/02/15 11:34, Manish Jaggi wrote:
 The platform APIs are enhanced to provide support for parsing pci device
 tree nodes and storing the config-space address which is later used for
 pci_read/pci_write config calls.
 Can you explain why you choose to add per-platform callbacks rather than
 a generic solution?
 The platform code is similar to what linux has in
 drivers/pci/host/pci-platform.c. I have used the same concept.

Please explain it in the commit message, it helps us to understand why
you did it.

Anyway, based on what you said, your approach looks wrong.

Firstly, the platform code is DT-centric and we don't expect to have
such things for ACPI.

Secondly, the PCI host code can be shared between multiple platforms.

Overall, I would prefer to have a separate file and structure for
handling PCI host. Also, I think we could re-use the Linux code for this
purpose.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Manish Jaggi


On 20/02/15 5:33 pm, Julien Grall wrote:

Hello Manish,

On 20/02/15 11:34, Manish Jaggi wrote:

The platform APIs are enhanced to provide support for parsing pci device
tree nodes and storing the config-space address which is later used for
pci_read/pci_write config calls.

Can you explain why you choose to add per-platform callbacks rather than
a generic solution?
The platform code is similar to what linux has in 
drivers/pci/host/pci-platform.c. I have used the same concept.


Regards,




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Jan Beulich
 On 20.02.15 at 14:45, ian.campb...@citrix.com wrote:
 (Jan, curious if you have any thoughts on this, hopefully I've left
 sufficient context for you to get what we are on about, the gist is we
 need some way for dom0 and Xen to agree on how a PCI segment ID maps to
 an actual PCI root controller, I think on x86 you either Just Know from
 PC legacy or ACPI tells you?)

Yeah, without information from ACPI we'd have no way to know how
to access the config space of segments other than 0. Both I/O port
based access methods don't have room for specifying a segment
number. Since the MMCFG addresses get set up by firmware, it is
also firmware telling us the segment numbers. If you don't get them
arranged in any particular order, ...

 On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
 I have added an ABI where the segment number maps to the position of the pci 
 node in the Xen device tree. We had partially discussed this during Linaro 
 Connect. What is your team's view on this: should this be OK, or should we 
 introduce a property in the device tree pci node {xen_segment_id = 1}?
 
 The DT node ordering cannot be relied on this way, so we certainly need
 something else.
 
 Ideally we should find a solution which doesn't require new properties.
 
 The best solution would be to find some existing property of the PCI
 host controller which is well defined and unique to each host
 controller. I had been thinking that the base address of the PCI CFG
 space might be a good enough handle.

... this approach would seem reasonable.

 The only question is whether the data type used for segment id in the
 hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
 not sure if we are going to be able to make this pci_segment_t and have
 it differ for ARM.

Are you expecting to have more than 64k segments? Otherwise,
just sequentially assign segment numbers as you discover them or
get them reported by Dom0. You could even have Dom0 tell you
the segment numbers (just like we do on x86), thus eliminating the
need for an extra mechanism for Dom0 to learn them.
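
As a trivial illustration of that sequential assignment (sketch only,
keyed on the CFG space base address since that is the unique handle being
discussed in this thread; whether Xen or Dom0 does the assignment is
exactly the open question, the same scheme works either way):

    #include <xen/types.h>
    #include <xen/lib.h>

    /* Hand out segment numbers in discovery order; purely illustrative. */
    static struct {
        uint64_t cfg_base;
        uint16_t segment;
    } seg_assign[16];                  /* arbitrary bound for the sketch */
    static unsigned int nr_segments;

    static uint16_t segment_for_host_bridge(uint64_t cfg_base)
    {
        unsigned int i;

        for ( i = 0; i < nr_segments; i++ )
            if ( seg_assign[i].cfg_base == cfg_base )
                return seg_assign[i].segment;

        ASSERT(nr_segments < ARRAY_SIZE(seg_assign));
        seg_assign[nr_segments].cfg_base = cfg_base;
        seg_assign[nr_segments].segment = nr_segments;
        return seg_assign[nr_segments++].segment;
    }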

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
On Fri, 2015-02-20 at 12:20 +, Julien Grall wrote:
 Overall, I would prefer to have a separate file and structure for
 handling PCI host. Also, I think we could re-use the Linux code for this
 purpose.

(caveat; I've not looked at the code yet)

I had expected that PCI host controllers would be discovered via the
existing device model stuff and compatible string matching, e.g.
DT_DEVICE_START(some_pcihost, "SOME PCI HOST CONTROLLER", DEVICE_PCIBUS)
which would reference a set of compatible strings and a probe function,
the probe function would then call some pci bus registration function to
hook that bus into the system.
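
A skeletal example of that shape (illustrative only; the DEVICE_PCIBUS
class and the pci bus registration helper are assumptions from this
discussion, not existing code):

    #include <xen/init.h>
    #include <xen/device_tree.h>
    #include <asm/device.h>

    static const struct dt_device_match some_pcihost_dt_match[] __initconst =
    {
        DT_MATCH_COMPATIBLE("vendor,some-pcie"),   /* made-up compatible string */
        { /* sentinel */ },
    };

    static int __init some_pcihost_init(struct dt_device_node *node,
                                        const void *data)
    {
        /* Parse "reg"/"ranges" here, then register the bus with the core. */
        return pci_bus_register(node);             /* hypothetical helper */
    }

    DT_DEVICE_START(some_pcihost, "SOME PCI HOST CONTROLLER", DEVICE_PCIBUS)
        .dt_match = some_pcihost_dt_match,
        .init = some_pcihost_init,
    DT_DEVICE_END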

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
(Jan, curious if you have any thoughts on this, hopefully I've left
sufficient context for you to get what we are on about, the gist is we
need some way for dom0 and Xen to agree on how a PCI segment ID maps to
an actual PCI root controller, I think on x86 you either Just Know from
PC legacy or ACPI tells you?)

On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
 I have added an ABI where the segment number maps to the position of the pci node in the xen
 device tree. We had partially discussed this during Linaro
 connect. What is your team's view on this: should this be OK, or should we
 introduce a property in the device tree pci node {xen_segment_id = 1}?

The DT node ordering cannot be relied on this way, so we certainly need
something else.

Ideally we should find a solution which doesn't require new properties.

The best solution would be to find some existing property of the PCI
host controller which is well defined and unique to each host
controller. I had been thinking that the base address of the PCI CFG
space might be a good enough handle.

The only question is whether the data type used for segment id in the
hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
not sure if we are going to be able to make this pci_segment_t and have
it differ for ARM.

Another option might be a new hypercall (assuming one doesn't already
exist) to register a PCI bus which would take e.g. the PCI CFG base
address and return a new u16 segment id to be used for all subsequent
PCI related calls. This would require the dom0 OS to hook its
pci_bus_add function, which might be doable (more doable than handling
xen_segment_id DT properties I think).
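
A hedged sketch of what such a hypercall's argument could look like; nothing
below exists in the current ABI, and the name and fields are assumptions used
only to illustrate passing the CFG base in and getting a segment id back:

#include <stdint.h>

/* Hypothetical op, not part of the existing physdev interface;
 * an op number would still need to be allocated for it. */
struct physdev_pci_host_bridge_add {
    /* IN: physical base and size of this host bridge's CFG space. */
    uint64_t cfg_base;
    uint64_t cfg_size;
    /* OUT: segment id assigned by Xen for this host bridge. */
    uint16_t segment;
};

Dom0 would invoke it from its pci_bus_add hook and then use the returned
segment in all subsequent PCI related calls for devices behind that bridge.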

I'm not sure if this ends up being different on ACPI, where perhaps
MMCONFIG or some other table actually gives us a segment ID for each PCI
bus. Ideally whatever solution we end up with would fit into this model.

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote:
  On 20.02.15 at 14:45, ian.campb...@citrix.com wrote:
  (Jan, curious if you have any thoughts on this, hopefully I've left
  sufficient context for you to get what we are on about, the gist is we
  need some way for dom0 and Xen to agree on how a PCI segment ID maps to
  an actual PCI root controller, I think on x86 you either Just Know from
  PC legacy or ACPI tells you?)
 
 Yeah, without information from ACPI we'd have no way to know how
 to access the config space of segments other than 0. Both I/O port
 based access methods don't have room for specifying a segment
 number. Since the MMCFG addresses get set up by firmware, it is
 also firmware telling us the segment numbers. If you don't get them
 arranged in any particular order, ...
 
  On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
  I have added an ABI where the segment number maps to the position of the pci node in the xen
  device tree. We had partially discussed this during Linaro
  connect. What is your team's view on this: should this be OK, or should we
  introduce a property in the device tree pci node {xen_segment_id = 1}?
  
  The DT node ordering cannot be relied on this way, so we certainly need
  something else.
  
  Ideally we should find a solution which doesn't require new properties.
  
  The best solution would be to find some existing property of the PCI
  host controller which is well defined and unique to each host
  controller. I had been thinking that the base address of the PCI CFG
  space might be a good enough handle.
 
 ... this approach would seem reasonable.
 
  The only question is whether the data type used for segment id in the
  hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
  not sure if we are going to be able to make this pci_segment_t and have
  it differ for ARM.
 
 Are you expecting to have more than 64k segments?

If we were to use the PCI CFG base address as the handle for a segment
then we would just need a 64-bit field; it would of course be very
sparse ;-).

 Otherwise,
 just sequentially assign segment numbers as you discover them or
 get them reported by Dom0. You could even have Dom0 tell you
 the segment numbers (just like we do on x86),

Aha, how does this work on x86 then? I've been looking for a hypercall
which dom0 uses to tell Xen about PCI segments to no avail (I just find
ones for registering actual devices).

If there is an existing mechanism on x86 and it suits (or nearly so)
then I am entirely in favour of using it.

Ian.

  thus eliminating the
 need for an extra mechanism for Dom0 to learn them.
 
 Jan
 





Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Jan Beulich
 On 20.02.15 at 15:26, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote:
 Otherwise,
 just sequentially assign segment numbers as you discover them or
 get them reported by Dom0. You could even have Dom0 tell you
 the segment numbers (just like we do on x86),
 
 Aha, how does this work on x86 then? I've been looking for a hypercall
 which dom0 uses to tell Xen about PCI segments to no avail (I just find
 ones for registering actual devices).

But that's the one, plus the MMCFG reporting one
(PHYSDEVOP_pci_mmcfg_reserved). Without ACPI, how do you
know on ARM how to access config space for a particular
segment?

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
  Another option might be a new hypercall (assuming one doesn't already
  exist) to register a PCI bus which would take e.g. the PCI CFG base
  address and return a new u16 segment id to be used for all subsequent
  PCI related calls. This would require the dom0 OS to hook its
  pci_bus_add function, which might be doable (more doable than handling
  xen_segment_id DT properties I think).
 This seems ok, i will try it out.

I recommend you let this subthread (e.g. the conversation with Jan)
settle upon a preferred course of action before implementing any one
suggestion.

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote:
  On 20.02.15 at 15:26, ian.campb...@citrix.com wrote:
  On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote:
  Otherwise,
  just sequentially assign segment numbers as you discover them or
  get them reported by Dom0. You could even have Dom0 tell you
  the segment numbers (just like we do on x86),
  
  Aha, how does this work on x86 then? I've been looking for a hypercall
  which dom0 uses to tell Xen about PCI segments to no avail (I just find
  ones for registering actual devices).
 
 But that's the one,

that == physdev_pci_device_add?

AFAICT that tells Xen about a given device existing on a particular
segment, but doesn't tell Xen any of the properties of that segment.

  plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).

This looks promising, but rather under-documented.

#define PHYSDEVOP_pci_mmcfg_reserved    24
struct physdev_pci_mmcfg_reserved {
    uint64_t address;
    uint16_t segment;
    uint8_t start_bus;
    uint8_t end_bus;
    uint32_t flags;
};

I suppose the first 4 fields correspond to entries in the MMCFG table?
Which x86 Xen can parse and so can dom0, so dom0 can then make this
hypercall, passing (address,segment,start_bus,end_bus) to set the flags?

What is "address" the address of? The CFG space, I think?

On ARM with DT I think we only get given the address, and something has to
make up segment and start/end_bus; I'm not sure where we would get them
from.

So although I think we could perhaps bend this interface to ARM's needs,
it would have rather different semantics to x86, i.e. instead of the
key being (address,segment,start_bus,end_bus) and the value being
flags, it would be something like key = (address) and value =
(segment,start_bus,end_bus,flags).

I don't think reusing like that would be wise.

  Without ACPI, how do you
 know on ARM how to access config space for a particular
 segment?

That's the issue we are trying to resolve, with device tree there is no
explicit segment ID, so we have an essentially unindexed set of PCI
buses in both Xen and dom0.

So something somewhere needs to make up a segment ID for each PCI bus
and Xen and dom0 need to somehow agree on what the mapping is e.g. by
the one which made up the segment ID telling the other or some other TBD
means.

On x86 you solve this because both Xen and dom0 can parse the same table
and reach the same answer, sadly DT doesn't have everything needed in
it.

Ian.




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Jan Beulich
 On 20.02.15 at 16:01, ian.campb...@citrix.com wrote:
 On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote:
  plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).
 
 This looks promising, but rather under-documented.
 
 #define PHYSDEVOP_pci_mmcfg_reserved    24
 struct physdev_pci_mmcfg_reserved {
     uint64_t address;
     uint16_t segment;
     uint8_t start_bus;
     uint8_t end_bus;
     uint32_t flags;
 };
 
 I suppose the first 4 fields correspond to entries in the MMCFG table?

Yes.

 Which x86 Xen can parse and so can dom0, so dom0 can then make this
 hypercall, passing (address,segment,start_bus,end_bus) to set the flags?

No, the flags are IN too - since Xen can parse the table itself, there
wouldn't be any need for the hypercall if there weren't many systems
which don't reserve the MMCFG address range(s) in E820 and/or ACPI
resources. Xen can check E820, but obtaining ACPI resource info
requires AML parsing.

 What is address the address of? The CFG space I think?

Yes, the base address of the MMCFG range (maybe suitably offset
by the bus number).
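
Concretely, the dom0 side on x86 then looks roughly like the sketch below (a
simplification, not the verbatim Linux code): for each MMCFG region taken from
the MCFG table, dom0 reports the base address, segment and bus range, with the
flag saying whether firmware actually reserved the range:

#include <linux/types.h>
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

/* Simplified sketch: report one MMCFG region to Xen from dom0. */
static int report_mmcfg_region(u64 address, u16 segment,
                               u8 start_bus, u8 end_bus, bool reserved)
{
    struct physdev_pci_mmcfg_reserved r = {
        .address   = address,            /* MMCFG base from the MCFG entry */
        .segment   = segment,
        .start_bus = start_bus,
        .end_bus   = end_bus,
        .flags     = reserved ? XEN_PCI_MMCFG_RESERVED : 0,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
}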

 On ARM with DT I think we only get given address, and something has to
 make up segment, start/end_bus I'm not sure where we would get them
 from.
 
 So although I think we could perhaps bend this interface to ARMs needs
 it would have rather different semantics to x86, i.e. instead of the
 key being (address,segment,start_bus,end_bus) and the value being
 flags it would be something like key = (address) and value =
 (segment,start_bus,end_bus,flags).
 
 I don't think reusing like that would be wise.

Yes, sufficiently different semantics would call for a new interface.

  Without ACPI, how do you
 know on ARM how to access config space for a particular
 segment?
 
 That's the issue we are trying to resolve, with device tree there is no
 explicit segment ID, so we have an essentially unindexed set of PCI
 buses in both Xen and dom0.

How that? What if two bus numbers are equal? There ought to be
some kind of topology information. Or if all buses are distinct, then
you don't need a segment number.

Jan




Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Julien Grall
On 20/02/15 15:13, Manish Jaggi wrote:
 On x86 you solve this because both Xen and dom0 can parse the same table
 and reach the same answer, sadly DT doesn't have everything needed in
 it.
 In fact xen and dom0 use the same device tree nodes and in the same
 order. xen creates the device tree for dom0.
 I think xen can enforce the ABI while creating device tree

While it's true for Linux, you can't assume that another OS (such as
FreeBSD) will parse the device tree in the same order.

Regards,

-- 
Julien Grall



Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Manish Jaggi


On 20/02/15 8:31 pm, Ian Campbell wrote:

On Fri, 2015-02-20 at 14:39 +, Jan Beulich wrote:

On 20.02.15 at 15:26, ian.campb...@citrix.com wrote:

On Fri, 2015-02-20 at 14:11 +, Jan Beulich wrote:

Otherwise,
just sequentially assign segment numbers as you discover them or
get them reported by Dom0. You could even have Dom0 tell you
the segment numbers (just like we do on x86),

Aha, how does this work on x86 then? I've been looking for a hypercall
which dom0 uses to tell Xen about PCI segments to no avail (I just find
ones for registering actual devices).

But that's the one,

that == physdev_pci_device_add?

AFAICT that tells Xen about a given device existing on a particular
segment, but doesn't tell Xen any of the properties of that segment.


  plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).

This looks promising, but rather under-documented.

 #define PHYSDEVOP_pci_mmcfg_reserved    24
 struct physdev_pci_mmcfg_reserved {
     uint64_t address;
     uint16_t segment;
     uint8_t start_bus;
     uint8_t end_bus;
     uint32_t flags;
 };
 
I suppose the first 4 fields correspond to entries in the MMCFG table?

Which x86 Xen can parse and so can dom0, so dom0 can then make this
hypercall, passing (address,segment,start_bus,end_bus) to set the flags?

What is address the address of? The CFG space I think?

On ARM with DT I think we only get given address, and something has to
make up segment, start/end_bus I'm not sure where we would get them
from.

So although I think we could perhaps bend this interface to ARMs needs
it would have rather different semantics to x86, i.e. instead of the
key being (address,segment,start_bus,end_bus) and the value being
flags it would be something like key = (address) and value =
(segment,start_bus,end_bus,flags).

I don't think reusing like that would be wise.


  Without ACPI, how do you
know on ARM how to access config space for a particular
segment?

That's the issue we are trying to resolve, with device tree there is no
explicit segment ID, so we have an essentially unindexed set of PCI
buses in both Xen and dom0.

So something somewhere needs to make up a segment ID for each PCI bus
and Xen and dom0 need to somehow agree on what the mapping is e.g. by
the one which made up the segment ID telling the other or some other TBD
means.

On x86 you solve this because both Xen and dom0 can parse the same table
and reach the same answer, sadly DT doesn't have everything needed in
it.
In fact Xen and dom0 use the same device tree nodes and in the same
order; Xen creates the device tree for dom0.

I think Xen can enforce the ABI while creating the device tree.


Ian.






Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI

2015-02-20 Thread Ian Campbell
On Fri, 2015-02-20 at 15:15 +, Jan Beulich wrote:
  That's the issue we are trying to resolve, with device tree there is no
  explicit segment ID, so we have an essentially unindexed set of PCI
  buses in both Xen and dom0.
 
 How that? What if two bus numbers are equal? There ought to be
 some kind of topology information. Or if all buses are distinct, then
 you don't need a segment number.

It's very possible that I simply don't have the PCI terminology straight
in my head, leading to me talking nonsense.

I'll explain how I'm using it and perhaps you can put me straight...

My understanding was that a PCI segment equates to a PCI host
controller, i.e. a specific instance of some PCI host IP on an SoC. I
think that PCI specs etc perhaps call a segment a PCI domain, which we
avoided in Xen due to the obvious potential for confusion.

A PCI host controller defines the root of a bus; BDFs need not be distinct
across host controllers, because the differing segments are effectively a
higher-level namespace above the BDFs.

So given a system with two PCI host controllers we end up with two
segments (lets say A and B, but choosing those is the topic of this
thread) and it is acceptable for both to contain a bus 0 with a device 1
on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.

It sounds like you are saying that this is not actually acceptable and
that 0:0.0 must be unique in the system irrespective of the associated
segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?

Just for reference a DT node describing a PCI host controller might look
like (taking the APM Mustang one as an example):

pcie0: pcie@1f2b {
        status = "disabled";
        device_type = "pci";
        compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
        #interrupt-cells = <1>;
        #size-cells = <2>;
        #address-cells = <3>;
        reg = <0x00 0x1f2b 0x0 0x0001     /* Controller registers */
               0xe0 0xd000 0x0 0x0004>;   /* PCI config space */
        reg-names = "csr", "cfg";
        ranges = <0x0100 0x00 0x 0xe0 0x1000 0x00 0x0001    /* io */
                  0x0200 0x00 0x8000 0xe1 0x8000 0x00 0x8000>; /* mem */
        dma-ranges = <0x4200 0x80 0x 0x80 0x 0x00 0x8000
                      0x4200 0x00 0x 0x00 0x 0x80 0x>;
        interrupt-map-mask = <0x0 0x0 0x0 0x7>;
        interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
                         0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
                         0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
                         0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
        dma-coherent;
        clocks = <&pcie0clk 0>;
};

I expect most of this is uninteresting, but the key thing is that there
is no segment number nor any topology relative to e.g. pcie1:
pcie@1f2c (the nodes look identical except that e.g. all the base
addresses and interrupt numbers differ).

(FWIW reg here shows that the PCI cfg space is at 0xe0d000,
interrupt-map shows that SPI(AKA GSI) 0xc2 is INTA and 0xc3 is INTB (I
think, a bit fuzzy...), ranges is the space where BARs live, I think you
can safely ignore everything else for the purposes of this
conversation).

Ian.



