Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-12 Thread Boris Ostrovsky
On 9/10/19 9:15 PM, Igor Druzhinin wrote:
> On 10/09/2019 22:19, Boris Ostrovsky wrote:
>> On 9/10/19 4:36 PM, Igor Druzhinin wrote:
>>> On 10/09/2019 18:48, Boris Ostrovsky wrote:
>>>> On 9/10/19 5:46 AM, Igor Druzhinin wrote:
>>>>> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>>>>>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>>>>>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>>>>>>
>>>>>>>> The other question I have is why you think it's worth keeping
>>>>>>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>>>>>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>>>>>>> happen when a new device is hotplugged.
>>>>>>>>
>>>>>>> It was a precaution against setup_mcfg_map() calls that might add new
>>>>>>> areas that are not in MCFG table but for some reason have _CBA method.
>>>>>>> It's obviously a "firmware is broken" scenario so I don't have strong
>>>>>>> feelings to keep it here. Will prefer to remove in v2 if you want.
>>>>>> Isn't setup_mcfg_map() called before the first xen_add_device() which is
>>>>>> where you are calling xen_mcfg_late()?
>>>>>>
>>>>> setup_mcfg_map() calls are done in order of root bus discovery which
>>>>> happens *after* the previous root bus has been enumerated. So the order
>>>>> is: call setup_mcfg_map() for root bus 0, find that
>>>>> pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
>>>>> enumeration of bus 0, call xen_add_device() for every device there, call
>>>>> setup_mcfg_map() for root bus X, etc.
>>>> Ah, yes. Multiple busses.
>>>>
>>>> If that's the case then why don't we need to call xen_mcfg_late() for
>>>> the first device on each bus?
>>>>
>>> Ideally, yes - we'd like to call it for every bus discovered. But boot
>>> time buses are already in MCFG (otherwise system boot might simply not
>>> work as Jan pointed out) so it's not strictly required. The only case is
>>> a potential PCI bus hot-plug but I'm not sure it actually works in
>>> practice and we certainly didn't support it before. It might be solved
>>> theoretically by subscribing to acpi_bus_type that is available after
>>> acpi_init().
>> OK. Then *I think* we can drop late_initcall() but I would really like
>> to hear what others think.

Since no one commented, can you send a v2 with the second patch removing
the late call?

Also, in the first patch please limit the scope of pci_mcfg_reserved to
just xen_add_device().
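
To illustrate what I mean by limiting the scope, here is a minimal sketch,
not the actual patch. It assumes pci_mcfg_reserved is a plain boolean flag;
the int parameter, the stubbed xen_mcfg_late() body and the
mcfg_reserve_calls counter are hypothetical stand-ins so the fragment
compiles on its own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real reservation routine; it only counts calls. */
static int mcfg_reserve_calls;

static void xen_mcfg_late(void)
{
    mcfg_reserve_calls++;
}

/*
 * Sketch of the add-device hook. The flag is a function-local static,
 * so no code outside xen_add_device() can read or reset it.
 */
static int xen_add_device(int devfn)
{
    static bool pci_mcfg_reserved = false;

    if (!pci_mcfg_reserved) {
        xen_mcfg_late();          /* reserve MCFG areas once, before the first device */
        pci_mcfg_reserved = true;
    }

    (void)devfn;                  /* the real hook would report the device to Xen here */
    return 0;
}
```

With the flag declared this way, calling the hook for any number of devices
runs the reservation step exactly once, and the variable no longer needs
file scope.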

-boris


>>
> Another thing that I implied by "not supporting" but want to explicitly
> call out is that currently Xen will refuse reserving any MCFG area
> unless it actually existed in MCFG table at boot. I don't clearly
> understand reasoning behind it but it might be worth relaxing at least
> size matching restriction on Xen side now with this change.
>
> Igor



Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-11 Thread Jan Beulich
On 11.09.2019 03:15, Igor Druzhinin wrote:
> Another thing that I implied by "not supporting" but want to explicitly
> call out is that currently Xen will refuse reserving any MCFG area
> unless it actually existed in MCFG table at boot. I don't clearly
> understand reasoning behind it but it might be worth relaxing at least
> size matching restriction on Xen side now with this change.

I guess it's because no-one had a system where it would be needed,
and hence could be tested.

Jan


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Igor Druzhinin
On 10/09/2019 22:19, Boris Ostrovsky wrote:
> On 9/10/19 4:36 PM, Igor Druzhinin wrote:
>> On 10/09/2019 18:48, Boris Ostrovsky wrote:
>>> On 9/10/19 5:46 AM, Igor Druzhinin wrote:
>>>> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>>>>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>>>>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>>>>>
>>>>>>> The other question I have is why you think it's worth keeping
>>>>>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>>>>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>>>>>> happen when a new device is hotplugged.
>>>>>>>
>>>>>> It was a precaution against setup_mcfg_map() calls that might add new
>>>>>> areas that are not in MCFG table but for some reason have _CBA method.
>>>>>> It's obviously a "firmware is broken" scenario so I don't have strong
>>>>>> feelings to keep it here. Will prefer to remove in v2 if you want.
>>>>> Isn't setup_mcfg_map() called before the first xen_add_device() which is
>>>>> where you are calling xen_mcfg_late()?
>>>>>
>>>> setup_mcfg_map() calls are done in order of root bus discovery which
>>>> happens *after* the previous root bus has been enumerated. So the order
>>>> is: call setup_mcfg_map() for root bus 0, find that
>>>> pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
>>>> enumeration of bus 0, call xen_add_device() for every device there, call
>>>> setup_mcfg_map() for root bus X, etc.
>>> Ah, yes. Multiple busses.
>>>
>>> If that's the case then why don't we need to call xen_mcfg_late() for
>>> the first device on each bus?
>>>
>> Ideally, yes - we'd like to call it for every bus discovered. But boot
>> time buses are already in MCFG (otherwise system boot might simply not
>> work as Jan pointed out) so it's not strictly required. The only case is
>> a potential PCI bus hot-plug but I'm not sure it actually works in
>> practice and we certainly didn't support it before. It might be solved
>> theoretically by subscribing to acpi_bus_type that is available after
>> acpi_init().
> 
> OK. Then *I think* we can drop late_initcall() but I would really like
> to hear what others think.
> 

Another thing that I implied by "not supporting" but want to explicitly
call out is that currently Xen will refuse reserving any MCFG area
unless it actually existed in MCFG table at boot. I don't clearly
understand reasoning behind it but it might be worth relaxing at least
size matching restriction on Xen side now with this change.

Igor


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Boris Ostrovsky
On 9/10/19 4:36 PM, Igor Druzhinin wrote:
> On 10/09/2019 18:48, Boris Ostrovsky wrote:
>> On 9/10/19 5:46 AM, Igor Druzhinin wrote:
>>> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>>>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>>>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>>>>
>>>>>> The other question I have is why you think it's worth keeping
>>>>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>>>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>>>>> happen when a new device is hotplugged.
>>>>>>
>>>>> It was a precaution against setup_mcfg_map() calls that might add new
>>>>> areas that are not in MCFG table but for some reason have _CBA method.
>>>>> It's obviously a "firmware is broken" scenario so I don't have strong
>>>>> feelings to keep it here. Will prefer to remove in v2 if you want.
>>>> Isn't setup_mcfg_map() called before the first xen_add_device() which is
>>>> where you are calling xen_mcfg_late()?
>>>>
>>> setup_mcfg_map() calls are done in order of root bus discovery which
>>> happens *after* the previous root bus has been enumerated. So the order
>>> is: call setup_mcfg_map() for root bus 0, find that
>>> pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
>>> enumeration of bus 0, call xen_add_device() for every device there, call
>>> setup_mcfg_map() for root bus X, etc.
>> Ah, yes. Multiple busses.
>>
>> If that's the case then why don't we need to call xen_mcfg_late() for
>> the first device on each bus?
>>
> Ideally, yes - we'd like to call it for every bus discovered. But boot
> time buses are already in MCFG (otherwise system boot might simply not
> work as Jan pointed out) so it's not strictly required. The only case is
> a potential PCI bus hot-plug but I'm not sure it actually works in
> practice and we certainly didn't support it before. It might be solved
> theoretically by subscribing to acpi_bus_type that is available after
> acpi_init().

OK. Then *I think* we can drop late_initcall() but I would really like
to hear what others think.

-boris





Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Igor Druzhinin
On 10/09/2019 18:48, Boris Ostrovsky wrote:
> On 9/10/19 5:46 AM, Igor Druzhinin wrote:
>> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>>>
>>>>> The other question I have is why you think it's worth keeping
>>>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>>>> happen when a new device is hotplugged.
>>>>>
>>>> It was a precaution against setup_mcfg_map() calls that might add new
>>>> areas that are not in MCFG table but for some reason have _CBA method.
>>>> It's obviously a "firmware is broken" scenario so I don't have strong
>>>> feelings to keep it here. Will prefer to remove in v2 if you want.
>>> Isn't setup_mcfg_map() called before the first xen_add_device() which is 
>>> where you are calling xen_mcfg_late()?
>>>
>> setup_mcfg_map() calls are done in order of root bus discovery which
>> happens *after* the previous root bus has been enumerated. So the order
>> is: call setup_mcfg_map() for root bus 0, find that
>> pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
>> enumeration of bus 0, call xen_add_device() for every device there, call
>> setup_mcfg_map() for root bus X, etc.
> 
> Ah, yes. Multiple busses.
> 
> If that's the case then why don't we need to call xen_mcfg_late() for
> the first device on each bus?
> 

Ideally, yes - we'd like to call it for every bus discovered. But boot
time buses are already in MCFG (otherwise system boot might simply not
work as Jan pointed out) so it's not strictly required. The only case is
a potential PCI bus hot-plug but I'm not sure it actually works in
practice and we certainly didn't support it before. It might be solved
theoretically by subscribing to acpi_bus_type that is available after
acpi_init().

Igor


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Boris Ostrovsky
On 9/10/19 5:46 AM, Igor Druzhinin wrote:
> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>>> On 9/8/19 7:37 PM, Igor Druzhinin wrote:
>>>>> On 09/09/2019 00:30, Boris Ostrovsky wrote:
>>>>>> On 9/8/19 5:11 PM, Igor Druzhinin wrote:
>>>>>>> On 08/09/2019 19:28, Boris Ostrovsky wrote:
>>>>>>>> Would it be possible for us to parse MCFG ourselves in pci_xen_init()? I
>>>>>>>> realize that we'd be doing this twice (or maybe even three times since
>>>>>>>> apparently both pci_arch_init() and acpi_init() do it).
>>>>>>>>
>>>>>>> I don't think it makes sense:
>>>>>>> a) it needs to be done after ACPI is initialized since we need to parse
>>>>>>> it to figure out the exact reserved region - that's why it's currently
>>>>>>> done in acpi_init() (see commit message for the reasons why)
>>>>>> Hmm... We should be able to parse ACPI tables by the time
>>>>>> pci_arch_init() is called. In fact, if you look at
>>>>>> pci_mmcfg_early_init() you will see that it does just that.
>>>>>>
>>>>> The point is not to parse MCFG after acpi_init but to parse DSDT for
>>>>> reserved resource which could be done only after ACPI initialization.
>>>> OK, I think I understand now what you are trying to do --- you are
>>>> essentially trying to account for the range inserted by
>>>> setup_mcfg_map(), right?
>>>>
>>> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
>>> that's where MCFG areas are properly sized. 
>> pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT 
>> parsing, does it? setup_mcfg_map() OTOH does need it as it uses data from 
>> _CBA (or is it _CRS?), and I think that's why we can't parse MCFG prior to 
>> acpi_init(). So what I said above indeed won't work.
>>
> No, it uses is_acpi_reserved() (it's called indirectly so might be well
> hidden) to parse DSDT to find a reserved resource in it and size MCFG
> area accordingly. 


Right, I see it. Thanks for the explanation.


> setup_mcfg_map() is called for every root bus
> discovered and indeed tries to evaluate _CBA but at this point
> pci_mmcfg_late_init() has already finished MCFG registration for every
> cold-plugged bus (which information is described in MCFG table) so those
> calls are dummy.
>
>>> setup_mcfg_map() is mostly
>>> for bus hotplug where MCFG area is discovered by evaluating _CBA method;
>>> for cold-plugged buses it just confirms that MCFG area is already
>>> registered because it is mandated for them to be in MCFG table at boot time.
>>>
>>>> The other question I have is why you think it's worth keeping
>>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>>> happen when a new device is hotplugged.
>>>>
>>> It was a precaution against setup_mcfg_map() calls that might add new
>>> areas that are not in MCFG table but for some reason have _CBA method.
>>> It's obviously a "firmware is broken" scenario so I don't have strong
>>> feelings to keep it here. Will prefer to remove in v2 if you want.
>> Isn't setup_mcfg_map() called before the first xen_add_device() which is 
>> where you are calling xen_mcfg_late()?
>>
> setup_mcfg_map() calls are done in order of root bus discovery which
> happens *after* the previous root bus has been enumerated. So the order
> is: call setup_mcfg_map() for root bus 0, find that
> pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
> enumeration of bus 0, call xen_add_device() for every device there, call
> setup_mcfg_map() for root bus X, etc.

Ah, yes. Multiple busses.

If that's the case then why don't we need to call xen_mcfg_late() for
the first device on each bus?

-boris


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Igor Druzhinin
On 10/09/2019 10:55, Jan Beulich wrote:
> On 10.09.2019 11:46, Igor Druzhinin wrote:
>> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>>> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
>>>> that's where MCFG areas are properly sized.
>>>
>>> pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT 
>>> parsing, does it? setup_mcfg_map() OTOH does need it as it uses data from 
>>> _CBA (or is it _CRS?), and I think that's why we can't parse MCFG prior to 
>>> acpi_init(). So what I said above indeed won't work.
>>>
>>
>> No, it uses is_acpi_reserved() (it's called indirectly so might be well
>> hidden) to parse DSDT to find a reserved resource in it and size MCFG
>> area accordingly. setup_mcfg_map() is called for every root bus
>> discovered and indeed tries to evaluate _CBA but at this point
>> pci_mmcfg_late_init() has already finished MCFG registration for every
>> cold-plugged bus (which information is described in MCFG table) so those
>> calls are dummy.
> 
> I don't think they're strictly dummy. Even for boot time available devices
> iirc there's no strict requirement for there to be respective data in MCFG.
> Such a requirement exists only for devices which are actually needed to
> start the OS (disk or network, perhaps video or alike), or maybe even just
> its loader.
> 

This was my interpretation of 4.1.3 of "PCI Firmware specification":
"Memory mapped configuration base addresses for non-hot pluggable host
bridges must be described using MCFG table." Although, I admit that
"non-hot pluggable" might mean available at boot as well.

Igor


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Jan Beulich
On 10.09.2019 11:46, Igor Druzhinin wrote:
> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
>>> that's where MCFG areas are properly sized. 
>>
>> pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT 
>> parsing, does it? setup_mcfg_map() OTOH does need it as it uses data from 
>> _CBA (or is it _CRS?), and I think that's why we can't parse MCFG prior to 
>> acpi_init(). So what I said above indeed won't work.
>>
> 
> No, it uses is_acpi_reserved() (it's called indirectly so might be well
> hidden) to parse DSDT to find a reserved resource in it and size MCFG
> area accordingly. setup_mcfg_map() is called for every root bus
> discovered and indeed tries to evaluate _CBA but at this point
> pci_mmcfg_late_init() has already finished MCFG registration for every
> cold-plugged bus (which information is described in MCFG table) so those
> calls are dummy.

I don't think they're strictly dummy. Even for boot time available devices
iirc there's no strict requirement for there to be respective data in MCFG.
Such a requirement exists only for devices which are actually needed to
start the OS (disk or network, perhaps video or alike), or maybe even just
its loader.

Jan


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Igor Druzhinin
On 10/09/2019 02:47, Boris Ostrovsky wrote:
> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>> On 9/8/19 7:37 PM, Igor Druzhinin wrote:
>>>> On 09/09/2019 00:30, Boris Ostrovsky wrote:
>>>>> On 9/8/19 5:11 PM, Igor Druzhinin wrote:
>>>>>> On 08/09/2019 19:28, Boris Ostrovsky wrote:
>>>>>>> Would it be possible for us to parse MCFG ourselves in pci_xen_init()? I
>>>>>>> realize that we'd be doing this twice (or maybe even three times since
>>>>>>> apparently both pci_arch_init() and acpi_init() do it).
>>>>>>>
>>>>>> I don't think it makes sense:
>>>>>> a) it needs to be done after ACPI is initialized since we need to parse
>>>>>> it to figure out the exact reserved region - that's why it's currently
>>>>>> done in acpi_init() (see commit message for the reasons why)
>>>>> Hmm... We should be able to parse ACPI tables by the time
>>>>> pci_arch_init() is called. In fact, if you look at
>>>>> pci_mmcfg_early_init() you will see that it does just that.
>>>>>
>>>> The point is not to parse MCFG after acpi_init but to parse DSDT for
>>>> reserved resource which could be done only after ACPI initialization.
>>> OK, I think I understand now what you are trying to do --- you are
>>> essentially trying to account for the range inserted by
>>> setup_mcfg_map(), right?
>>>
>> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
>> that's where MCFG areas are properly sized. 
> 
> pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT 
> parsing, does it? setup_mcfg_map() OTOH does need it as it uses data from 
> _CBA (or is it _CRS?), and I think that's why we can't parse MCFG prior to 
> acpi_init(). So what I said above indeed won't work.
> 

No, it uses is_acpi_reserved() (it's called indirectly so might be well
hidden) to parse DSDT to find a reserved resource in it and size MCFG
area accordingly. setup_mcfg_map() is called for every root bus
discovered and indeed tries to evaluate _CBA but at this point
pci_mmcfg_late_init() has already finished MCFG registration for every
cold-plugged bus (which information is described in MCFG table) so those
calls are dummy.
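
To make the "dummy" point concrete, here is a toy model, a sketch under
stated assumptions rather than the kernel code. The array stands in for
pci_mmcfg_list, and mmcfg_lookup() / mmcfg_register() are hypothetical
helper names. It only covers the case where firmware did list the bus in
the MCFG table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Toy stand-in for one entry on pci_mmcfg_list. */
struct mmcfg_region {
    uint16_t segment;
    uint8_t  start_bus;
    uint8_t  end_bus;
    uint64_t address;
};

static struct mmcfg_region regions[MAX_REGIONS];
static size_t nr_regions;

/* Find the region covering (segment, bus), or NULL if none is registered. */
static struct mmcfg_region *mmcfg_lookup(uint16_t segment, uint8_t bus)
{
    for (size_t i = 0; i < nr_regions; i++) {
        struct mmcfg_region *r = &regions[i];

        if (r->segment == segment && bus >= r->start_bus && bus <= r->end_bus)
            return r;
    }
    return NULL;
}

/*
 * Register a region unless its first bus is already covered.
 * Returns true only when a new region was actually added.
 */
static bool mmcfg_register(uint16_t segment, uint8_t start_bus,
                           uint8_t end_bus, uint64_t address)
{
    if (mmcfg_lookup(segment, start_bus))
        return false;             /* cold-plugged case: already known from MCFG */

    assert(nr_regions < MAX_REGIONS);
    regions[nr_regions++] = (struct mmcfg_region){
        .segment = segment, .start_bus = start_bus,
        .end_bus = end_bus, .address = address,
    };
    return true;
}
```

In this model the first call plays the role of the late MCFG registration,
and a second call for the same bus range (the per-root-bus check) returns
false without changing the list. Only a range that was never registered
adds a new entry.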

>> setup_mcfg_map() is mostly
>> for bus hotplug where MCFG area is discovered by evaluating _CBA method;
>> for cold-plugged buses it just confirms that MCFG area is already
>> registered because it is mandated for them to be in MCFG table at boot time.
>>
>>> The other question I have is why you think it's worth keeping
>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>> happen when a new device is hotplugged.
>>>
>> It was a precaution against setup_mcfg_map() calls that might add new
>> areas that are not in MCFG table but for some reason have _CBA method.
>> It's obviously a "firmware is broken" scenario so I don't have strong
>> feelings to keep it here. Will prefer to remove in v2 if you want.
> 
> Isn't setup_mcfg_map() called before the first xen_add_device() which is 
> where you are calling xen_mcfg_late()?
> 

setup_mcfg_map() calls are done in order of root bus discovery which
happens *after* the previous root bus has been enumerated. So the order
is: call setup_mcfg_map() for root bus 0, find that
pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
enumeration of bus 0, call xen_add_device() for every device there, call
setup_mcfg_map() for root bus X, etc.
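
The ordering above can be sketched as a small trace recorder. This is an
illustration only: the functions are hypothetical stubs that share names
with the kernel ones but take no arguments and do nothing except log, and
the reserve-once flag is assumed to behave as in the patch under
discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum event { EV_SETUP_MCFG_MAP, EV_XEN_MCFG_RESERVE, EV_XEN_ADD_DEVICE };

#define MAX_EVENTS 32
static enum event trace[MAX_EVENTS];
static size_t nr_events;

static void record(enum event ev)
{
    assert(nr_events < MAX_EVENTS);
    trace[nr_events++] = ev;
}

/* Hypothetical stub: the per-root-bus MCFG check. */
static void setup_mcfg_map(void)
{
    record(EV_SETUP_MCFG_MAP);
}

/* Hypothetical stub: reserve MCFG areas once, on the first device seen. */
static void xen_add_device(void)
{
    static bool reserved;

    if (!reserved) {
        record(EV_XEN_MCFG_RESERVE);
        reserved = true;
    }
    record(EV_XEN_ADD_DEVICE);
}

/*
 * Walk root buses one at a time: each bus is fully enumerated
 * before the next root bus is discovered.
 */
static void enumerate_root_buses(const int *devices_per_bus, size_t nr_buses)
{
    for (size_t bus = 0; bus < nr_buses; bus++) {
        setup_mcfg_map();
        for (int dev = 0; dev < devices_per_bus[bus]; dev++)
            xen_add_device();
    }
}
```

For two root buses, the single reservation lands after the first
setup_mcfg_map() and before the second one, so anything the second call
added would not be seen by a reserve-once hook.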

Igor



Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-09 Thread Boris Ostrovsky
On 9/9/19 5:48 PM, Igor Druzhinin wrote:
> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>> On 9/8/19 7:37 PM, Igor Druzhinin wrote:
>>> On 09/09/2019 00:30, Boris Ostrovsky wrote:
>>>> On 9/8/19 5:11 PM, Igor Druzhinin wrote:
>>>>> On 08/09/2019 19:28, Boris Ostrovsky wrote:
>>>>>> On 9/6/19 7:00 PM, Igor Druzhinin wrote:
>>>>>>> On 06/09/2019 23:30, Boris Ostrovsky wrote:
>>>>>>>> Where is MCFG parsed? pci_arch_init()?
>>>>>>> It happens twice:
>>>>>>> 1) first time early one in pci_arch_init() that is arch_initcall - that
>>>>>>> time pci_mmcfg_list will be freed immediately there because MCFG area is
>>>>>>> not reserved in E820;
>>>>>>> 2) second time late one in acpi_init() which is subsystem_initcall right
>>>>>>> before where PCI enumeration starts - this time ACPI tables will be
>>>>>>> checked for a reserved resource and pci_mmcfg_list will be finally
>>>>>>> populated.
>>>>>>>
>>>>>>> The problem is that on a system that doesn't have MCFG area reserved in
>>>>>>> E820 pci_mmcfg_list is empty before acpi_init() and our PCI hooks are
>>>>>>> called in the same place. So MCFG is still not in use by Xen at this
>>>>>>> point since we haven't reached our xen_mcfg_late().
>>>>>> Would it be possible for us to parse MCFG ourselves in pci_xen_init()? I
>>>>>> realize that we'd be doing this twice (or maybe even three times since
>>>>>> apparently both pci_arch_init() and acpi_init() do it).
>>>>>>
>>>>> I don't think it makes sense:
>>>>> a) it needs to be done after ACPI is initialized since we need to parse
>>>>> it to figure out the exact reserved region - that's why it's currently
>>>>> done in acpi_init() (see commit message for the reasons why)
>>>> Hmm... We should be able to parse ACPI tables by the time
>>>> pci_arch_init() is called. In fact, if you look at
>>>> pci_mmcfg_early_init() you will see that it does just that.
>>>>
>>> The point is not to parse MCFG after acpi_init but to parse DSDT for
>>> reserved resource which could be done only after ACPI initialization.
>> OK, I think I understand now what you are trying to do --- you are
>> essentially trying to account for the range inserted by
>> setup_mcfg_map(), right?
>>
> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
> that's where MCFG areas are properly sized. 

pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT parsing, 
does it? setup_mcfg_map() OTOH does need it as it uses data from _CBA (or is it 
_CRS?), and I think that's why we can't parse MCFG prior to acpi_init(). So 
what I said above indeed won't work.

> setup_mcfg_map() is mostly
> for bus hotplug where MCFG area is discovered by evaluating _CBA method;
> for cold-plugged buses it just confirms that MCFG area is already
> registered because it is mandated for them to be in MCFG table at boot time.
>
>> The other question I have is why you think it's worth keeping
>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>> between acpi_init() and late_initcalls being run? I'd think it can only
>> happen when a new device is hotplugged.
>>
> It was a precaution against setup_mcfg_map() calls that might add new
> areas that are not in MCFG table but for some reason have _CBA method.
> It's obviously a "firmware is broken" scenario so I don't have strong
> feelings to keep it here. Will prefer to remove in v2 if you want.

Isn't setup_mcfg_map() called before the first xen_add_device() which is where 
you are calling xen_mcfg_late()?


-boris