Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-11-02 Thread Jan Beulich
>>> On 01.11.16 at 15:39,  wrote:
> Also I realized that the "Range Size and Alignment Requirement" isn't met
> by the code I wrote - a range of size 2^n must be aligned on a
> 2^n boundary, and that is certainly not the case.

Yes, this had better be obeyed.

> Anyhow the point here is that even with these modifications I will
> still run into the variable MTRR limit if I am to cover most of the
> space. I can cover up to a certain value. And that 'value' could
> become the pci_hi_mem_end?

Yes - moving the boundary to require fewer MTRRs is certainly
an option. Also remember that we are required to leave a few
MTRRs for OS use.

> Or perhaps revisit a6a822324:
> Author: Keir Fraser 
> Date:   Wed Apr 16 13:36:44 2008 +0100
> 
> x86, hvm: Lots of MTRR/PAT emulation cleanup.
> 
>  - Move MTRR MSR initialisation into hvmloader.
>  - Simplify initialisation logic by overlaying UC on default WB rather
>    than vice versa.
>  - Clean up hypervisor HVM MTRR/PAE code's interface with rest of
>    hypervisor.
> 
> 
> The default MTRR type is currently WB. If it were UC we could just set
> MTRRs for the RAM regions and have the type be WB for those regions?
> 
> I am not sure though if that is a good direction either?

Actually I think we should pick the variant requiring fewer MTRRs.
I've seen BIOSes of both kinds. Otoh I've never been really
convinced using WB as the default is really that good an idea.

> And that actually worked out nicely. Linux sees the new _CRS regions
> and I got [this includes two extra regions - so that the HT region
> is not touched]:
> 
>  ...
>  pci_bus :00: root bus resource [io  0x-0x0cf7 window]
>  pci_bus :00: root bus resource [io  0x0d00-0x window]
>  pci_bus :00: root bus resource [mem 0x000a-0x000b window]
>  pci_bus :00: root bus resource [mem 0xf000-0xfbff window]
>  pci_bus :00: root bus resource [mem 0x10fc0-0xfcfffe window]
>  pci_bus :00: root bus resource [mem 0x100-0x window]
>  pci_bus :00: root bus resource [bus 00-ff]
> 
> from:
> pci_bus :00: root bus resource [io  0x-0x0cf7 window]
> pci_bus :00: root bus resource [io  0x0d00-0x window]
> pci_bus :00: root bus resource [mem 0x000a-0x000b window]
> pci_bus :00: root bus resource [mem 0xe000-0xfbff window]
> pci_bus :00: root bus resource [bus 00-ff]
> 
> Except that when I tried this with Windows 2000 I found out that
> its AML interpreter blows up if any of the values are bigger than
> 8GB. With a bit of extra AML duct-tape that got solved, albeit I need
> to verify other Windows platforms. Which reminds me - you had dabbled
> in this - are there any other surprises I should be aware of ?

The only thing I remember is the WinXP issue with qword fields (as
mentioned in a comment in dsdt.asl).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-11-01 Thread Konrad Rzeszutek Wilk
. snip..
> I modified it be subtractive, and got it to start with
> large areas and then smaller and smaller:
> 
> (d2)  - CPU0 ... 36-bit phys ... fixed MTRRs ... Cover @04344(MB) to 65536(MB) with 7 MTRRs.
> (d2) MTRR 1 @04344(MB)  37112(MB)
> (d2) MTRR 2 @37112(MB)  53496(MB)
> (d2) MTRR 3 @53496(MB)  61688(MB)
> (d2) MTRR 4 @61688(MB)  63736(MB)
> (d2) MTRR 5 @63736(MB)  64760(MB)
> (d2) MTRR 6 @64760(MB)  65272(MB)
> (d2) MTRR 7 @65272(MB)  65528(MB)
> (d2) var MTRRs [8/8] ... done.
> 
> But of course on 48-bit hardware, even with this we ran out of MTRRs:
> (d1)  - CPU0 ... 48-bit phys ... fixed MTRRs ... Cover @04344(MB) to 0268435456(MB) with 7 MTRRs.
> (d1) MTRR 1 @04344(MB)  0134222072(MB)
> (d1) MTRR 2 @0134222072(MB) 0201330936(MB)
> (d1) MTRR 3 @0201330936(MB) 0234885368(MB)
> (d1) MTRR 4 @0234885368(MB) 0251662584(MB)
> (d1) MTRR 5 @0251662584(MB) 0260051192(MB)
> (d1) MTRR 6 @0260051192(MB) 0264245496(MB)
> (d1) MTRR 7 @0264245496(MB) 0266342648(MB)
> (d1) var MTRRs [8/8] ... done.

For comparison, here is what the existing code does (please ignore the 'MTRR 1'):

(d35) MB) with 7 MTRRs.
(d35) MTRR 1@04344(MB)  04352(MB)[8(MB)]
(d35) MTRR 1@04352(MB)  04608(MB)[00256(MB)]
(d35) MTRR 1@04608(MB)  05120(MB)[00512(MB)]
(d35) MTRR 1@05120(MB)  06144(MB)[01024(MB)]
(d35) MTRR 1@06144(MB)  08192(MB)[02048(MB)]
(d35) MTRR 1@08192(MB)  16384(MB)[08192(MB)]
(d35) MTRR 1@16384(MB)  32768(MB)[16384(MB)]



Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-11-01 Thread Konrad Rzeszutek Wilk
On Thu, Oct 13, 2016 at 03:20:24AM -0600, Jan Beulich wrote:
> >>> On 12.10.16 at 23:15,  wrote:
> > On Wed, Sep 28, 2016 at 03:21:08AM -0600, Jan Beulich wrote:
> >> >>> On 27.09.16 at 16:43,  wrote:
> >> > If the guest is booted with 'pci' we nicely expand the MMIO region below
> >> > 4GB and try to fit in the BARs in there. If that fails (not enough
> >> > space) we move it above the memory (64-bit). And throughout all of this
> >> > we also update the _CRS field to cover these ranges.
> >> > 
> >> > (Note, I need to check if the 64-bit area is also set, I think it is).
> >> > 
> >> > But the situation is different if we hot-plug a device that has too big
> >> > BAR to fit in the MMIO region. We move it in the 64-bit area but we
> >> > don't update the _CRS. Which means that Linux will complain (unless
> >> > booted with pci=nocrs)). Not sure about Windows but I would assume so
> >> > to.
> >> > 
> >> > I was wondering what would be a good way to solve this? I looked at some
> >> > Dell machines to see how they deal with hotplug PCIe devices and they
> >> > just declared all the memory in the _CRS (including RAM).
> >> > 
> >> > We could do a hybrid - during bootup make the _CRS region have entry from
> >> > end of RAM to .. end of memory?
> >> 
> >> End of physical address space you mean? Generally yes, but we
> >> need to be a little careful there: For one, on AMD we'd better not
> >> overlap with the HT area. And then there's this MTRR related
> >> comment next to the setting of pci_hi_mem_end (albeit both HT
> >> area start and end of PA space should be aligned well enough).

This got interesting. The existing code that sets the variable
MTRRs ran out of MTRRs when covering, say, 1<<36 of space. The reason
is that it starts with small range sizes (4KB) and then builds up
from there. Covering, say, 4GB to 64GB exhausts the MTRRs.
I modified it to be subtractive, so that it starts with
large areas and then goes smaller and smaller:

(d2)  - CPU0 ... 36-bit phys ... fixed MTRRs ... Cover @04344(MB) to 65536(MB) with 7 MTRRs.
(d2) MTRR 1 @04344(MB)  37112(MB)
(d2) MTRR 2 @37112(MB)  53496(MB)
(d2) MTRR 3 @53496(MB)  61688(MB)
(d2) MTRR 4 @61688(MB)  63736(MB)
(d2) MTRR 5 @63736(MB)  64760(MB)
(d2) MTRR 6 @64760(MB)  65272(MB)
(d2) MTRR 7 @65272(MB)  65528(MB)
(d2) var MTRRs [8/8] ... done.

But of course on 48-bit hardware, even with this we ran out of MTRRs:
(d1)  - CPU0 ... 48-bit phys ... fixed MTRRs ... Cover @04344(MB) to 0268435456(MB) with 7 MTRRs.
(d1) MTRR 1 @04344(MB)  0134222072(MB)
(d1) MTRR 2 @0134222072(MB) 0201330936(MB)
(d1) MTRR 3 @0201330936(MB) 0234885368(MB)
(d1) MTRR 4 @0234885368(MB) 0251662584(MB)
(d1) MTRR 5 @0251662584(MB) 0260051192(MB)
(d1) MTRR 6 @0260051192(MB) 0264245496(MB)
(d1) MTRR 7 @0264245496(MB) 0266342648(MB)
(d1) var MTRRs [8/8] ... done.
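
A minimal sketch of why the additive approach runs out, assuming each
variable MTRR must describe a power-of-two sized, naturally aligned
block. It is written in Python for brevity and is not the hvmloader
code; aligned_blocks and its parameters are names made up for
illustration. The 4344 MB start is taken from the logs above.

```python
MB = 1 << 20

def aligned_blocks(start, end):
    """Split [start, end) into naturally aligned power-of-two blocks.

    Each block's size is a power of two and its base is a multiple of
    its size, which is what one variable-range MTRR can describe.
    """
    blocks = []
    while start < end:
        # Largest power of two that still fits in the remaining space.
        fit = 1 << ((end - start).bit_length() - 1)
        # Largest power of two the current base is aligned to
        # (a base of 0 is aligned to anything, so fall back to fit).
        align = (start & -start) or fit
        size = min(align, fit)
        blocks.append((start, size))
        start += size
    return blocks

# Hypothetical driver: high MMIO starts at 4344 MB, as in the logs.
for phys_bits in (36, 48):
    blocks = aligned_blocks(4344 * MB, 1 << phys_bits)
    print(f"{phys_bits}-bit phys: {len(blocks)} variable MTRRs needed")
    for base, size in blocks:
        print(f"  @{base // MB}(MB) size {size // MB}(MB)")
```

Under these assumptions the 36-bit case needs 8 ranges, one more than
the 7 available in the logs, and the 48-bit case needs 20.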

[I figured that it would be OK to set the UC MTRR even for the
HT region: FC   -> FF   as you surely don't want WB there?]

Then it occurred to me that maybe I am overthinking it and
should just pick the biggest one:

(d32) Multiprocessor initialisation:
(d32)  - CPU0 ... 48-bit phys ... fixed MTRRs ... Cover @04344(MB) to 0268435456(MB) with 7 MTRRs.
(d32) MTRR 1@04344(MB)  0268439800(MB)
(d32) var MTRRs [1/8] ... done.

Which would cover _past_ the end of the CPU's physical address space,
but that surely won't be healthy for the CPU? The Intel SDM doesn't
mention what happens then.

Also I realized that the "Range Size and Alignment Requirement" isn't met
by the code I wrote - a range of size 2^n must be aligned on a
2^n boundary, and that is certainly not the case.
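
A small sketch of that requirement, assuming the usual variable MTRR
encoding (memory type in the low bits of PHYSBASE, valid bit 11 in
PHYSMASK, UC encoded as 0). The function names are hypothetical and
this is Python for illustration, not the hvmloader code.

```python
MB = 1 << 20
MTRR_TYPE_UC = 0               # uncacheable memory type encoding
MTRR_PHYSMASK_VALID = 1 << 11  # valid bit in the PHYSMASK register

def is_valid_mtrr_range(base, size):
    """True if size is a power of two (>= 4KB) and base is a multiple of it."""
    return size >= 4096 and (size & (size - 1)) == 0 and base % size == 0

def encode_var_mtrr(base, size, phys_bits, mem_type=MTRR_TYPE_UC):
    """Return (PHYSBASE, PHYSMASK) values describing one range."""
    if not is_valid_mtrr_range(base, size):
        raise ValueError("range violates the size/alignment requirement")
    addr_mask = (1 << phys_bits) - 1
    phys_base = base | mem_type
    phys_mask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID
    return phys_base, phys_mask

# First range of the subtractive attempt: 4344 MB .. 37112 MB (32768 MB).
print(is_valid_mtrr_range(4344 * MB, 32768 * MB))   # misaligned base
# A range the existing code produces: 8192 MB .. 16384 MB.
print([hex(v) for v in encode_var_mtrr(8192 * MB, 8192 * MB, 36)])
```

The first check fails because 4344 MB is not a multiple of 32768 MB,
which is the problem with the subtractive ranges listed earlier.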

Anyhow the point here is that even with these modifications I will
still run into the variable MTRR limit if I am to cover most of the
space. I can cover up to a certain value. And that 'value' could
become the pci_hi_mem_end?

Or perhaps revisit a6a822324:
Author: Keir Fraser 
Date:   Wed Apr 16 13:36:44 2008 +0100

x86, hvm: Lots of MTRR/PAT emulation cleanup.

 - Move MTRR MSR initialisation into hvmloader.
 - Simplify initialisation logic by overlaying UC on default WB rather
   than vice versa.
 - Clean up hypervisor HVM MTRR/PAE code's interface with rest of
   hypervisor.


The default MTRR type is currently WB. If it were UC we could just set
MTRRs for the RAM regions and have the type be WB for those regions?

I am not sure though if that is a good direction either?
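
A rough way to compare the two directions is to count the ranges each
one needs. The sketch below assumes a made-up guest layout (low RAM up
to 3840 MB, a 256 MB MMIO hole below 4GB, 248 MB of RAM relocated above
4GB, high MMIO from 4344 MB) and ignores the fixed-range MTRRs and any
tricks with overlapping ranges. All names are hypothetical.

```python
MB = 1 << 20

def count_aligned_blocks(start, end):
    """Number of naturally aligned power-of-two blocks covering [start, end)."""
    count = 0
    while start < end:
        fit = 1 << ((end - start).bit_length() - 1)
        align = (start & -start) or fit   # base 0 is aligned to anything
        start += min(align, fit)
        count += 1
    return count

def mtrrs_needed(ranges):
    """Total variable MTRRs needed to cover every (start, end) range."""
    return sum(count_aligned_blocks(s, e) for s, e in ranges)

# Assumed layout, for illustration only.
RAM = [(0, 3840 * MB), (4096 * MB, 4344 * MB)]

def mmio_holes(phys_bits):
    return [(3840 * MB, 4096 * MB), (4344 * MB, 1 << phys_bits)]

for phys_bits in (36, 48):
    uc_over_wb = mtrrs_needed(mmio_holes(phys_bits))  # default WB, UC holes
    wb_over_uc = mtrrs_needed(RAM)                    # default UC, WB RAM
    print(f"{phys_bits}-bit: UC-on-WB needs {uc_over_wb}, "
          f"WB-on-UC needs {wb_over_uc}")
```

With this layout the WB-on-UC count stays the same as the physical
address width grows, while the UC-on-WB count grows with it. A
different RAM size would shift the numbers.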

> >> 
> >> > Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
> >> > perhaps modify the last _CRS entry) when PCIe devices are hotplugged?
> >> 
> >> While that would be the most flexible variant, I'd be afraid of this
> >> getting rather complicated. Or have you already got some
> >> reasonable layout of how this would look like?
> > 
> > I did this and while 

Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-10-13 Thread Jan Beulich
>>> On 12.10.16 at 23:15,  wrote:
> On Wed, Sep 28, 2016 at 03:21:08AM -0600, Jan Beulich wrote:
>> >>> On 27.09.16 at 16:43,  wrote:
>> > If the guest is booted with 'pci' we nicely expand the MMIO region below
>> > 4GB and try to fit in the BARs in there. If that fails (not enough
>> > space) we move it above the memory (64-bit). And throughout all of this
>> > we also update the _CRS field to cover these ranges.
>> > 
>> > (Note, I need to check if the 64-bit area is also set, I think it is).
>> > 
>> > But the situation is different if we hot-plug a device that has too big
>> > BAR to fit in the MMIO region. We move it in the 64-bit area but we
>> > don't update the _CRS. Which means that Linux will complain (unless
>> > booted with pci=nocrs)). Not sure about Windows but I would assume so
>> > to.
>> > 
>> > I was wondering what would be a good way to solve this? I looked at some
>> > Dell machines to see how they deal with hotplug PCIe devices and they
>> > just declared all the memory in the _CRS (including RAM).
>> > 
>> > We could do a hybrid - during bootup make the _CRS region have entry from
>> > end of RAM to .. end of memory?
>> 
>> End of physical address space you mean? Generally yes, but we
>> need to be a little careful there: For one, on AMD we'd better not
>> overlap with the HT area. And then there's this MTRR related
>> comment next to the setting of pci_hi_mem_end (albeit both HT
>> area start and end of PA space should be aligned well enough).
>> 
>> > Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
>> > perhaps modify the last _CRS entry) when PCIe devices are hotplugged?
>> 
>> While that would be the most flexible variant, I'd be afraid of this
>> getting rather complicated. Or have you already got some
>> reasonable layout of how this would look like?
> 
> I did this and while all the plumbing works great and I can see that
> the pci_hi_len gets incremented by the size of the 64-bit BARS of the
> new device (and also decremented if hot-unplugged) I hit a snag:
> 
> Linux evaluates this only once (actually twice, but only during bootup).

Ah - quite reasonable to expect this won't change.

> For right now let me jump with the "simpler" solution of just
> hardcoding the end of physical address space and see how that works out.

Right.

Jan




Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-10-12 Thread Konrad Rzeszutek Wilk
On Wed, Sep 28, 2016 at 03:21:08AM -0600, Jan Beulich wrote:
> >>> On 27.09.16 at 16:43,  wrote:
> > If the guest is booted with 'pci' we nicely expand the MMIO region below
> > 4GB and try to fit in the BARs in there. If that fails (not enough
> > space) we move it above the memory (64-bit). And throughout all of this
> > we also update the _CRS field to cover these ranges.
> > 
> > (Note, I need to check if the 64-bit area is also set, I think it is).
> > 
> > But the situation is different if we hot-plug a device that has too big
> > BAR to fit in the MMIO region. We move it in the 64-bit area but we
> > don't update the _CRS. Which means that Linux will complain (unless
> > booted with pci=nocrs)). Not sure about Windows but I would assume so
> > to.
> > 
> > I was wondering what would be a good way to solve this? I looked at some
> > Dell machines to see how they deal with hotplug PCIe devices and they
> > just declared all the memory in the _CRS (including RAM).
> > 
> > We could do a hybrid - during bootup make the _CRS region have entry from
> > end of RAM to .. end of memory?
> 
> End of physical address space you mean? Generally yes, but we
> need to be a little careful there: For one, on AMD we'd better not
> overlap with the HT area. And then there's this MTRR related
> comment next to the setting of pci_hi_mem_end (albeit both HT
> area start and end of PA space should be aligned well enough).
> 
> > Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
> > perhaps modify the last _CRS entry) when PCIe devices are hotplugged?
> 
> While that would be the most flexible variant, I'd be afraid of this
> getting rather complicated. Or have you already got some
> reasonable layout of how this would look like?

I did this and while all the plumbing works great and I can see that
the pci_hi_len gets incremented by the size of the 64-bit BARS of the
new device (and also decremented if hot-unplugged) I hit a snag:

Linux evaluates this only once (actually twice, but only during bootup).

That is, if I do the hotplug while the guest is still in GRUB and then
boot, Linux is quite happy. But if I do it after Linux has booted, the
PNP0A03 _CRS is not evaluated again.

The only way I can see it being re-evaluated is if a new bridge
is added and DMAR hotplug support ("Remapping Hardware Unit Hot Plug")
is exposed to the guest. See acpi_pci_root_add in the Linux code and
if (hotadd && dmar_device_add(handle))

This means:
- adding bridge support in QEMU for each newly hotplugged device,
- and Intel VT-d support in the guest.

That I think will take a bit of time to get right.

For right now let me go with the "simpler" solution of just
hardcoding the end of physical address space and see how that works out.

> 
> Jan
> 



Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-09-28 Thread Konrad Rzeszutek Wilk
On Wed, Sep 28, 2016 at 03:21:08AM -0600, Jan Beulich wrote:
> >>> On 27.09.16 at 16:43,  wrote:
> > If the guest is booted with 'pci' we nicely expand the MMIO region below
> > 4GB and try to fit in the BARs in there. If that fails (not enough
> > space) we move it above the memory (64-bit). And throughout all of this
> > we also update the _CRS field to cover these ranges.
> > 
> > (Note, I need to check if the 64-bit area is also set, I think it is).
> > 
> > But the situation is different if we hot-plug a device that has too big
> > BAR to fit in the MMIO region. We move it in the 64-bit area but we
> > don't update the _CRS. Which means that Linux will complain (unless
> > booted with pci=nocrs)). Not sure about Windows but I would assume so
> > to.
> > 
> > I was wondering what would be a good way to solve this? I looked at some
> > Dell machines to see how they deal with hotplug PCIe devices and they
> > just declared all the memory in the _CRS (including RAM).
> > 
> > We could do a hybrid - during bootup make the _CRS region have entry from
> > end of RAM to .. end of memory?
> 
> End of physical address space you mean? Generally yes, but we

Yes.
> need to be a little careful there: For one, on AMD we'd better not
> overlap with the HT area. And then there's this MTRR related
> comment next to the setting of pci_hi_mem_end (albeit both HT
> area start and end of PA space should be aligned well enough).


> 
> > Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
> > perhaps modify the last _CRS entry) when PCIe devices are hotplugged?
> 
> While that would be the most flexible variant, I'd be afraid of this
> getting rather complicated. Or have you already got some
> reasonable layout of how this would look like?

Nothing yet sadly, just soliciting input at this point.

Thanks again for the tidbit about HT.
> 
> Jan
> 



Re: [Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-09-28 Thread Jan Beulich
>>> On 27.09.16 at 16:43,  wrote:
> If the guest is booted with 'pci' we nicely expand the MMIO region below
> 4GB and try to fit in the BARs in there. If that fails (not enough
> space) we move it above the memory (64-bit). And throughout all of this
> we also update the _CRS field to cover these ranges.
> 
> (Note, I need to check if the 64-bit area is also set, I think it is).
> 
> But the situation is different if we hot-plug a device that has too big
> BAR to fit in the MMIO region. We move it in the 64-bit area but we
> don't update the _CRS. Which means that Linux will complain (unless
> booted with pci=nocrs)). Not sure about Windows but I would assume so
> to.
> 
> I was wondering what would be a good way to solve this? I looked at some
> Dell machines to see how they deal with hotplug PCIe devices and they
> just declared all the memory in the _CRS (including RAM).
> 
> We could do a hybrid - during bootup make the _CRS region have entry from
> end of RAM to .. end of memory?

End of physical address space you mean? Generally yes, but we
need to be a little careful there: For one, on AMD we'd better not
overlap with the HT area. And then there's this MTRR related
comment next to the setting of pci_hi_mem_end (albeit both HT
area start and end of PA space should be aligned well enough).
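
To make the HT concern concrete, here is a sketch of how the 64-bit
_CRS windows could be derived, assuming the HT area occupies
0xFD00000000 up to (but not including) 1TB. That range, the function
name and its parameters are assumptions for illustration; this is
Python, not the ASL or hvmloader code.

```python
HT_START = 0xFD_0000_0000    # assumed start of the AMD HT reserved area
HT_END = 0x100_0000_0000     # assumed end (exclusive), i.e. 1 TB

def crs_windows(hi_start, phys_bits, avoid_ht):
    """Return inclusive (min, max) windows from hi_start to end of PA space."""
    end = 1 << phys_bits
    holes = [(HT_START, HT_END)] if avoid_ht else []
    windows = []
    cur = hi_start
    for hole_start, hole_end in holes:
        # Skip holes that lie entirely outside [cur, end).
        if hole_start >= end or hole_end <= cur:
            continue
        if cur < hole_start:
            windows.append((cur, hole_start - 1))
        cur = max(cur, hole_end)
    if cur < end:
        windows.append((cur, end - 1))
    return windows

# Example: high MMIO starting at 4344 MB on a 48-bit AMD host.
for lo, hi in crs_windows(4344 << 20, 48, avoid_ht=True):
    print(f"[mem {lo:#x}-{hi:#x} window]")
```

With a 36-bit physical address width the assumed HT area lies beyond
the end of the address space, so a single window results.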

> Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
> perhaps modify the last _CRS entry) when PCIe devices are hotplugged?

While that would be the most flexible variant, I'd be afraid of this
getting rather complicated. Or have you already got some
reasonable layout of how this would look like?

Jan




[Xen-devel] PCIe devices that are hotplugged after MMIO has been setup fail due to _CRS not covering 64-bit area

2016-09-27 Thread Konrad Rzeszutek Wilk
Hey!

If the guest is booted with 'pci' we nicely expand the MMIO region below
4GB and try to fit the BARs in there. If that fails (not enough
space) we move it above the memory (64-bit). And throughout all of this
we also update the _CRS field to cover these ranges.

(Note, I need to check if the 64-bit area is also set, I think it is).

But the situation is different if we hot-plug a device that has a BAR
too big to fit in the MMIO region. We move it into the 64-bit area but we
don't update the _CRS. Which means that Linux will complain (unless
booted with pci=nocrs). Not sure about Windows but I would assume so
too.

I was wondering what would be a good way to solve this? I looked at some
Dell machines to see how they deal with hotplug PCIe devices and they
just declared all the memory in the _CRS (including RAM).

We could do a hybrid - during bootup make the _CRS region have an entry from
end of RAM to .. end of memory?

Or perhaps add some extra logic between QEMU and ACPI AML to expand (or
perhaps modify the last _CRS entry) when PCIe devices are hotplugged?

I am wondering what folks think is the best way going forward?
