Re: [Qemu-devel] How to reserve guest physical region for ACPI

2016-01-07 Thread Igor Mammedov
On Tue, 5 Jan 2016 18:43:02 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
> > > > bios-linker-loader is a great interface for initializing some
> > > > guest owned data and linking it together but I think it adds
> > > > unnecessary complexity and is misused if it's used to handle
> > > > device owned data/on device memory in this and VMGID cases.
> > > 
> > > I want a generic interface for guest to enumerate these things.  linker
> > > seems quite reasonable but if you see a reason why it won't do, or want
> > > to propose a better interface, fine.
> > > 
> > > PCI would do, too - though windows guys had concerns about
> > > returning PCI BARs from ACPI.  
> > There were potential issues with pSeries bootloader that treated
> > PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed.
> > Could you point out to discussion about windows issues?
> > 
> > What VMGEN patches that used PCI for mapping purposes were
> > stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM
> > class id but we couldn't agree on it.
> > 
> > VMGEN v13 with full discussion is here
> > https://patchwork.ozlabs.org/patch/443554/
> > So to continue with this route we would need to pick some other
> > driver less class id so windows won't prompt for driver or
> > maybe supply our own driver stub to guarantee that no one
> > would touch it. Any suggestions?  
> 
> Pick any device/vendor id pair for which windows specifies no driver.
> There's a small risk that this will conflict with some
> guest but I think it's minimal.
The device/vendor id pair was QEMU specific, so it doesn't conflict with anything.
The issue we were trying to solve was to prevent Windows from asking for a driver,
even though it does so only once if told not to ask again.

That's why PCI_CLASS_MEMORY_RAM was selected: it's a generic driverless
device descriptor in the INF file which matches as a last resort if
no other driver has matched the device by its device/vendor id pair.

> 
> 
> > > 
> > >   
> > > > There was RFC on list to make BIOS boot from NVDIMM already
> > > > doing some ACPI table lookup/parsing. Now if they were forced
> > > > to also parse and execute AML to initialize QEMU with guest
> > > > allocated address that would complicate them quite a bit.
> > > 
> > > If they just need to find a table by name, it won't be
> > > too bad, would it?  
> > that's what they were doing scanning memory for static NVDIMM table.
> > However if it were DataTable, BIOS side would have to execute
> > AML so that the table address could be told to QEMU.  
> 
> Not at all. You can find any table by its signature without
> parsing AML.
yep, and then the BIOS would need to tell its address to QEMU by
writing to an IO port which is allocated statically in QEMU
for this purpose and is described in AML only on the guest side.
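For illustration only, the firmware-side half of that hand-off could look roughly
like the sketch below; the port number and helper names are invented here, not
QEMU's actual ABI.

    /* Hypothetical sketch: firmware reports the allocated table GPA to QEMU
     * through a fixed IO port.  ACPI_ADDR_PORT is a placeholder value. */
    #include <stdint.h>

    #define ACPI_ADDR_PORT 0x0c40   /* placeholder, not a real QEMU port */

    static inline void outl(uint16_t port, uint32_t val)
    {
        __asm__ volatile("outl %0, %1" : : "a"(val), "Nd"(port));
    }

    static void report_table_gpa(uint64_t gpa)
    {
        /* low dword first, then high dword, at the statically assigned port */
        outl(ACPI_ADDR_PORT, (uint32_t)gpa);
        outl(ACPI_ADDR_PORT + 4, (uint32_t)(gpa >> 32));
    }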

> 
> 
> > In case of direct mapping or PCI BAR there is no need to initialize
> > QEMU side from AML.
> > That also saves us IO port where this address should be written
> > if bios-linker-loader approach is used.
> >   
> > >   
> > > > While with NVDIMM control memory region mapped directly by QEMU,
> > > > respective patches don't need in any way to initialize QEMU,
> > > > all they would need just read necessary data from control region.
> > > > 
> > > > Also using bios-linker-loader takes away some usable RAM
> > > > from guest and in the end that doesn't scale,
> > > > the more devices I add the less usable RAM is left for guest OS
> > > > while all the device needs is a piece of GPA address space
> > > > that would belong to it.
> > > 
> > > I don't get this comment. I don't think it's MMIO that is wanted.
> > > If it's backed by qemu virtual memory then it's RAM.  
> > Then why don't we allocate video card VRAM the same way and try to explain
> > to the user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
> > only has 64Mb of available RAM, because we think that on-device VRAM
> > is also RAM.
> > 
> > Maybe I've used MMIO term wrongly here but it roughly reflects the idea
> > that on device memory (whether it's VRAM, NVDIMM control block or VMGEN
> > area) is not allocated from guest's usable RAM (as described in E820)
> > but rather directly mapped in guest's GPA and doesn't consume available
> > RAM as guest sees it. That's also the way it's done on real hardware.
> > 
> > What we need in case of VMGEN ID and NVDIMM is on device memory
> > that could be directly accessed by guest.
> > Both direct mapping or PCI BAR do that job and we could use simple
> > static AML without any patching.  
> 
> At least with VMGEN the issue is that there's an AML method
> that returns the physical address.
> Then if guest OS moves the BAR (which is legal), it will break
> since caller has no way to know it's related to the BAR.
I've found the following MS doc, "Firmware Allocation of PCI Device Resources in 
Windows". It looks like when MS implemented resource rebalancing in
Re: [Qemu-devel] How to reserve guest physical region for ACPI

2016-01-07 Thread Michael S. Tsirkin
On Thu, Jan 07, 2016 at 11:30:25AM +0100, Igor Mammedov wrote:
> On Tue, 5 Jan 2016 18:43:02 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
> > > > > bios-linker-loader is a great interface for initializing some
> > > > > guest owned data and linking it together but I think it adds
> > > > > unnecessary complexity and is misused if it's used to handle
> > > > > device owned data/on device memory in this and VMGID cases.
> > > > 
> > > > I want a generic interface for guest to enumerate these things.  linker
> > > > seems quite reasonable but if you see a reason why it won't do, or want
> > > > to propose a better interface, fine.
> > > > 
> > > > PCI would do, too - though windows guys had concerns about
> > > > returning PCI BARs from ACPI.  
> > > There were potential issues with pSeries bootloader that treated
> > > PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed.
> > > Could you point out to discussion about windows issues?
> > > 
> > > What VMGEN patches that used PCI for mapping purposes were
> > > stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM
> > > class id but we couldn't agree on it.
> > > 
> > > VMGEN v13 with full discussion is here
> > > https://patchwork.ozlabs.org/patch/443554/
> > > So to continue with this route we would need to pick some other
> > > driver less class id so windows won't prompt for driver or
> > > maybe supply our own driver stub to guarantee that no one
> > > would touch it. Any suggestions?  
> > 
> > Pick any device/vendor id pair for which windows specifies no driver.
> > There's a small risk that this will conflict with some
> > guest but I think it's minimal.
> The device/vendor id pair was QEMU specific, so it doesn't conflict with anything.
> The issue we were trying to solve was to prevent Windows from asking for a driver,
> even though it does so only once if told not to ask again.
> 
> That's why PCI_CLASS_MEMORY_RAM was selected: it's a generic driverless
> device descriptor in the INF file which matches as a last resort if
> no other driver has matched the device by its device/vendor id pair.

I think this is the only class in this INF.
If you can't use it, you must use an existing device/vendor id pair;
there's some risk involved, but probably not much.
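For reference, a minimal sketch of what either choice would amount to in a QEMU
PCI device's class_init; PCI_VENDOR_ID_REDHAT_QUMRANET and PCI_CLASS_MEMORY_RAM
are existing QEMU constants, while the device id and function name below are
placeholders, not the actual VMGEN patch.

    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"

    static void vmgen_stub_class_init(ObjectClass *klass, void *data)
    {
        PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

        k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;  /* 0x1af4 */
        k->device_id = 0x0005;                         /* placeholder id */
        k->class_id  = PCI_CLASS_MEMORY_RAM;           /* the "driverless" class discussed above */
        k->revision  = 1;
    }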

> > 
> > 
> > > > 
> > > >   
> > > > > There was RFC on list to make BIOS boot from NVDIMM already
> > > > > doing some ACPI table lookup/parsing. Now if they were forced
> > > > > to also parse and execute AML to initialize QEMU with guest
> > > > > allocated address that would complicate them quite a bit.
> > > > 
> > > > If they just need to find a table by name, it won't be
> > > > too bad, would it?  
> > > that's what they were doing scanning memory for static NVDIMM table.
> > > However if it were DataTable, BIOS side would have to execute
> > > AML so that the table address could be told to QEMU.  
> > 
> > Not at all. You can find any table by its signature without
> > parsing AML.
> yep, and then BIOS would need to tell its address to QEMU
> writing to IO port which is allocated statically in QEMU
> for this purpose and is described in AML only on guest side.

IO ports are an ABI too, but they are way easier to
maintain.

> > 
> > 
> > > In case of direct mapping or PCI BAR there is no need to initialize
> > > QEMU side from AML.
> > > That also saves us IO port where this address should be written
> > > if bios-linker-loader approach is used.
> > >   
> > > >   
> > > > > While with NVDIMM control memory region mapped directly by QEMU,
> > > > > respective patches don't need in any way to initialize QEMU,
> > > > > all they would need just read necessary data from control region.
> > > > > 
> > > > > Also using bios-linker-loader takes away some usable RAM
> > > > > from guest and in the end that doesn't scale,
> > > > > the more devices I add the less usable RAM is left for guest OS
> > > > > while all the device needs is a piece of GPA address space
> > > > > that would belong to it.
> > > > 
> > > > I don't get this comment. I don't think it's MMIO that is wanted.
> > > > If it's backed by qemu virtual memory then it's RAM.  
> > > Then why don't allocate video card VRAM the same way and try to explain
> > > user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
> > > only has 64Mb of available RAM because of we think that on device VRAM
> > > is also RAM.
> > > 
> > > Maybe I've used MMIO term wrongly here but it roughly reflects the idea
> > > that on device memory (whether it's VRAM, NVDIMM control block or VMGEN
> > > area) is not allocated from guest's usable RAM (as described in E820)
> > > but rather directly mapped in guest's GPA and doesn't consume available
> > > RAM as guest sees it. That's also the way it's done on real hardware.
> > > 
> > > What we need in case of VMGEN ID and NVDIMM is on device memory
> > > that could be directly accessed by guest.
> > > Both 

Re: How to reserve guest physical region for ACPI

2016-01-07 Thread Igor Mammedov
On Mon, 4 Jan 2016 21:17:31 +0100
Laszlo Ersek  wrote:

> Michael CC'd me on the grandparent of the email below. I'll try to add
> my thoughts in a single go, with regard to OVMF.
> 
> On 12/30/15 20:52, Michael S. Tsirkin wrote:
> > On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:  
> >> On Mon, 28 Dec 2015 14:50:15 +0200
> >> "Michael S. Tsirkin"  wrote:
> >>  
> >>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:  
> 
>  Hi Michael, Paolo,
> 
>  Now it is the time to return to the challenge that how to reserve guest
>  physical region internally used by ACPI.
> 
>  Igor suggested that:
>  | An alternative place to allocate reserve from could be high memory.
>  | For pc we have "reserved-memory-end" which currently makes sure
>  | that hotpluggable memory range isn't used by firmware
>  (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html) 
>   
> 
> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> reason is that nobody wrote that patch, nor asked for the patch to be
> written. (Not implying that just requesting the patch would be
> sufficient for the patch to be written.)
Hijacking this part of the thread to check whether OVMF would work with memory hotplug
and whether it needs "reserved-memory-end" support at all.

How does OVMF determine which GPA ranges to use for initializing PCI BARs
at boot time, more specifically 64-bit BARs?


Re: How to reserve guest physical region for ACPI

2016-01-06 Thread Igor Mammedov
On Tue, 5 Jan 2016 18:22:33 +0100
Laszlo Ersek  wrote:

> On 01/05/16 18:08, Igor Mammedov wrote:
> > On Mon, 4 Jan 2016 21:17:31 +0100
> > Laszlo Ersek  wrote:
> >   
> >> Michael CC'd me on the grandparent of the email below. I'll try to add
> >> my thoughts in a single go, with regard to OVMF.
> >>
> >> On 12/30/15 20:52, Michael S. Tsirkin wrote:  
> >>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>  On Mon, 28 Dec 2015 14:50:15 +0200
>  "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> >>
> >> Hi Michael, Paolo,
> >>
> >> Now it is the time to return to the challenge that how to reserve guest
> >> physical region internally used by ACPI.
> >>
> >> Igor suggested that:
> >> | An alternative place to allocate reserve from could be high memory.
> >> | For pc we have "reserved-memory-end" which currently makes sure
> >> | that hotpluggable memory range isn't used by firmware
> >> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> >> 
> >>
> >> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> >> reason is that nobody wrote that patch, nor asked for the patch to be
> >> written. (Not implying that just requesting the patch would be
> >> sufficient for the patch to be written.)
> >>  
> > I don't want to tie things to reserved-memory-end because this
> > does not scale: next time we need to reserve memory,
> > we'll need to find yet another way to figure out what is where.
>  Could you elaborate a bit more on a problem you're seeing?
> 
>  To me it looks like it scales rather well.
>  For example lets imagine that we adding a device
>  that has some on device memory that should be mapped into GPA
>  code to do so would look like:
> 
>    pc_machine_device_plug_cb(dev)
>    {
>        ...
>        if (dev == OUR_NEW_DEVICE_TYPE) {
>            memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>            set_new_reserved_end(current_reserved_end +
>                                 memory_region_size(&dev->mr));
>        }
>    }
> 
>  we can practically add any number of new devices that way.
> >>>
> >>> Yes but we'll have to build a host side allocator for these, and that's
> >>> nasty. We'll also have to maintain these addresses indefinitely (at
> >>> least per machine version) as they are guest visible.
> >>> Not only that, there's no way for guest to know if we move things
> >>> around, so basically we'll never be able to change addresses.
> >>>
> >>> 
>   
> > I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> > support 64 bit RAM instead
> >>
> >> This looks quite doable in OVMF, as long as the blob to allocate from
> >> high memory contains *zero* ACPI tables.
> >>
> >> (
> >> Namely, each ACPI table is installed from the containing fw_cfg blob
> >> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> >> own allocation policy for the *copies* of ACPI tables it installs.
> >>
> >> This allocation policy is left unspecified in the section of the UEFI
> >> spec that governs EFI_ACPI_TABLE_PROTOCOL.
> >>
> >> The current policy in edk2 (= the reference implementation) seems to be
> >> "allocate from under 4GB". It is currently being changed to "try to
> >> allocate from under 4GB, and if that fails, retry from high memory". (It
> >> is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
> >> )
> >>  
> > (and maybe a way to allocate and
> > zero-initialize buffer without loading it through fwcfg),
> >>
> >> Sounds reasonable.
> >>  
> > this way bios
> > does the allocation, and addresses can be patched into acpi.
>  and then guest side needs to parse/execute some AML that would
>  initialize QEMU side so it would know where to write data.
> >>>
> >>> Well not really - we can put it in a data table, by itself
> >>> so it's easy to find.
> >>
> >> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> >> acpi_get_table_with_size()?
> >>  
> >>>
> >>> AML is only needed if access from ACPI is desired.
> >>>
> >>> 
>  bios-linker-loader is a great interface for initializing some
>  guest owned data and linking it together but I think it adds
>  unnecessary complexity and is misused if it's used to handle
>  device owned data/on device memory in this and VMGID cases.
> >>>
> >>> I want a generic interface for guest to enumerate these things.  linker
> >>> seems quite reasonable but if you see a reason why it won't do, or want
> >>> to propose a better interface, fine.
> >>
> >> * The guest could do the following:
> >> - while processing the ALLOCATE commands, it would make a note where in
> >> GPA space each fw_cfg blob gets allocated
> >> - at the 

Re: How to reserve guest physical region for ACPI

2016-01-06 Thread Laszlo Ersek
On 01/06/16 14:39, Igor Mammedov wrote:
> On Tue, 5 Jan 2016 18:22:33 +0100
> Laszlo Ersek  wrote:
> 
>> On 01/05/16 18:08, Igor Mammedov wrote:
>>> On Mon, 4 Jan 2016 21:17:31 +0100
>>> Laszlo Ersek  wrote:
>>>   
 Michael CC'd me on the grandparent of the email below. I'll try to add
 my thoughts in a single go, with regard to OVMF.

 On 12/30/15 20:52, Michael S. Tsirkin wrote:  
> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>> On Mon, 28 Dec 2015 14:50:15 +0200
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:

 Hi Michael, Paolo,

 Now it is the time to return to the challenge that how to reserve guest
 physical region internally used by ACPI.

 Igor suggested that:
 | An alternative place to allocate reserve from could be high memory.
 | For pc we have "reserved-memory-end" which currently makes sure
 | that hotpluggable memory range isn't used by firmware
 (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
 

 OVMF has no support for the "reserved-memory-end" fw_cfg file. The
 reason is that nobody wrote that patch, nor asked for the patch to be
 written. (Not implying that just requesting the patch would be
 sufficient for the patch to be written.)
  
>>> I don't want to tie things to reserved-memory-end because this
>>> does not scale: next time we need to reserve memory,
>>> we'll need to find yet another way to figure out what is where.
>> Could you elaborate a bit more on a problem you're seeing?
>>
>> To me it looks like it scales rather well.
>> For example lets imagine that we adding a device
>> that has some on device memory that should be mapped into GPA
>> code to do so would look like:
>>
>>   pc_machine_device_plug_cb(dev)
>>   {
>>       ...
>>       if (dev == OUR_NEW_DEVICE_TYPE) {
>>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>>           set_new_reserved_end(current_reserved_end +
>>                                memory_region_size(&dev->mr));
>>       }
>>   }
>>
>> we can practically add any number of new devices that way.
>
> Yes but we'll have to build a host side allocator for these, and that's
> nasty. We'll also have to maintain these addresses indefinitely (at
> least per machine version) as they are guest visible.
> Not only that, there's no way for guest to know if we move things
> around, so basically we'll never be able to change addresses.
>
> 
>>  
>>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
>>> support 64 bit RAM instead

 This looks quite doable in OVMF, as long as the blob to allocate from
 high memory contains *zero* ACPI tables.

 (
 Namely, each ACPI table is installed from the containing fw_cfg blob
 with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
 own allocation policy for the *copies* of ACPI tables it installs.

 This allocation policy is left unspecified in the section of the UEFI
 spec that governs EFI_ACPI_TABLE_PROTOCOL.

 The current policy in edk2 (= the reference implementation) seems to be
 "allocate from under 4GB". It is currently being changed to "try to
 allocate from under 4GB, and if that fails, retry from high memory". (It
 is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
 )
  
>>> (and maybe a way to allocate and
>>> zero-initialize buffer without loading it through fwcfg),

 Sounds reasonable.
  
>>> this way bios
>>> does the allocation, and addresses can be patched into acpi.
>> and then guest side needs to parse/execute some AML that would
>> initialize QEMU side so it would know where to write data.
>
> Well not really - we can put it in a data table, by itself
> so it's easy to find.

 Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
 acpi_get_table_with_size()?
  
>
> AML is only needed if access from ACPI is desired.
>
> 
>> bios-linker-loader is a great interface for initializing some
>> guest owned data and linking it together but I think it adds
>> unnecessary complexity and is misused if it's used to handle
>> device owned data/on device memory in this and VMGID cases.
>
> I want a generic interface for guest to enumerate these things.  linker
> seems quite reasonable but if you see a reason why it won't do, or want
> to propose a better interface, fine.

 * The guest could do the following:
 - while processing the ALLOCATE commands, it would make a note where in
 GPA space 

Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Igor Mammedov
On Wed, 30 Dec 2015 21:52:32 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
> > On Mon, 28 Dec 2015 14:50:15 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:  
> > > > 
> > > > Hi Michael, Paolo,
> > > > 
> > > > Now it is the time to return to the challenge that how to reserve guest
> > > > physical region internally used by ACPI.
> > > > 
> > > > Igor suggested that:
> > > > | An alternative place to allocate reserve from could be high memory.
> > > > | For pc we have "reserved-memory-end" which currently makes sure
> > > > | that hotpluggable memory range isn't used by firmware
> > > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> > > >   
> > > 
> > > I don't want to tie things to reserved-memory-end because this
> > > does not scale: next time we need to reserve memory,
> > > we'll need to find yet another way to figure out what is where.  
> > Could you elaborate a bit more on a problem you're seeing?
> > 
> > To me it looks like it scales rather well.
> > For example lets imagine that we adding a device
> > that has some on device memory that should be mapped into GPA
> > code to do so would look like:
> > 
> >   pc_machine_device_plug_cb(dev)
> >   {
> >       ...
> >       if (dev == OUR_NEW_DEVICE_TYPE) {
> >           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> >           set_new_reserved_end(current_reserved_end +
> >                                memory_region_size(&dev->mr));
> >       }
> >   }
> > 
> > we can practically add any number of new devices that way.  
> 
> Yes but we'll have to build a host side allocator for these, and that's
> nasty. We'll also have to maintain these addresses indefinitely (at
> least per machine version) as they are guest visible.
> Not only that, there's no way for guest to know if we move things
> around, so basically we'll never be able to change addresses.
The simplistic GPA allocator in the snippet above does the job.

If one unconditionally adds a device in a new version then, yes,
the code has to carry compat code based on the machine version.
But that applies to any device that has state to migrate
or to any address space layout change.

However, a device that is directly mapped doesn't have to
have a fixed address; it could behave the same way as a
PCI device with BARs, with the only difference being that its
MemoryRegions are mapped before the guest is running rather than
as BARs mapped by the BIOS.
It could be worth creating a generic base device class
that does the above. Concrete device implementations could then
inherit from and extend it.
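A minimal sketch of what such a base class might look like; the type and helper
names here are invented for illustration, while memory_region_add_subregion()
and memory_region_size() are the existing QEMU APIs.

    #include "qemu/osdep.h"
    #include "hw/qdev-core.h"
    #include "exec/memory.h"

    /* hypothetical base type for "device with directly mapped on-device memory" */
    typedef struct ReservedRegionDevice {
        DeviceState parent_obj;
        MemoryRegion mr;          /* on-device memory, backed by host RAM */
        hwaddr mapped_gpa;        /* where the machine mapped it */
    } ReservedRegionDevice;

    /* called from the machine's plug handler, cf. the snippet quoted above */
    static void reserved_region_map(MemoryRegion *system_memory,
                                    ReservedRegionDevice *d,
                                    hwaddr *current_reserved_end)
    {
        d->mapped_gpa = *current_reserved_end;
        memory_region_add_subregion(system_memory, d->mapped_gpa, &d->mr);
        *current_reserved_end += memory_region_size(&d->mr);
    }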

> >
> > > I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> > > support 64 bit RAM instead (and maybe a way to allocate and
> > > zero-initialize buffer without loading it through fwcfg), this way bios
> > > does the allocation, and addresses can be patched into acpi.  
> > and then guest side needs to parse/execute some AML that would
> > initialize QEMU side so it would know where to write data.  
> 
> Well not really - we can put it in a data table, by itself
> so it's easy to find.
> 
> AML is only needed if access from ACPI is desired.
in both cases (VMGEN, NVDIMM) access from ACPI is required, at a minimum
to write the address back to QEMU, and for NVDIMM also
to pass _DSM method data between the guest and QEMU.

> 
> 
> > bios-linker-loader is a great interface for initializing some
> > guest owned data and linking it together but I think it adds
> > unnecessary complexity and is misused if it's used to handle
> > device owned data/on device memory in this and VMGID cases.  
> 
> I want a generic interface for guest to enumerate these things.  linker
> seems quite reasonable but if you see a reason why it won't do, or want
> to propose a better interface, fine.
> 
> PCI would do, too - though windows guys had concerns about
> returning PCI BARs from ACPI.
There were potential issues with the pSeries bootloader, which treated
PCI_CLASS_MEMORY_RAM as conventional RAM, but that was fixed.
Could you point me to the discussion about the Windows issues?

What the VMGEN patches that used PCI for mapping purposes got
stuck at was that it was suggested to use the PCI_CLASS_MEMORY_RAM
class id but we couldn't agree on it.

VMGEN v13 with the full discussion is here:
https://patchwork.ozlabs.org/patch/443554/
So to continue with this route we would need to pick some other
driverless class id so Windows won't prompt for a driver, or
maybe supply our own driver stub to guarantee that no one
would touch it. Any suggestions?

> 
> 
> > There was RFC on list to make BIOS boot from NVDIMM already
> > doing some ACPI table lookup/parsing. Now if they were forced
> > to also parse and execute AML to initialize QEMU with guest
> > allocated address that would complicate them quite a bit.  
> 
> If they just need to find a table by name, it won't be
> too bad, would it?
that's what they were doing scanning memory for 

Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Igor Mammedov
On Mon, 4 Jan 2016 21:17:31 +0100
Laszlo Ersek  wrote:

> Michael CC'd me on the grandparent of the email below. I'll try to add
> my thoughts in a single go, with regard to OVMF.
> 
> On 12/30/15 20:52, Michael S. Tsirkin wrote:
> > On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:  
> >> On Mon, 28 Dec 2015 14:50:15 +0200
> >> "Michael S. Tsirkin"  wrote:
> >>  
> >>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:  
> 
>  Hi Michael, Paolo,
> 
>  Now it is the time to return to the challenge that how to reserve guest
>  physical region internally used by ACPI.
> 
>  Igor suggested that:
>  | An alternative place to allocate reserve from could be high memory.
>  | For pc we have "reserved-memory-end" which currently makes sure
>  | that hotpluggable memory range isn't used by firmware
>  (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html) 
>   
> 
> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> reason is that nobody wrote that patch, nor asked for the patch to be
> written. (Not implying that just requesting the patch would be
> sufficient for the patch to be written.)
> 
> >>> I don't want to tie things to reserved-memory-end because this
> >>> does not scale: next time we need to reserve memory,
> >>> we'll need to find yet another way to figure out what is where.  
> >> Could you elaborate a bit more on a problem you're seeing?
> >>
> >> To me it looks like it scales rather well.
> >> For example lets imagine that we adding a device
> >> that has some on device memory that should be mapped into GPA
> >> code to do so would look like:
> >>
> >>   pc_machine_device_plug_cb(dev)
> >>   {
> >>       ...
> >>       if (dev == OUR_NEW_DEVICE_TYPE) {
> >>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> >>           set_new_reserved_end(current_reserved_end +
> >>                                memory_region_size(&dev->mr));
> >>       }
> >>   }
> >>
> >> we can practically add any number of new devices that way.  
> > 
> > Yes but we'll have to build a host side allocator for these, and that's
> > nasty. We'll also have to maintain these addresses indefinitely (at
> > least per machine version) as they are guest visible.
> > Not only that, there's no way for guest to know if we move things
> > around, so basically we'll never be able to change addresses.
> > 
> >   
> >>
> >>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> >>> support 64 bit RAM instead  
> 
> This looks quite doable in OVMF, as long as the blob to allocate from
> high memory contains *zero* ACPI tables.
> 
> (
> Namely, each ACPI table is installed from the containing fw_cfg blob
> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> own allocation policy for the *copies* of ACPI tables it installs.
> 
> This allocation policy is left unspecified in the section of the UEFI
> spec that governs EFI_ACPI_TABLE_PROTOCOL.
> 
> The current policy in edk2 (= the reference implementation) seems to be
> "allocate from under 4GB". It is currently being changed to "try to
> allocate from under 4GB, and if that fails, retry from high memory". (It
> is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
> )
> 
> >>> (and maybe a way to allocate and
> >>> zero-initialize buffer without loading it through fwcfg),  
> 
> Sounds reasonable.
> 
> >>> this way bios
> >>> does the allocation, and addresses can be patched into acpi.  
> >> and then guest side needs to parse/execute some AML that would
> >> initialize QEMU side so it would know where to write data.  
> > 
> > Well not really - we can put it in a data table, by itself
> > so it's easy to find.  
> 
> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> acpi_get_table_with_size()?
> 
> > 
> > AML is only needed if access from ACPI is desired.
> > 
> >   
> >> bios-linker-loader is a great interface for initializing some
> >> guest owned data and linking it together but I think it adds
> >> unnecessary complexity and is misused if it's used to handle
> >> device owned data/on device memory in this and VMGID cases.  
> > 
> > I want a generic interface for guest to enumerate these things.  linker
> > seems quite reasonable but if you see a reason why it won't do, or want
> > to propose a better interface, fine.  
> 
> * The guest could do the following:
> - while processing the ALLOCATE commands, it would make a note where in
> GPA space each fw_cfg blob gets allocated
> - at the end the guest would prepare a temporary array with a predefined
> record format, that associates each fw_cfg blob's name with the concrete
> allocation address
> - it would create an FWCfgDmaAccess structure pointing at this array,
> with a new "control" bit set (or something similar)
> - the guest could write the address of the FWCfgDmaAccess struct to the
> appropriate register, as always.
> 
> * Another idea 

Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Xiao Guangrong



On 01/06/2016 12:43 AM, Michael S. Tsirkin wrote:


Yes - if address is static, you need to put it outside
the table. Can come right before or right after this.


Also if OperationRegion() is used, then one has to patch
DefOpRegion directly as RegionOffset must be Integer,
using variable names is not permitted there.


I am not sure the comment was understood correctly.
The comment says really "we can't use DataTableRegion
so here is an alternative".

so how are you going to access data at which patched
NameString point to?
for that you'd need a normal patched OperationRegion
as well since DataTableRegion isn't usable.


For VMGENID you would patch the method that
returns the address - you do not need an op region
as you never access it.

I don't know about NVDIMM. Maybe OperationRegion can
use the patched NameString? Will need some thought.


The ACPI spec says that the offsetTerm in OperationRegion
is evaluated as an Integer, so a named object is allowed to be
used in OperationRegion; that is exactly what my patchset
is doing (http://marc.info/?l=kvm&m=145193395624537&w=2):

+dsm_mem = aml_arg(3);
+aml_append(method, aml_store(aml_call0(NVDIMM_GET_DSM_MEM), dsm_mem));

+aml_append(method, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
+dsm_mem, TARGET_PAGE_SIZE));

We hide the int64 object which is patched by the BIOS in the method,
NVDIMM_GET_DSM_MEM, to make Windows XP happy.

However, the disadvantages I see are:
a) as Igor pointed out, we need a way to tell QEMU what the patched
   address is; in NVDIMM ACPI we used a 64-bit IO port to pass the address
   to QEMU.

b) BIOS-allocated memory is RAM based, so it stops us from using MMIO in ACPI;
   MMIO is a more scalable resource than an IO port, as it has a larger region
   and supports 64-bit operations.
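For point (a), a rough sketch of the QEMU-side receiver, assuming a 2x32-bit IO
register window; the state struct and register layout here are illustrative, not
the actual NVDIMM ACPI patches.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    typedef struct NvdimmAcpiState {
        MemoryRegion io;          /* mapped at the fixed IO port base */
        uint64_t dsm_mem_addr;    /* GPA of the BIOS-allocated DSM buffer */
    } NvdimmAcpiState;

    static void dsm_addr_write(void *opaque, hwaddr addr,
                               uint64_t val, unsigned size)
    {
        NvdimmAcpiState *s = opaque;

        if (addr == 0) {          /* low 32 bits */
            s->dsm_mem_addr = (s->dsm_mem_addr & ~0xffffffffULL) | (uint32_t)val;
        } else {                  /* high 32 bits */
            s->dsm_mem_addr = (s->dsm_mem_addr & 0xffffffffULL) | (val << 32);
        }
    }

    static const MemoryRegionOps dsm_addr_ops = {
        .write = dsm_addr_write,
        .valid.min_access_size = 4,
        .valid.max_access_size = 4,
        .endianness = DEVICE_LITTLE_ENDIAN,
    };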


Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Laszlo Ersek
On 01/05/16 18:08, Igor Mammedov wrote:
> On Mon, 4 Jan 2016 21:17:31 +0100
> Laszlo Ersek  wrote:
> 
>> Michael CC'd me on the grandparent of the email below. I'll try to add
>> my thoughts in a single go, with regard to OVMF.
>>
>> On 12/30/15 20:52, Michael S. Tsirkin wrote:
>>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:  
 On Mon, 28 Dec 2015 14:50:15 +0200
 "Michael S. Tsirkin"  wrote:
  
> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:  
>>
>> Hi Michael, Paolo,
>>
>> Now it is the time to return to the challenge that how to reserve guest
>> physical region internally used by ACPI.
>>
>> Igor suggested that:
>> | An alternative place to allocate reserve from could be high memory.
>> | For pc we have "reserved-memory-end" which currently makes sure
>> | that hotpluggable memory range isn't used by firmware
>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html) 
>>  
>>
>> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
>> reason is that nobody wrote that patch, nor asked for the patch to be
>> written. (Not implying that just requesting the patch would be
>> sufficient for the patch to be written.)
>>
> I don't want to tie things to reserved-memory-end because this
> does not scale: next time we need to reserve memory,
> we'll need to find yet another way to figure out what is where.  
 Could you elaborate a bit more on a problem you're seeing?

 To me it looks like it scales rather well.
 For example lets imagine that we adding a device
 that has some on device memory that should be mapped into GPA
 code to do so would look like:

   pc_machine_device_plug_cb(dev)
   {
...
if (dev == OUR_NEW_DEVICE_TYPE) {
memory_region_add_subregion(as, current_reserved_end, >mr);
set_new_reserved_end(current_reserved_end + 
 memory_region_size(>mr));
}
   }

 we can practically add any number of new devices that way.  
>>>
>>> Yes but we'll have to build a host side allocator for these, and that's
>>> nasty. We'll also have to maintain these addresses indefinitely (at
>>> least per machine version) as they are guest visible.
>>> Not only that, there's no way for guest to know if we move things
>>> around, so basically we'll never be able to change addresses.
>>>
>>>   

> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> support 64 bit RAM instead  
>>
>> This looks quite doable in OVMF, as long as the blob to allocate from
>> high memory contains *zero* ACPI tables.
>>
>> (
>> Namely, each ACPI table is installed from the containing fw_cfg blob
>> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
>> own allocation policy for the *copies* of ACPI tables it installs.
>>
>> This allocation policy is left unspecified in the section of the UEFI
>> spec that governs EFI_ACPI_TABLE_PROTOCOL.
>>
>> The current policy in edk2 (= the reference implementation) seems to be
>> "allocate from under 4GB". It is currently being changed to "try to
>> allocate from under 4GB, and if that fails, retry from high memory". (It
>> is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
>> )
>>
> (and maybe a way to allocate and
> zero-initialize buffer without loading it through fwcfg),  
>>
>> Sounds reasonable.
>>
> this way bios
> does the allocation, and addresses can be patched into acpi.  
 and then guest side needs to parse/execute some AML that would
 initialize QEMU side so it would know where to write data.  
>>>
>>> Well not really - we can put it in a data table, by itself
>>> so it's easy to find.  
>>
>> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
>> acpi_get_table_with_size()?
>>
>>>
>>> AML is only needed if access from ACPI is desired.
>>>
>>>   
 bios-linker-loader is a great interface for initializing some
 guest owned data and linking it together but I think it adds
 unnecessary complexity and is misused if it's used to handle
 device owned data/on device memory in this and VMGID cases.  
>>>
>>> I want a generic interface for guest to enumerate these things.  linker
>>> seems quite reasonable but if you see a reason why it won't do, or want
>>> to propose a better interface, fine.  
>>
>> * The guest could do the following:
>> - while processing the ALLOCATE commands, it would make a note where in
>> GPA space each fw_cfg blob gets allocated
>> - at the end the guest would prepare a temporary array with a predefined
>> record format, that associates each fw_cfg blob's name with the concrete
>> allocation address
>> - it would create an FWCfgDmaAccess structure pointing at this array,
>> with a new "control" bit set (or something similar)
>> - the guest could write the address of the FWCfgDmaAccess 

Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Michael S. Tsirkin
On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
> > > bios-linker-loader is a great interface for initializing some
> > > guest owned data and linking it together but I think it adds
> > > unnecessary complexity and is misused if it's used to handle
> > > device owned data/on device memory in this and VMGID cases.  
> > 
> > I want a generic interface for guest to enumerate these things.  linker
> > seems quite reasonable but if you see a reason why it won't do, or want
> > to propose a better interface, fine.
> > 
> > PCI would do, too - though windows guys had concerns about
> > returning PCI BARs from ACPI.
> There were potential issues with pSeries bootloader that treated
> PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed.
> Could you point out to discussion about windows issues?
> 
> What VMGEN patches that used PCI for mapping purposes were
> stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM
> class id but we couldn't agree on it.
> 
> VMGEN v13 with full discussion is here
> https://patchwork.ozlabs.org/patch/443554/
> So to continue with this route we would need to pick some other
> driver less class id so windows won't prompt for driver or
> maybe supply our own driver stub to guarantee that no one
> would touch it. Any suggestions?

Pick any device/vendor id pair for which windows specifies no driver.
There's a small risk that this will conflict with some
guest but I think it's minimal.


> > 
> > 
> > > There was RFC on list to make BIOS boot from NVDIMM already
> > > doing some ACPI table lookup/parsing. Now if they were forced
> > > to also parse and execute AML to initialize QEMU with guest
> > > allocated address that would complicate them quite a bit.  
> > 
> > If they just need to find a table by name, it won't be
> > too bad, would it?
> that's what they were doing scanning memory for static NVDIMM table.
> However if it were DataTable, BIOS side would have to execute
> AML so that the table address could be told to QEMU.

Not at all. You can find any table by its signature without
parsing AML.
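For example, a firmware-style lookup by signature needs nothing more than walking
the RSDT entries; a simplified sketch assuming identity-mapped physical memory and
an ACPI 1.0 style RSDT.

    #include <stdint.h>
    #include <string.h>

    struct acpi_table_header {
        char     signature[4];
        uint32_t length;
        uint8_t  revision;
        uint8_t  checksum;
        char     oem_id[6];
        char     oem_table_id[8];
        uint32_t oem_revision;
        uint32_t asl_compiler_id;
        uint32_t asl_compiler_revision;
    } __attribute__((packed));

    static struct acpi_table_header *find_table(struct acpi_table_header *rsdt,
                                                const char sig[4])
    {
        /* 32-bit physical pointers follow the 36-byte RSDT header */
        uint32_t *entry = (uint32_t *)(rsdt + 1);
        unsigned n = (rsdt->length - sizeof(*rsdt)) / sizeof(uint32_t);

        for (unsigned i = 0; i < n; i++) {
            struct acpi_table_header *t =
                (struct acpi_table_header *)(uintptr_t)entry[i];
            if (!memcmp(t->signature, sig, 4)) {
                return t;
            }
        }
        return NULL;
    }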


> In case of direct mapping or PCI BAR there is no need to initialize
> QEMU side from AML.
> That also saves us IO port where this address should be written
> if bios-linker-loader approach is used.
> 
> > 
> > > While with NVDIMM control memory region mapped directly by QEMU,
> > > respective patches don't need in any way to initialize QEMU,
> > > all they would need just read necessary data from control region.
> > > 
> > > Also using bios-linker-loader takes away some usable RAM
> > > from guest and in the end that doesn't scale,
> > > the more devices I add the less usable RAM is left for guest OS
> > > while all the device needs is a piece of GPA address space
> > > that would belong to it.  
> > 
> > I don't get this comment. I don't think it's MMIO that is wanted.
> > If it's backed by qemu virtual memory then it's RAM.
> Then why don't we allocate video card VRAM the same way and try to explain
> to the user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
> only has 64Mb of available RAM, because we think that on-device VRAM
> is also RAM.
> 
> Maybe I've used MMIO term wrongly here but it roughly reflects the idea
> that on device memory (whether it's VRAM, NVDIMM control block or VMGEN
> area) is not allocated from guest's usable RAM (as described in E820)
> but rather directly mapped in guest's GPA and doesn't consume available
> RAM as guest sees it. That's also the way it's done on real hardware.
> 
> What we need in case of VMGEN ID and NVDIMM is on device memory
> that could be directly accessed by guest.
> Both direct mapping or PCI BAR do that job and we could use simple
> static AML without any patching.

At least with VMGEN the issue is that there's an AML method
that returns the physical address.
Then if the guest OS moves the BAR (which is legal), it will break,
since the caller has no way to know it's related to the BAR.


> > > > 
> > > > See patch at the bottom that might be handy.
> > > >   
> > > > > he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1:
> > > > > | when writing ASL one shall make sure that only XP supported
> > > > > | features are in global scope, which is evaluated when tables
> > > > > | are loaded and features of rev2 and higher are inside methods.
> > > > > | That way XP doesn't crash as far as it doesn't evaluate unsupported
> > > > > | opcode and one can guard those opcodes checking _REV object if 
> > > > > neccesary.
> > > > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> > > > >   
> > > > 
> > > > Yes, this technique works.
> > > > 
> > > > An alternative is to add an XSDT, XP ignores that.
> > > > XSDT at the moment breaks OVMF (because it loads both
> > > > the RSDT and the XSDT, which is wrong), but I think
> > > > Laszlo was working on a fix for that.  
> > > Using XSDT would increase ACPI tables occupied RAM
> > > as it would duplicate DSDT + non XP 

Re: How to reserve guest physical region for ACPI

2016-01-05 Thread Laszlo Ersek
On 01/05/16 17:43, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
 bios-linker-loader is a great interface for initializing some
 guest owned data and linking it together but I think it adds
 unnecessary complexity and is misused if it's used to handle
 device owned data/on device memory in this and VMGID cases.  
>>>
>>> I want a generic interface for guest to enumerate these things.  linker
>>> seems quite reasonable but if you see a reason why it won't do, or want
>>> to propose a better interface, fine.
>>>
>>> PCI would do, too - though windows guys had concerns about
>>> returning PCI BARs from ACPI.
>> There were potential issues with pSeries bootloader that treated
>> PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed.
>> Could you point out to discussion about windows issues?
>>
>> What VMGEN patches that used PCI for mapping purposes were
>> stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM
>> class id but we couldn't agree on it.
>>
>> VMGEN v13 with full discussion is here
>> https://patchwork.ozlabs.org/patch/443554/
>> So to continue with this route we would need to pick some other
>> driver less class id so windows won't prompt for driver or
>> maybe supply our own driver stub to guarantee that no one
>> would touch it. Any suggestions?
> 
> Pick any device/vendor id pair for which windows specifies no driver.
> There's a small risk that this will conflict with some
> guest but I think it's minimal.
> 
> 
>>>
>>>
 There was RFC on list to make BIOS boot from NVDIMM already
 doing some ACPI table lookup/parsing. Now if they were forced
 to also parse and execute AML to initialize QEMU with guest
 allocated address that would complicate them quite a bit.  
>>>
>>> If they just need to find a table by name, it won't be
>>> too bad, would it?
>> that's what they were doing scanning memory for static NVDIMM table.
>> However if it were DataTable, BIOS side would have to execute
>> AML so that the table address could be told to QEMU.
> 
> Not at all. You can find any table by its signature without
> parsing AML.
> 
> 
>> In case of direct mapping or PCI BAR there is no need to initialize
>> QEMU side from AML.
>> That also saves us IO port where this address should be written
>> if bios-linker-loader approach is used.
>>
>>>
 While with NVDIMM control memory region mapped directly by QEMU,
 respective patches don't need in any way to initialize QEMU,
 all they would need just read necessary data from control region.

 Also using bios-linker-loader takes away some usable RAM
 from guest and in the end that doesn't scale,
 the more devices I add the less usable RAM is left for guest OS
 while all the device needs is a piece of GPA address space
 that would belong to it.  
>>>
>>> I don't get this comment. I don't think it's MMIO that is wanted.
>>> If it's backed by qemu virtual memory then it's RAM.
>> Then why don't allocate video card VRAM the same way and try to explain
>> user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
>> only has 64Mb of available RAM because of we think that on device VRAM
>> is also RAM.
>>
>> Maybe I've used MMIO term wrongly here but it roughly reflects the idea
>> that on device memory (whether it's VRAM, NVDIMM control block or VMGEN
>> area) is not allocated from guest's usable RAM (as described in E820)
>> but rather directly mapped in guest's GPA and doesn't consume available
>> RAM as guest sees it. That's also the way it's done on real hardware.
>>
>> What we need in case of VMGEN ID and NVDIMM is on device memory
>> that could be directly accessed by guest.
>> Both direct mapping or PCI BAR do that job and we could use simple
>> static AML without any patching.
> 
> At least with VMGEN the issue is that there's an AML method
> that returns the physical address.
> Then if guest OS moves the BAR (which is legal), it will break
> since caller has no way to know it's related to the BAR.
> 
> 
>
> See patch at the bottom that might be handy.
>   
>> he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1:
>> | when writing ASL one shall make sure that only XP supported
>> | features are in global scope, which is evaluated when tables
>> | are loaded and features of rev2 and higher are inside methods.
>> | That way XP doesn't crash as far as it doesn't evaluate unsupported
>> | opcode and one can guard those opcodes checking _REV object if 
>> neccesary.
>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html) 
>>  
>
> Yes, this technique works.
>
> An alternative is to add an XSDT, XP ignores that.
> XSDT at the moment breaks OVMF (because it loads both
> the RSDT and the XSDT, which is wrong), but I think
> Laszlo was working on a fix for that.  
 Using XSDT would increase ACPI tables occupied RAM
 

Re: How to reserve guest physical region for ACPI

2016-01-04 Thread Laszlo Ersek
Michael CC'd me on the grandparent of the email below. I'll try to add
my thoughts in a single go, with regard to OVMF.

On 12/30/15 20:52, Michael S. Tsirkin wrote:
> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>> On Mon, 28 Dec 2015 14:50:15 +0200
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:

 Hi Michael, Paolo,

 Now it is the time to return to the challenge that how to reserve guest
 physical region internally used by ACPI.

 Igor suggested that:
 | An alternative place to allocate reserve from could be high memory.
 | For pc we have "reserved-memory-end" which currently makes sure
 | that hotpluggable memory range isn't used by firmware
 (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)

OVMF has no support for the "reserved-memory-end" fw_cfg file. The
reason is that nobody wrote that patch, nor asked for the patch to be
written. (Not implying that just requesting the patch would be
sufficient for the patch to be written.)

>>> I don't want to tie things to reserved-memory-end because this
>>> does not scale: next time we need to reserve memory,
>>> we'll need to find yet another way to figure out what is where.
>> Could you elaborate a bit more on a problem you're seeing?
>>
>> To me it looks like it scales rather well.
>> For example lets imagine that we adding a device
>> that has some on device memory that should be mapped into GPA
>> code to do so would look like:
>>
>>   pc_machine_device_plug_cb(dev)
>>   {
>>       ...
>>       if (dev == OUR_NEW_DEVICE_TYPE) {
>>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>>           set_new_reserved_end(current_reserved_end +
>>                                memory_region_size(&dev->mr));
>>       }
>>   }
>>
>> we can practically add any number of new devices that way.
> 
> Yes but we'll have to build a host side allocator for these, and that's
> nasty. We'll also have to maintain these addresses indefinitely (at
> least per machine version) as they are guest visible.
> Not only that, there's no way for guest to know if we move things
> around, so basically we'll never be able to change addresses.
> 
> 
>>  
>>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
>>> support 64 bit RAM instead

This looks quite doable in OVMF, as long as the blob to allocate from
high memory contains *zero* ACPI tables.

(
Namely, each ACPI table is installed from the containing fw_cfg blob
with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
own allocation policy for the *copies* of ACPI tables it installs.

This allocation policy is left unspecified in the section of the UEFI
spec that governs EFI_ACPI_TABLE_PROTOCOL.

The current policy in edk2 (= the reference implementation) seems to be
"allocate from under 4GB". It is currently being changed to "try to
allocate from under 4GB, and if that fails, retry from high memory". (It
is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
)
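(For concreteness, this is roughly how a table copy gets installed through that
protocol in edk2; a simplified sketch with error handling trimmed, not the actual
OVMF AcpiPlatformDxe code.)

    #include <Uefi.h>
    #include <Protocol/AcpiTable.h>
    #include <Library/UefiBootServicesTableLib.h>

    EFI_STATUS
    InstallOneTable (
      IN VOID   *Table,      // copy taken from the downloaded fw_cfg blob
      IN UINTN  TableSize
      )
    {
      EFI_ACPI_TABLE_PROTOCOL  *AcpiTable;
      UINTN                    TableKey;
      EFI_STATUS               Status;

      Status = gBS->LocateProtocol (&gEfiAcpiTableProtocolGuid, NULL,
                      (VOID **)&AcpiTable);
      if (EFI_ERROR (Status)) {
        return Status;
      }

      //
      // InstallAcpiTable() makes its own copy of the table; where that copy
      // lands is exactly the allocation policy discussed above.
      //
      return AcpiTable->InstallAcpiTable (AcpiTable, Table, TableSize,
                          &TableKey);
    }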

>>> (and maybe a way to allocate and
>>> zero-initialize buffer without loading it through fwcfg),

Sounds reasonable.

>>> this way bios
>>> does the allocation, and addresses can be patched into acpi.
>> and then guest side needs to parse/execute some AML that would
>> initialize QEMU side so it would know where to write data.
> 
> Well not really - we can put it in a data table, by itself
> so it's easy to find.

Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
acpi_get_table_with_size()?

> 
> AML is only needed if access from ACPI is desired.
> 
> 
>> bios-linker-loader is a great interface for initializing some
>> guest owned data and linking it together but I think it adds
>> unnecessary complexity and is misused if it's used to handle
>> device owned data/on device memory in this and VMGID cases.
> 
> I want a generic interface for guest to enumerate these things.  linker
> seems quite reasonable but if you see a reason why it won't do, or want
> to propose a better interface, fine.

* The guest could do the following:
- while processing the ALLOCATE commands, it would make a note where in
GPA space each fw_cfg blob gets allocated
- at the end the guest would prepare a temporary array with a predefined
record format, that associates each fw_cfg blob's name with the concrete
allocation address
- it would create an FWCfgDmaAccess structure pointing at this array,
with a new "control" bit set (or something similar)
- the guest could write the address of the FWCfgDmaAccess struct to the
appropriate register, as always.
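A sketch of the data the guest would hand back under this first idea; the
FWCfgDmaAccess layout matches QEMU's fw_cfg DMA interface, while the record
format and the extra control bit are exactly the parts that would still need
to be specified (the values below are invented).

    #include <stdint.h>

    typedef struct FWCfgDmaAccess {      /* as in QEMU's fw_cfg DMA interface */
        uint32_t control;                /* big-endian on the wire */
        uint32_t length;
        uint64_t address;
    } __attribute__((packed)) FWCfgDmaAccess;

    #define FW_CFG_DMA_CTL_REPORT_ALLOCATIONS  0x40   /* invented control bit */

    typedef struct FwCfgAllocationRecord {            /* invented record format */
        char     file_name[56];          /* fw_cfg blob name, e.g. "etc/table-loader" */
        uint64_t alloc_address;          /* GPA where the guest placed the blob */
    } __attribute__((packed)) FwCfgAllocationRecord;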

* Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command,
specifying:
- the fw_cfg blob's name, for which to retrieve the guest-allocated
  address (this command could only follow the matching ALLOCATE
  command, never precede it)
- a flag whether the address should be written to IO or MMIO space
  (would be likely IO on x86, MMIO on ARM)
- a unique 

Re: How to reserve guest physical region for ACPI

2015-12-30 Thread Michael S. Tsirkin
On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
> On Mon, 28 Dec 2015 14:50:15 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> > > 
> > > Hi Michael, Paolo,
> > > 
> > > Now it is the time to return to the challenge that how to reserve guest
> > > physical region internally used by ACPI.
> > > 
> > > Igor suggested that:
> > > | An alternative place to allocate reserve from could be high memory.
> > > | For pc we have "reserved-memory-end" which currently makes sure
> > > | that hotpluggable memory range isn't used by firmware
> > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> > 
> > I don't want to tie things to reserved-memory-end because this
> > does not scale: next time we need to reserve memory,
> > we'll need to find yet another way to figure out what is where.
> Could you elaborate a bit more on a problem you're seeing?
> 
> To me it looks like it scales rather well.
> For example lets imagine that we adding a device
> that has some on device memory that should be mapped into GPA
> code to do so would look like:
> 
>   pc_machine_device_plug_cb(dev)
>   {
>       ...
>       if (dev == OUR_NEW_DEVICE_TYPE) {
>           memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>           set_new_reserved_end(current_reserved_end +
>                                memory_region_size(&dev->mr));
>       }
>   }
> 
> we can practically add any number of new devices that way.

Yes but we'll have to build a host side allocator for these, and that's
nasty. We'll also have to maintain these addresses indefinitely (at
least per machine version) as they are guest visible.
Not only that, there's no way for guest to know if we move things
around, so basically we'll never be able to change addresses.


>  
> > I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> > support 64 bit RAM instead (and maybe a way to allocate and
> > zero-initialize buffer without loading it through fwcfg), this way bios
> > does the allocation, and addresses can be patched into acpi.
> and then guest side needs to parse/execute some AML that would
> initialize QEMU side so it would know where to write data.

Well not really - we can put it in a data table, by itself
so it's easy to find.

AML is only needed if access from ACPI is desired.


> bios-linker-loader is a great interface for initializing some
> guest owned data and linking it together but I think it adds
> unnecessary complexity and is misused if it's used to handle
> device owned data/on device memory in this and VMGID cases.

I want a generic interface for guest to enumerate these things.  linker
seems quite reasonable but if you see a reason why it won't do, or want
to propose a better interface, fine.

PCI would do, too - though windows guys had concerns about
returning PCI BARs from ACPI.


> There was RFC on list to make BIOS boot from NVDIMM already
> doing some ACPI table lookup/parsing. Now if they were forced
> to also parse and execute AML to initialize QEMU with guest
> allocated address that would complicate them quite a bit.

If they just need to find a table by name, it won't be
too bad, would it?

> While with NVDIMM control memory region mapped directly by QEMU,
> respective patches don't need in any way to initialize QEMU,
> all they would need just read necessary data from control region.
> 
> Also using bios-linker-loader takes away some usable RAM
> from guest and in the end that doesn't scale,
> the more devices I add the less usable RAM is left for guest OS
> while all the device needs is a piece of GPA address space
> that would belong to it.

I don't get this comment. I don't think it's MMIO that is wanted.
If it's backed by qemu virtual memory then it's RAM.

> > 
> > See patch at the bottom that might be handy.
> > 
> > > he also suggested a way to use 64-bit addresses in a DSDT/SSDT with rev = 1:
> > > | when writing ASL one shall make sure that only XP-supported
> > > | features are in global scope, which is evaluated when tables
> > > | are loaded, and features of rev2 and higher are inside methods.
> > > | That way XP doesn't crash as long as it doesn't evaluate unsupported
> > > | opcodes, and one can guard those opcodes by checking the _REV object if
> > > | necessary.
> > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> > 
> > Yes, this technique works.
> > 
> > An alternative is to add an XSDT; XP ignores that.
> > XSDT at the moment breaks OVMF (because it loads both
> > the RSDT and the XSDT, which is wrong), but I think
> > Laszlo was working on a fix for that.
> Using XSDT would increase the RAM occupied by ACPI tables,
> as it would duplicate the DSDT plus the non-XP-supported AML
> at global namespace.

Not at all - I posted patches linking to same
tables from both RSDT and XSDT at some point.
Only the list of pointers would be different.
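
A rough sketch of that idea (the helper is illustrative, not QEMU API,
and it ignores the fact that in QEMU the actual addresses are patched in
by the BIOS linker rather than known at build time): both root tables
are filled from the same list of table offsets, so only the width of the
pointer entries differs.

/* Illustrative only: RSDT entries are 32-bit pointers, XSDT entries are
 * 64-bit pointers, but both are generated from one table_offsets list,
 * so no table body is duplicated.  Assumes glib's GArray and QEMU's
 * cpu_to_le32/64 helpers. */
static void fill_root_entries(GArray *root, GArray *table_offsets,
                              uint64_t tables_base, int entry_size)
{
    unsigned i;

    for (i = 0; i < table_offsets->len; i++) {
        uint64_t addr = tables_base +
                        g_array_index(table_offsets, uint32_t, i);

        if (entry_size == 4) {
            uint32_t le = cpu_to_le32(addr);          /* RSDT */
            g_array_append_vals(root, &le, sizeof le);
        } else {
            uint64_t le = cpu_to_le64(addr);          /* XSDT */
            g_array_append_vals(root, &le, sizeof le);
        }
    }
}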

> So far we've managed to keep the DSDT compatible with XP while
> introducing features from v2 and higher ACPI revisions as
> AML that is only evaluated on demand.

Re: How to reserve guest physical region for ACPI

2015-12-30 Thread Igor Mammedov
On Mon, 28 Dec 2015 14:50:15 +0200
"Michael S. Tsirkin"  wrote:

> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> > 
> > Hi Michael, Paolo,
> > 
> > Now it is time to return to the challenge of how to reserve a guest
> > physical region internally used by ACPI.
> > 
> > Igor suggested that:
> > | An alternative place to allocate reserve from could be high memory.
> > | For pc we have "reserved-memory-end" which currently makes sure
> > | that hotpluggable memory range isn't used by firmware
> > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> 
> I don't want to tie things to reserved-memory-end because this
> does not scale: next time we need to reserve memory,
> we'll need to find yet another way to figure out what is where.
Could you elaborate a bit more on the problem you're seeing?

To me it looks like it scales rather well.
For example, let's imagine that we are adding a device
that has some on-device memory that should be mapped into GPA;
the code to do so would look like:

  pc_machine_device_plug_cb(dev)
  {
      ...
      if (dev == OUR_NEW_DEVICE_TYPE) {
          memory_region_add_subregion(as, current_reserved_end, &dev->mr);
          set_new_reserved_end(current_reserved_end +
                               memory_region_size(&dev->mr));
      }
  }

we can practically add any number of new devices that way.

 
> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> support 64 bit RAM instead (and maybe a way to allocate and
> zero-initialize buffer without loading it through fwcfg), this way bios
> does the allocation, and addresses can be patched into acpi.
and then the guest side needs to parse/execute some AML that would
initialize the QEMU side so it knows where to write data.

bios-linker-loader is a great interface for initializing some
guest owned data and linking it together but I think it adds
unnecessary complexity and is misused if it's used to handle
device owned data/on device memory in this and VMGID cases.

There was an RFC on the list to make BIOS boot from NVDIMM, already
doing some ACPI table lookup/parsing. Now if they were also forced
to parse and execute AML to initialize QEMU with a guest-allocated
address, that would complicate them quite a bit.
While with the NVDIMM control memory region mapped directly by QEMU,
the respective patches don't need to initialize QEMU in any way;
all they would need to do is read the necessary data from the control region.

Also, using bios-linker-loader takes away some usable RAM
from the guest, and in the end that doesn't scale:
the more devices I add, the less usable RAM is left for the guest OS,
while all the device needs is a piece of GPA space
that would belong to it.

> 
> See patch at the bottom that might be handy.
> 
> > he also suggested a way to use 64-bit addresses in a DSDT/SSDT with rev = 1:
> > | when writing ASL one shall make sure that only XP-supported
> > | features are in global scope, which is evaluated when tables
> > | are loaded, and features of rev2 and higher are inside methods.
> > | That way XP doesn't crash as long as it doesn't evaluate unsupported
> > | opcodes, and one can guard those opcodes by checking the _REV object if necessary.
> > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> 
> Yes, this technique works.
> 
> An alternative is to add an XSDT; XP ignores that.
> XSDT at the moment breaks OVMF (because it loads both
> the RSDT and the XSDT, which is wrong), but I think
> Laszlo was working on a fix for that.
Using XSDT would increase the RAM occupied by ACPI tables,
as it would duplicate the DSDT plus the non-XP-supported AML
at global namespace.

So far we've managed to keep the DSDT compatible with XP while
introducing features from v2 and higher ACPI revisions as
AML that is only evaluated on demand.
We can continue doing so unless we have to unconditionally
add incompatible AML at global scope.
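
As a sketch of how that looks when built with QEMU's AML helpers (names
follow hw/acpi/aml-build.h as of this thread and are approximate; the
method name QV2F is made up), the rev2-only bits stay inside a method
that bails out early on old interpreters:

#include "hw/acpi/aml-build.h"

/* Sketch only: keep ACPI 2.0+ constructs out of global scope and guard
 * them with _REV so a rev1-only interpreter never evaluates opcodes it
 * doesn't understand. */
static void add_rev2_only_bits(Aml *scope)
{
    Aml *method = aml_method("QV2F", 0);   /* hypothetical method name */
    Aml *ifctx = aml_if(aml_lless(aml_name("_REV"), aml_int(2)));

    aml_append(ifctx, aml_return(aml_int(0)));  /* old interpreter: bail out */
    aml_append(method, ifctx);

    /* rev2-and-higher opcodes would be appended here, never at global scope */

    aml_append(scope, method);
}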


> 
> > Michael, Paolo, what do you think about these ideas?
> > 
> > Thanks!
> 
> 
> 
> So, using the patch below, we can add Name(PQRS, 0x0) at the top of the
> SSDT (or at the bottom, or in a separate SSDT just for that).  It returns the
> current offset so we can add that to the linker.
> 
> This won't work if you append the Name to the Aml structure (these can be
> nested to arbitrary depth using aml_append), so using a plain GArray for
> this API makes sense to me.
> 
> --->
> 
> acpi: add build_append_named_dword, returning an offset in buffer
> 
> This is a very limited form of support for runtime patching -
> similar in functionality to what we can do with ACPI_EXTRACT
> macros in python, but implemented in C.
> 
> This is to allow ACPI code direct access to data tables -
> which is exactly what DataTableRegion is there for, except
> no known windows release so far implements DataTableRegion.
Unsupported means Windows will BSOD, so it's practically
unusable unless MS patches currently existing Windows
versions.

Another thing about DataTableRegion is that ACPI tables are
supposed to have static content which matches the checksum in
the table header.

Re: How to reserve guest physical region for ACPI

2015-12-28 Thread Michael S. Tsirkin
On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> 
> Hi Michael, Paolo,
> 
> Now it is time to return to the challenge of how to reserve a guest
> physical region internally used by ACPI.
> 
> Igor suggested that:
> | An alternative place to allocate reserve from could be high memory.
> | For pc we have "reserved-memory-end" which currently makes sure
> | that hotpluggable memory range isn't used by firmware
> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)

I don't want to tie things to reserved-memory-end because this
does not scale: next time we need to reserve memory,
we'll need to find yet another way to figure out what is where.

I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
support 64 bit RAM instead (and maybe a way to allocate and
zero-initialize buffer without loading it through fwcfg), this way bios
does the allocation, and addresses can be patched into acpi.
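
For concreteness, a sketch of what such an extension might look like,
modeled on the existing ALLOCATE command in hw/acpi/bios-linker-loader.c;
the HIGH64 zone and the flags field are purely hypothetical:

enum {
    BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH   = 0x1,   /* existing */
    BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG   = 0x2,   /* existing */
    BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH64 = 0x3,   /* hypothetical: 64-bit RAM */
};

enum {
    /* hypothetical: allocate a zero-filled buffer of the given size
     * instead of loading the file content through fw_cfg */
    BIOS_LINKER_LOADER_ALLOC_F_ZEROED = 0x1,
};

/* sketch of the ALLOCATE sub-command with the extra fields */
struct BiosLinkerLoaderAlloc {
    char     file[56];   /* fw_cfg file name the region is named after */
    uint32_t align;      /* required alignment, power of two */
    uint8_t  zone;       /* one of the ZONE_* values above */
    uint8_t  flags;      /* hypothetical, e.g. ALLOC_F_ZEROED */
} __attribute__((packed));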

See patch at the bottom that might be handy.

> he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1:
> | when writing ASL one shall make sure that only XP supported
> | features are in global scope, which is evaluated when tables
> | are loaded and features of rev2 and higher are inside methods.
> | That way XP doesn't crash as far as it doesn't evaluate unsupported
> | opcode and one can guard those opcodes checking _REV object if neccesary.
> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)

Yes, this technique works.

An alternative is to add an XSDT; XP ignores that.
XSDT at the moment breaks OVMF (because it loads both
the RSDT and the XSDT, which is wrong), but I think
Laszlo was working on a fix for that.

> Michael, Paolo, what do you think about these ideas?
> 
> Thanks!



So, using the patch below, we can add Name(PQRS, 0x0) at the top of the
SSDT (or at the bottom, or in a separate SSDT just for that).  It returns the
current offset so we can add that to the linker.

This won't work if you append the Name to the Aml structure (these can be
nested to arbitrary depth using aml_append), so using a plain GArray for
this API makes sense to me.

--->

acpi: add build_append_named_dword, returning an offset in buffer

This is a very limited form of support for runtime patching -
similar in functionality to what we can do with ACPI_EXTRACT
macros in python, but implemented in C.

This is to allow ACPI code direct access to data tables -
which is exactly what DataTableRegion is there for, except
no known windows release so far implements DataTableRegion.

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1b632dc..f8998ea 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
 void
 build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
 
+int
+build_append_named_dword(GArray *array, const char *name_format, ...);
+
 #endif
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..7f9fa65 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -262,6 +262,34 @@ static void build_append_int(GArray *table, uint64_t value)
     }
 }
 
+/* Build NAME(<name>, 0x0) where 0x0 is encoded as a dword,
+ * and return the offset to 0x0 for runtime patching.
+ *
+ * Warning: runtime patching is best avoided. Only use this as
+ * a replacement for DataTableRegion (for guests that don't
+ * support it).
+ */
+int
+build_append_named_dword(GArray *array, const char *name_format, ...)
+{
+    int offset;
+    va_list ap;
+
+    build_append_byte(array, 0x08); /* NameOp */
+
+    va_start(ap, name_format);
+    build_append_namestringv(array, name_format, ap);
+    va_end(ap);
+
+    build_append_byte(array, 0x0C); /* DWordPrefix */
+
+    offset = array->len;
+    build_append_int_noprefix(array, 0x0, 4);
+    assert(array->len == offset + 4);
+
+    return offset;
+}
+
 static GPtrArray *alloc_list;
 
 static Aml *aml_alloc(void)


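A short usage sketch for the helper above ("PQRS" is just the example
name from this thread; the ADD_POINTER linker command that would consume
the offset is only indicated in a comment):

/* Sketch: emit a patchable Name(PQRS, 0x0) while building an SSDT and
 * remember where the dword placeholder lives inside table_data. */
static unsigned add_patchable_name(GArray *table_data)
{
    unsigned addr_offset;

    addr_offset = build_append_named_dword(table_data, "PQRS");

    /* an ADD_POINTER (or similar) linker command pointing at addr_offset
     * would let firmware patch the allocated address into PQRS */
    return addr_offset;
}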


How to reserve guest physical region for ACPI

2015-12-27 Thread Xiao Guangrong


Hi Michael, Paolo,

Now it is time to return to the challenge of how to reserve a guest
physical region internally used by ACPI.

Igor suggested that:
| An alternative place to allocate reserve from could be high memory.
| For pc we have "reserved-memory-end" which currently makes sure
| that hotpluggable memory range isn't used by firmware
(https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)

he also suggested a way to use 64-bit addresses in a DSDT/SSDT with rev = 1:
| when writing ASL one shall make sure that only XP-supported
| features are in global scope, which is evaluated when tables
| are loaded, and features of rev2 and higher are inside methods.
| That way XP doesn't crash as long as it doesn't evaluate unsupported
| opcodes, and one can guard those opcodes by checking the _REV object if necessary.
(https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)

Michael, Paolo, what do you think about these ideas?

Thanks!