Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Michael S. Tsirkin
On Tue, Feb 16, 2016 at 02:36:49PM +0200, Marcel Apfelbaum wrote:
> >>2. PCI devices with no driver installed are not re-mapped. This can be OK
> >> from the Windows point of view because the Resources window does not show
> >> the MMIO range for this device.
> >>
> >> If the other (re-mapped) device is working, it is pure luck. Both Memory
> >> Regions occupy the same range and have the same priority.
> >>
> >>We need to think about how to solve this.
> >>One way would be to defer the BAR activation to the guest OS, but I am not 
> >>sure of the consequences.
> >deferring won't solve the problem, as rebalancing could happen later
> >and make the BARs overlap.
> 
> Why not? If we do not activate the BAR in firmware and Windows does not have
> a driver for it, it will not activate it at all, right?
> Why would Windows activate the device's BAR if it can't use it? At least that
> is what I hope.
> Any other ideas would be appreciated.
> 

I wonder whether this is related to setting PnP in CMOS.
See e.g.  http://oss.sgi.com/LDP/HOWTO/Plug-and-Play-HOWTO-4.html
and https://support.microsoft.com/en-us/kb/321779
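
(A quick sanity check, offered as a sketch rather than something from the
thread: it assumes a Linux guest and the 01:01.0 slot used for ivshmem in
the reproduction steps later in this thread. The device's PCI command
register shows whether its BAR is being decoded at all:

# setpci -s 01:01.0 COMMAND
# setpci -s 01:01.0 BASE_ADDRESS_0

Bit 1 (0x2) of COMMAND is Memory Space Enable; if it is clear, the BAR is
inactive no matter what address BASE_ADDRESS_0 holds.)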



> >I've noticed that at startup Windows unmaps and then maps BARs
> >at the same addresses where the BIOS had put them before.
> 
> Including devices without a working driver?
> 
> 
> Thanks,
> Marcel
> 
> >
> >>And this does not solve the ivshmem problem.
> >So far the only way to avoid overlapping BARs due to Windows
> >doing rebalancing for driver-less devices is to pin such
> >BARs statically with _CRS in an ACPI table, but as Michael said
> >that fragments the PCI address space.
> >
> >>
> >>Thanks,
> >>Marcel
> >>
> >>
> >>
> >>
> >>
> >



Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Michael S. Tsirkin
On Tue, Feb 16, 2016 at 02:51:25PM +0100, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 14:36:49 +0200
> Marcel Apfelbaum  wrote:
> 
> > On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > > On Tue, 16 Feb 2016 12:05:33 +0200
> > > Marcel Apfelbaum  wrote:
> > >  
> > >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:  
> > >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> >  On Tue, 9 Feb 2016 14:17:44 +0200
> >  "Michael S. Tsirkin"  wrote:
> >   
> > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > >>> So the linker interface solves this rather neatly:
> > >>> bios allocates memory, bios passes memory map to guest.
> > >>> Served us well for several years without need for extensions,
> > >>> and it does solve the VM GEN ID problem, even though
> > >>> 1. it was never designed for huge areas like nvdimm seems to want 
> > >>> to use
> > >>> 2. we might want to add a new 64 bit flag to avoid touching low 
> > >>> memory  
> > >> The linker interface is fine for some read-only data, like ACPI tables,
> > >> especially fixed tables; not so for AML ones if one wants to patch them.
> > >>
> > >> However, now that you want to use it for other purposes, you start
> > >> adding extensions and other guest->QEMU channels to communicate
> > >> patching info back.
> > >> It also steals the guest's memory, which is not nice and doesn't scale
> > >> well.
> > >
> > > This is an argument I don't get. Memory is memory: call it guest memory
> > > or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course,
> > > but much slower.
> > >
> > > ...  
> >  It matters for the user, however: he pays for a guest with XXX RAM but
> >  gets less than that. And that will keep getting worse as the number of
> >  such devices increases.
> >   
> > >>> OK fine, but returning PCI BAR address to guest is wrong.
> > >>> How about reading it from ACPI then? Is it really
> > >>> broken unless there's *also* a driver?  
> > >> I don't get the question; the MS spec requires an address (the ADDR
> > >> method), and it's read by ACPI (AML).
> > >
> > > You were unhappy about DMA into guest memory.
> > > As a replacement for DMA, we could have AML read from
> > > e.g. PCI and write into RAM.
> > > This way we don't need to pass address to QEMU.  
> >  That sounds better, as it saves us from allocating an IO port,
> >  and QEMU doesn't need to write into guest memory. The only question is
> >  whether a PCI_Config OpRegion would work with a driver-less PCI device.
> > >>>
> > >>> Or PCI BAR for that reason. I don't know for sure.
> > >>>  
> > 
> >  And it's still pretty much not testable, since it would require
> >  a fully running OSPM to execute the AML side.
> > >>>
> > >>> AML is not testable, but that's nothing new.
> > >>> You can test reading from PCI.
> > >>>  
> > >  
> > >> As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > >> but I wouldn't be surprised if it doesn't, taking into account that
> > >> the MS-introduced _DSM doesn't.
> > >>  
> > >>>
> > >>>  
> > >>  Just compare with a graphics card design, where on-device memory
> > >>  is mapped directly at some GPA, not wasting RAM that the guest
> > >>  could use for other tasks.
> > >
> > > This might have been true 20 years ago.  Most modern cards do 
> > > DMA.  
> > 
> >  Modern cards with their own RAM map their VRAM into the address space
> >  directly and allow users to use it (GEM API), so they do not waste
> >  conventional RAM.
> >  For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> >  series (even the PCI class id is the same).
> > >>>
> > >>> Don't know enough about graphics really, I'm not sure how these are
> > >>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > >>> mostly use guest RAM, not on card RAM.
> > >>>  
> > >>  VMGENID and NVDIMM use-cases look to me exactly the same, 
> > >> i.e.
> > >>  instead of consuming guest's RAM they should be mapped at
> > >>  some GPA and their memory accessed directly.  
> > >
> > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > address. This breaks the straight-forward approach of using a
> > > rebalanceable PCI BAR.  
> > 
> >  For PCI rebalancing to work on Windows, one has to provide a working
> >  PCI driver; otherwise the OS will ignore the device when rebalancing
> >  happens and might map something else over the ignored BAR.
> > >>>
> > >>> Does it disable the BAR then? Or just move 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Igor Mammedov
On Tue, 16 Feb 2016 14:36:49 +0200
Marcel Apfelbaum  wrote:

> On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > On Tue, 16 Feb 2016 12:05:33 +0200
> > Marcel Apfelbaum  wrote:
> >  
> >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:  
> >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
>  On Tue, 9 Feb 2016 14:17:44 +0200
>  "Michael S. Tsirkin"  wrote:
>   
> > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> >>> So the linker interface solves this rather neatly:
> >>> bios allocates memory, bios passes memory map to guest.
> >>> Served us well for several years without need for extensions,
> >>> and it does solve the VM GEN ID problem, even though
> >>> 1. it was never designed for huge areas like nvdimm seems to want to 
> >>> use
> >>> 2. we might want to add a new 64 bit flag to avoid touching low 
> >>> memory  
> >> The linker interface is fine for some read-only data, like ACPI tables,
> >> especially fixed tables; not so for AML ones if one wants to patch them.
> >>
> >> However now when you want to use it for other purposes you start
> >> adding extensions and other guest->QEMU channels to communicate
> >> patching info back.
> >> It steals guest's memory which is also not nice and doesn't scale 
> >> well.  
> >
> > This is an argument I don't get. memory is memory. call it guest memory
> > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > but much slower.
> >
> > ...  
>  It however matters for user, he pays for guest with XXX RAM but gets less
>  than that. And that will be getting worse as a number of such devices
>  increases.
>   
> >>> OK fine, but returning PCI BAR address to guest is wrong.
> >>> How about reading it from ACPI then? Is it really
> >>> broken unless there's *also* a driver?  
> >> I don't get question, MS Spec requires address (ADDR method),
> >> and it's read by ACPI (AML).  
> >
> > You were unhappy about DMA into guest memory.
> > As a replacement for DMA, we could have AML read from
> > e.g. PCI and write into RAM.
> > This way we don't need to pass address to QEMU.  
>  That sounds better as it saves us from allocation of IO port
>  and QEMU don't need to write into guest memory, the only question is
>  if PCI_Config opregion would work with driver-less PCI device.  
> >>>
> >>> Or PCI BAR for that reason. I don't know for sure.
> >>>  
> 
>  And it's still pretty much not test-able since it would require
>  fully running OSPM to execute AML side.  
> >>>
> >>> AML is not testable, but that's nothing new.
> >>> You can test reading from PCI.
> >>>  
> >  
> >> As for working PCI_Config OpRegion without driver, I haven't tried,
> >> but I wouldn't be surprised if it doesn't, taking in account that
> >> MS introduced _DSM doesn't.
> >>  
> >>>
> >>>  
> >>  Just compare with a graphics card design, where on device 
> >> memory
> >>  is mapped directly at some GPA not wasting RAM that guest 
> >> could
> >>  use for other tasks.  
> >
> > This might have been true 20 years ago.  Most modern cards do DMA.  
> 
>  Modern cards, with it's own RAM, map its VRAM in address space 
>  directly
>  and allow users use it (GEM API). So they do not waste conventional 
>  RAM.
>  For example NVIDIA VRAM is mapped as PCI BARs the same way like in 
>  this
>  series (even PCI class id is the same)  
> >>>
> >>> Don't know enough about graphics really, I'm not sure how these are
> >>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> >>> mostly use guest RAM, not on card RAM.
> >>>  
> >>  VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>  instead of consuming guest's RAM they should be mapped at
> >>  some GPA and their memory accessed directly.  
> >
> > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > address. This breaks the straight-forward approach of using a
> > rebalanceable PCI BAR.  
> 
>  For PCI rebalance to work on Windows, one has to provide working PCI 
>  driver
>  otherwise OS will ignore it when rebalancing happens and
>  might map something else over ignored BAR.  
> >>>
> >>> Does it disable the BAR then? Or just move it elsewhere?  
> >> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >> another device with driver over it.  
> >
> > Interesting. On classical PCI this is a forbidden configuration.
> > Maybe we do something that confuses windows?
> > Could you tell me how to reproduce this 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Marcel Apfelbaum

On 02/16/2016 02:17 PM, Igor Mammedov wrote:

On Tue, 16 Feb 2016 12:05:33 +0200
Marcel Apfelbaum  wrote:


On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:

On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin"  wrote:


On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:

So the linker interface solves this rather neatly:
bios allocates memory, bios passes memory map to guest.
Served us well for several years without need for extensions,
and it does solve the VM GEN ID problem, even though
1. it was never designed for huge areas like nvdimm seems to want to use
2. we might want to add a new 64 bit flag to avoid touching low memory

The linker interface is fine for some read-only data, like ACPI tables,
especially fixed tables; not so for AML ones if one wants to patch them.

However, now that you want to use it for other purposes, you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It also steals the guest's memory, which is not nice and doesn't scale well.


This is an argument I don't get. Memory is memory: call it guest memory
or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course,
but much slower.

...

It matters for the user, however: he pays for a guest with XXX RAM but gets
less than that. And that will keep getting worse as the number of such
devices increases.


OK fine, but returning PCI BAR address to guest is wrong.
How about reading it from ACPI then? Is it really
broken unless there's *also* a driver?

I don't get the question; the MS spec requires an address (the ADDR method),
and it's read by ACPI (AML).


You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass address to QEMU.

That sounds better, as it saves us from allocating an IO port,
and QEMU doesn't need to write into guest memory. The only question is
whether a PCI_Config OpRegion would work with a driver-less PCI device.


Or PCI BAR for that reason. I don't know for sure.



And it's still pretty much not testable, since it would require
a fully running OSPM to execute the AML side.


AML is not testable, but that's nothing new.
You can test reading from PCI.




As for a PCI_Config OpRegion working without a driver, I haven't tried,
but I wouldn't be surprised if it doesn't, taking into account that
the MS-introduced _DSM doesn't.





 Just compare with a graphics card design, where on-device memory
 is mapped directly at some GPA, not wasting RAM that the guest could
 use for other tasks.


This might have been true 20 years ago.  Most modern cards do DMA.


Modern cards with their own RAM map their VRAM into the address space directly
and allow users to use it (GEM API), so they do not waste conventional RAM.
For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
series (even the PCI class id is the same).


Don't know enough about graphics really, I'm not sure how these are
relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
mostly use guest RAM, not on card RAM.


 VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
 instead of consuming guest's RAM they should be mapped at
 some GPA and their memory accessed directly.


VMGENID is tied to a spec that rather arbitrarily asks for a fixed
address. This breaks the straight-forward approach of using a
rebalanceable PCI BAR.


For PCI rebalancing to work on Windows, one has to provide a working PCI
driver; otherwise the OS will ignore the device when rebalancing happens and
might map something else over the ignored BAR.


Does it disable the BAR then? Or just move it elsewhere?

It doesn't; it just blindly ignores the BAR's existence and maps the BAR of
another device (one that has a driver) over it.


Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses Windows?
Could you tell me how to reproduce this behaviour?

#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

wait till OS boots, note BARs programmed for ivshmem
   in my case it was
 01:01.0 0,0xfe80+0x100
then execute script and watch pci_update_mappings* trace events

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i 
| nc -U /tmp/m; sleep 5; done;

hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem
and then programs new BARs, where:
pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe80+0x2
creates a BAR overlapping with ivshmem
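
To confirm the overlap independently of the trace events (a sketch, not part
of the original report; it reuses the /tmp/m monitor socket from the command
line above), the QEMU monitor can dump what the guest programmed:

# echo "info pci" | nc -U /tmp/m
# echo "info mtree" | nc -U /tmp/m

"info pci" lists the BARs as currently programmed, and "info mtree" shows the
memory region layout, where the rebalanced e1000 BAR and the ivshmem BAR
should both appear over the same range.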



Thanks!
We need to figure this out because currently this does not
work properly (or maybe it works, but merely by chance).
Me and Marcel will 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Igor Mammedov
On Tue, 16 Feb 2016 12:05:33 +0200
Marcel Apfelbaum  wrote:

> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> >> On Tue, 9 Feb 2016 14:17:44 +0200
> >> "Michael S. Tsirkin"  wrote:
> >>  
> >>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > So the linker interface solves this rather neatly:
> > bios allocates memory, bios passes memory map to guest.
> > Served us well for several years without need for extensions,
> > and it does solve the VM GEN ID problem, even though
> > 1. it was never designed for huge areas like nvdimm seems to want to use
> > 2. we might want to add a new 64 bit flag to avoid touching low memory  
>  The linker interface is fine for some read-only data, like ACPI tables,
>  especially fixed tables; not so for AML ones if one wants to patch them.
> 
>  However now when you want to use it for other purposes you start
>  adding extensions and other guest->QEMU channels to communicate
>  patching info back.
>  It steals guest's memory which is also not nice and doesn't scale well.  
> >>>
> >>> This is an argument I don't get. memory is memory. call it guest memory
> >>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> >>> but much slower.
> >>>
> >>> ...  
> >> It however matters for user, he pays for guest with XXX RAM but gets less
> >> than that. And that will be getting worse as a number of such devices
> >> increases.
> >>  
> > OK fine, but returning PCI BAR address to guest is wrong.
> > How about reading it from ACPI then? Is it really
> > broken unless there's *also* a driver?  
>  I don't get question, MS Spec requires address (ADDR method),
>  and it's read by ACPI (AML).  
> >>>
> >>> You were unhappy about DMA into guest memory.
> >>> As a replacement for DMA, we could have AML read from
> >>> e.g. PCI and write into RAM.
> >>> This way we don't need to pass address to QEMU.  
> >> That sounds better as it saves us from allocation of IO port
> >> and QEMU don't need to write into guest memory, the only question is
> >> if PCI_Config opregion would work with driver-less PCI device.  
> >
> > Or PCI BAR for that reason. I don't know for sure.
> >  
> >>
> >> And it's still pretty much not test-able since it would require
> >> fully running OSPM to execute AML side.  
> >
> > AML is not testable, but that's nothing new.
> > You can test reading from PCI.
> >  
> >>>  
>  As for working PCI_Config OpRegion without driver, I haven't tried,
>  but I wouldn't be surprised if it doesn't, taking in account that
>  MS introduced _DSM doesn't.
>   
> >
> >  
>  Just compare with a graphics card design, where on device memory
>  is mapped directly at some GPA not wasting RAM that guest could
>  use for other tasks.  
> >>>
> >>> This might have been true 20 years ago.  Most modern cards do DMA.  
> >>
> >> Modern cards, with it's own RAM, map its VRAM in address space directly
> >> and allow users use it (GEM API). So they do not waste conventional 
> >> RAM.
> >> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> >> series (even PCI class id is the same)  
> >
> > Don't know enough about graphics really, I'm not sure how these are
> > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > mostly use guest RAM, not on card RAM.
> >  
>  VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
>  instead of consuming guest's RAM they should be mapped at
>  some GPA and their memory accessed directly.  
> >>>
> >>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>> address. This breaks the straight-forward approach of using a
> >>> rebalanceable PCI BAR.  
> >>
> >> For PCI rebalance to work on Windows, one has to provide working PCI 
> >> driver
> >> otherwise OS will ignore it when rebalancing happens and
> >> might map something else over ignored BAR.  
> >
> > Does it disable the BAR then? Or just move it elsewhere?  
>  it doesn't, it just blindly ignores BARs existence and maps BAR of
>  another device with driver over it.  
> >>>
> >>> Interesting. On classical PCI this is a forbidden configuration.
> >>> Maybe we do something that confuses windows?
> >>> Could you tell me how to reproduce this behaviour?  
> >> #cat > t << EOF
> >> pci_update_mappings_del
> >> pci_update_mappings_add
> >> EOF
> >>
> >> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >> wait till OS boots, note BARs 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-16 Thread Marcel Apfelbaum

On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:

On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin"  wrote:


On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:

So the linker interface solves this rather neatly:
bios allocates memory, bios passes memory map to guest.
Served us well for several years without need for extensions,
and it does solve the VM GEN ID problem, even though
1. it was never designed for huge areas like nvdimm seems to want to use
2. we might want to add a new 64 bit flag to avoid touching low memory

The linker interface is fine for some read-only data, like ACPI tables,
especially fixed tables; not so for AML ones if one wants to patch them.

However, now that you want to use it for other purposes, you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It also steals the guest's memory, which is not nice and doesn't scale well.


This is an argument I don't get. memory is memory. call it guest memory
or RAM backed PCI BAR - same thing. MMIO is cheaper of course
but much slower.

...

It however matters for user, he pays for guest with XXX RAM but gets less
than that. And that will be getting worse as a number of such devices
increases.


OK fine, but returning PCI BAR address to guest is wrong.
How about reading it from ACPI then? Is it really
broken unless there's *also* a driver?

I don't get question, MS Spec requires address (ADDR method),
and it's read by ACPI (AML).


You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass address to QEMU.

That sounds better as it saves us from allocation of IO port
and QEMU don't need to write into guest memory, the only question is
if PCI_Config opregion would work with driver-less PCI device.


Or PCI BAR for that reason. I don't know for sure.



And it's still pretty much not test-able since it would require
fully running OSPM to execute AML side.


AML is not testable, but that's nothing new.
You can test reading from PCI.




As for working PCI_Config OpRegion without driver, I haven't tried,
but I wouldn't be surprised if it doesn't, taking in account that
MS introduced _DSM doesn't.





Just compare with a graphics card design, where on device memory
is mapped directly at some GPA not wasting RAM that guest could
use for other tasks.


This might have been true 20 years ago.  Most modern cards do DMA.


Modern cards with their own RAM map their VRAM into the address space directly
and allow users to use it (GEM API), so they do not waste conventional RAM.
For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
series (even the PCI class id is the same).


Don't know enough about graphics really, I'm not sure how these are
relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
mostly use guest RAM, not on card RAM.


VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
instead of consuming guest's RAM they should be mapped at
some GPA and their memory accessed directly.


VMGENID is tied to a spec that rather arbitrarily asks for a fixed
address. This breaks the straight-forward approach of using a
rebalanceable PCI BAR.


For PCI rebalance to work on Windows, one has to provide working PCI driver
otherwise OS will ignore it when rebalancing happens and
might map something else over ignored BAR.


Does it disable the BAR then? Or just move it elsewhere?

It doesn't; it just blindly ignores the BAR's existence and maps the BAR of
another device (one that has a driver) over it.


Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses windows?
Could you tell me how to reproduce this behaviour?

#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

wait till OS boots, note BARs programmed for ivshmem
  in my case it was
01:01.0 0,0xfe80+0x100
then execute script and watch pci_update_mappings* trace events

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i 
| nc -U /tmp/m; sleep 5; done;

hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
and then programs new BARs, where:
   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe80+0x2
creates overlapping BAR with ivshmem



Thanks!
We need to figure this out because currently this does not
work properly (or maybe it works, but merely by chance).
Me and Marcel will play with this.



I checked and indeed we have 2 separate problems:

1. ivshmem is declared as PCI RAM controller and Windows *does* have 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-15 Thread Igor Mammedov
On Mon, 15 Feb 2016 13:26:29 +0200
"Michael S. Tsirkin"  wrote:

> On Mon, Feb 15, 2016 at 11:30:24AM +0100, Igor Mammedov wrote:
> > On Thu, 11 Feb 2016 18:30:19 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> > > > On Tue, 9 Feb 2016 14:17:44 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > > > > So the linker interface solves this rather neatly:
> > > > > > > bios allocates memory, bios passes memory map to guest.
> > > > > > > Served us well for several years without need for extensions,
> > > > > > > and it does solve the VM GEN ID problem, even though
> > > > > > > 1. it was never designed for huge areas like nvdimm seems to want 
> > > > > > > to use
> > > > > > > 2. we might want to add a new 64 bit flag to avoid touching low 
> > > > > > > memory  
> > > > > > The linker interface is fine for some read-only data, like ACPI
> > > > > > tables, especially fixed tables; not so for AML ones if one wants
> > > > > > to patch them.
> > > > > > 
> > > > > > However now when you want to use it for other purposes you start
> > > > > > adding extensions and other guest->QEMU channels to communicate
> > > > > > patching info back.
> > > > > > It steals guest's memory which is also not nice and doesn't scale 
> > > > > > well.  
> > > > > 
> > > > > This is an argument I don't get. memory is memory. call it guest 
> > > > > memory
> > > > > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > > > > but much slower.
> > > > > 
> > > > > ...
> > > > It however matters for user, he pays for guest with XXX RAM but gets 
> > > > less
> > > > than that. And that will be getting worse as a number of such devices
> > > > increases.
> > > > 
> > > > > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > > > > How about reading it from ACPI then? Is it really
> > > > > > > broken unless there's *also* a driver?  
> > > > > > I don't get question, MS Spec requires address (ADDR method),
> > > > > > and it's read by ACPI (AML).  
> > > > > 
> > > > > You were unhappy about DMA into guest memory.
> > > > > As a replacement for DMA, we could have AML read from
> > > > > e.g. PCI and write into RAM.
> > > > > This way we don't need to pass address to QEMU.
> > > > That sounds better as it saves us from allocation of IO port
> > > > and QEMU don't need to write into guest memory, the only question is
> > > > if PCI_Config opregion would work with driver-less PCI device.
> > > 
> > > Or PCI BAR for that reason. I don't know for sure.  
> > Unfortunately a BAR doesn't work for a driver-less PCI device,
> > but maybe we can add a vendor-specific PCI_Config region to the
> > always-present LPC/ISA bridge and make it do the job, like it does now
> > for allocating IO ports for CPU/MEM hotplug.
> >   
> > >   
> > > > 
> > > > And it's still pretty much not test-able since it would require
> > > > fully running OSPM to execute AML side.
> > > 
> > > AML is not testable, but that's nothing new.
> > > You can test reading from PCI.
> > >   
> > > > > 
> > > > > > As for working PCI_Config OpRegion without driver, I haven't tried,
> > > > > > but I wouldn't be surprised if it doesn't, taking in account that
> > > > > > MS introduced _DSM doesn't.
> > > > > >   
> > > > > > > 
> > > > > > >   
> > > > > > > > > >Just compare with a graphics card design, where on 
> > > > > > > > > > device memory
> > > > > > > > > >is mapped directly at some GPA not wasting RAM that 
> > > > > > > > > > guest could
> > > > > > > > > >use for other tasks.  
> > > > > > > > > 
> > > > > > > > > This might have been true 20 years ago.  Most modern cards do 
> > > > > > > > > DMA.
> > > > > > > > 
> > > > > > > > Modern cards, with it's own RAM, map its VRAM in address space 
> > > > > > > > directly
> > > > > > > > and allow users use it (GEM API). So they do not waste 
> > > > > > > > conventional RAM.
> > > > > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way like 
> > > > > > > > in this
> > > > > > > > series (even PCI class id is the same)
> > > > > > > 
> > > > > > > Don't know enough about graphics really, I'm not sure how these 
> > > > > > > are
> > > > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems 
> > > > > > > to
> > > > > > > mostly use guest RAM, not on card RAM.
> > > > > > >   
> > > > > > > > > >VMGENID and NVDIMM use-cases look to me exactly the 
> > > > > > > > > > same, i.e.
> > > > > > > > > >instead of consuming guest's RAM they should be mapped at
> > > > > > > > > >some GPA and their memory accessed directly.  
> > > > > > > > > 
> > > > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a 
> > > > > > > > > fixed
> > > > > > > > > address. This 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-15 Thread Michael S. Tsirkin
On Mon, Feb 15, 2016 at 11:30:24AM +0100, Igor Mammedov wrote:
> On Thu, 11 Feb 2016 18:30:19 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > > On Tue, 9 Feb 2016 14:17:44 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > > > > > So the linker interface solves this rather neatly:
> > > > > > bios allocates memory, bios passes memory map to guest.
> > > > > > Served us well for several years without need for extensions,
> > > > > > and it does solve the VM GEN ID problem, even though
> > > > > > 1. it was never designed for huge areas like nvdimm seems to want 
> > > > > > to use
> > > > > > 2. we might want to add a new 64 bit flag to avoid touching low 
> > > > > > memory
> > > > > The linker interface is fine for some read-only data, like ACPI tables,
> > > > > especially fixed tables; not so for AML ones if one wants to patch them.
> > > > > 
> > > > > However now when you want to use it for other purposes you start
> > > > > adding extensions and other guest->QEMU channels to communicate
> > > > > patching info back.
> > > > > It steals guest's memory which is also not nice and doesn't scale 
> > > > > well.
> > > > 
> > > > This is an argument I don't get. memory is memory. call it guest memory
> > > > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > > > but much slower.
> > > > 
> > > > ...  
> > > It however matters for user, he pays for guest with XXX RAM but gets less
> > > than that. And that will be getting worse as a number of such devices
> > > increases.
> > >   
> > > > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > > > How about reading it from ACPI then? Is it really
> > > > > > broken unless there's *also* a driver?
> > > > > I don't get question, MS Spec requires address (ADDR method),
> > > > > and it's read by ACPI (AML).
> > > > 
> > > > You were unhappy about DMA into guest memory.
> > > > As a replacement for DMA, we could have AML read from
> > > > e.g. PCI and write into RAM.
> > > > This way we don't need to pass address to QEMU.  
> > > That sounds better as it saves us from allocation of IO port
> > > and QEMU don't need to write into guest memory, the only question is
> > > if PCI_Config opregion would work with driver-less PCI device.  
> > 
> > Or PCI BAR for that reason. I don't know for sure.
> Unfortunately a BAR doesn't work for a driver-less PCI device,
> but maybe we can add a vendor-specific PCI_Config region to the
> always-present LPC/ISA bridge and make it do the job, like it does now
> for allocating IO ports for CPU/MEM hotplug.
> 
> > 
> > > 
> > > And it's still pretty much not test-able since it would require
> > > fully running OSPM to execute AML side.  
> > 
> > AML is not testable, but that's nothing new.
> > You can test reading from PCI.
> > 
> > > >   
> > > > > As for working PCI_Config OpRegion without driver, I haven't tried,
> > > > > but I wouldn't be surprised if it doesn't, taking in account that
> > > > > MS introduced _DSM doesn't.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > > > >Just compare with a graphics card design, where on device 
> > > > > > > > > memory
> > > > > > > > >is mapped directly at some GPA not wasting RAM that guest 
> > > > > > > > > could
> > > > > > > > >use for other tasks.
> > > > > > > > 
> > > > > > > > This might have been true 20 years ago.  Most modern cards do 
> > > > > > > > DMA.  
> > > > > > > 
> > > > > > > Modern cards, with it's own RAM, map its VRAM in address space 
> > > > > > > directly
> > > > > > > and allow users use it (GEM API). So they do not waste 
> > > > > > > conventional RAM.
> > > > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way like 
> > > > > > > in this
> > > > > > > series (even PCI class id is the same)  
> > > > > > 
> > > > > > Don't know enough about graphics really, I'm not sure how these are
> > > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > > > mostly use guest RAM, not on card RAM.
> > > > > > 
> > > > > > > > >VMGENID and NVDIMM use-cases look to me exactly the same, 
> > > > > > > > > i.e.
> > > > > > > > >instead of consuming guest's RAM they should be mapped at
> > > > > > > > >some GPA and their memory accessed directly.
> > > > > > > > 
> > > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a 
> > > > > > > > fixed
> > > > > > > > address. This breaks the straight-forward approach of using a
> > > > > > > > rebalanceable PCI BAR.  
> > > > > > > 
> > > > > > > For PCI rebalance to work on Windows, one has to provide working 
> > > > > > > PCI driver
> > > > > > > otherwise OS will ignore it when rebalancing happens and
> > > > > > > might map something else over ignored BAR.  
> > > > > > 
> > > > > > Does it disable the BAR 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-15 Thread Igor Mammedov
On Thu, 11 Feb 2016 18:30:19 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > On Tue, 9 Feb 2016 14:17:44 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > > > > So the linker interface solves this rather neatly:
> > > > > bios allocates memory, bios passes memory map to guest.
> > > > > Served us well for several years without need for extensions,
> > > > > and it does solve the VM GEN ID problem, even though
> > > > > 1. it was never designed for huge areas like nvdimm seems to want to 
> > > > > use
> > > > > 2. we might want to add a new 64 bit flag to avoid touching low 
> > > > > memory
> > > > The linker interface is fine for some read-only data, like ACPI tables,
> > > > especially fixed tables; not so for AML ones if one wants to patch them.
> > > > 
> > > > However now when you want to use it for other purposes you start
> > > > adding extensions and other guest->QEMU channels to communicate
> > > > patching info back.
> > > > It steals guest's memory which is also not nice and doesn't scale well. 
> > > >
> > > 
> > > This is an argument I don't get. memory is memory. call it guest memory
> > > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > > but much slower.
> > > 
> > > ...  
> > It however matters for user, he pays for guest with XXX RAM but gets less
> > than that. And that will be getting worse as a number of such devices
> > increases.
> >   
> > > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > > How about reading it from ACPI then? Is it really
> > > > > broken unless there's *also* a driver?
> > > > I don't get question, MS Spec requires address (ADDR method),
> > > > and it's read by ACPI (AML).
> > > 
> > > You were unhappy about DMA into guest memory.
> > > As a replacement for DMA, we could have AML read from
> > > e.g. PCI and write into RAM.
> > > This way we don't need to pass address to QEMU.  
> > That sounds better as it saves us from allocation of IO port
> > and QEMU don't need to write into guest memory, the only question is
> > if PCI_Config opregion would work with driver-less PCI device.  
> 
> Or PCI BAR for that reason. I don't know for sure.
Unfortunately a BAR doesn't work for a driver-less PCI device,
but maybe we can add a vendor-specific PCI_Config region to the always-present
LPC/ISA bridge and make it do the job, like it does now for allocating
IO ports for CPU/MEM hotplug.
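
(A rough way to poke at that idea from a Linux guest, purely as an
illustrative sketch: 00:01.0 is where the PIIX ISA bridge usually sits on
the default i440FX machine, and the 0xd0 offset is hypothetical, just a spot
in the vendor-specific part of the bridge's config space:

# lspci -s 00:01.0 -xxx
# setpci -s 00:01.0 0xd0.L

The first command dumps the bridge's whole config space, the second reads a
dword at the offset where such a window could be exposed.)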

> 
> > 
> > And it's still pretty much not test-able since it would require
> > fully running OSPM to execute AML side.  
> 
> AML is not testable, but that's nothing new.
> You can test reading from PCI.
> 
> > >   
> > > > As for working PCI_Config OpRegion without driver, I haven't tried,
> > > > but I wouldn't be surprised if it doesn't, taking in account that
> > > > MS introduced _DSM doesn't.
> > > > 
> > > > > 
> > > > > 
> > > > > > > >Just compare with a graphics card design, where on device 
> > > > > > > > memory
> > > > > > > >is mapped directly at some GPA not wasting RAM that guest 
> > > > > > > > could
> > > > > > > >use for other tasks.
> > > > > > > 
> > > > > > > This might have been true 20 years ago.  Most modern cards do 
> > > > > > > DMA.  
> > > > > > 
> > > > > > Modern cards, with it's own RAM, map its VRAM in address space 
> > > > > > directly
> > > > > > and allow users use it (GEM API). So they do not waste conventional 
> > > > > > RAM.
> > > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way like in 
> > > > > > this
> > > > > > series (even PCI class id is the same)  
> > > > > 
> > > > > Don't know enough about graphics really, I'm not sure how these are
> > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > > mostly use guest RAM, not on card RAM.
> > > > > 
> > > > > > > >VMGENID and NVDIMM use-cases look to me exactly the same, 
> > > > > > > > i.e.
> > > > > > > >instead of consuming guest's RAM they should be mapped at
> > > > > > > >some GPA and their memory accessed directly.
> > > > > > > 
> > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > > address. This breaks the straight-forward approach of using a
> > > > > > > rebalanceable PCI BAR.  
> > > > > > 
> > > > > > For PCI rebalance to work on Windows, one has to provide working 
> > > > > > PCI driver
> > > > > > otherwise OS will ignore it when rebalancing happens and
> > > > > > might map something else over ignored BAR.  
> > > > > 
> > > > > Does it disable the BAR then? Or just move it elsewhere?
> > > > it doesn't, it just blindly ignores BARs existence and maps BAR of
> > > > another device with driver over it.
> > > 
> > > Interesting. On classical PCI this is a forbidden configuration.
> > > Maybe we do something that confuses windows?

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-11 Thread Michael S. Tsirkin
On Thu, Feb 11, 2016 at 07:34:52PM +0200, Marcel Apfelbaum wrote:
> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> >On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> >>On Tue, 9 Feb 2016 14:17:44 +0200
> >>"Michael S. Tsirkin"  wrote:
> >>
> >>>On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> >So the linker interface solves this rather neatly:
> >bios allocates memory, bios passes memory map to guest.
> >Served us well for several years without need for extensions,
> >and it does solve the VM GEN ID problem, even though
> >1. it was never designed for huge areas like nvdimm seems to want to use
> >2. we might want to add a new 64 bit flag to avoid touching low memory
> The linker interface is fine for some read-only data, like ACPI tables,
> especially fixed tables; not so for AML ones if one wants to patch them.
> 
> However now when you want to use it for other purposes you start
> adding extensions and other guest->QEMU channels to communicate
> patching info back.
> It steals guest's memory which is also not nice and doesn't scale well.
> >>>
> >>>This is an argument I don't get. memory is memory. call it guest memory
> >>>or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> >>>but much slower.
> >>>
> >>>...
> >>It however matters for user, he pays for guest with XXX RAM but gets less
> >>than that. And that will be getting worse as a number of such devices
> >>increases.
> >>
> >OK fine, but returning PCI BAR address to guest is wrong.
> >How about reading it from ACPI then? Is it really
> >broken unless there's *also* a driver?
> I don't get question, MS Spec requires address (ADDR method),
> and it's read by ACPI (AML).
> >>>
> >>>You were unhappy about DMA into guest memory.
> >>>As a replacement for DMA, we could have AML read from
> >>>e.g. PCI and write into RAM.
> >>>This way we don't need to pass address to QEMU.
> >>That sounds better as it saves us from allocation of IO port
> >>and QEMU don't need to write into guest memory, the only question is
> >>if PCI_Config opregion would work with driver-less PCI device.
> >
> >Or PCI BAR for that reason. I don't know for sure.
> >
> >>
> >>And it's still pretty much not test-able since it would require
> >>fully running OSPM to execute AML side.
> >
> >AML is not testable, but that's nothing new.
> >You can test reading from PCI.
> >
> >>>
> As for working PCI_Config OpRegion without driver, I haven't tried,
> but I wouldn't be surprised if it doesn't, taking in account that
> MS introduced _DSM doesn't.
> 
> >
> >
> Just compare with a graphics card design, where on device memory
> is mapped directly at some GPA not wasting RAM that guest could
> use for other tasks.
> >>>
> >>>This might have been true 20 years ago.  Most modern cards do DMA.
> >>
> >>Modern cards, with it's own RAM, map its VRAM in address space directly
> >>and allow users use it (GEM API). So they do not waste conventional RAM.
> >>For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> >>series (even PCI class id is the same)
> >
> >Don't know enough about graphics really, I'm not sure how these are
> >relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> >mostly use guest RAM, not on card RAM.
> >
> VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> instead of consuming guest's RAM they should be mapped at
> some GPA and their memory accessed directly.
> >>>
> >>>VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>address. This breaks the straight-forward approach of using a
> >>>rebalanceable PCI BAR.
> >>
> >>For PCI rebalance to work on Windows, one has to provide working PCI 
> >>driver
> >>otherwise OS will ignore it when rebalancing happens and
> >>might map something else over ignored BAR.
> >
> >Does it disable the BAR then? Or just move it elsewhere?
> it doesn't, it just blindly ignores BARs existence and maps BAR of
> another device with driver over it.
> >>>
> >>>Interesting. On classical PCI this is a forbidden configuration.
> >>>Maybe we do something that confuses windows?
> >>>Could you tell me how to reproduce this behaviour?
> >>#cat > t << EOF
> >>pci_update_mappings_del
> >>pci_update_mappings_add
> >>EOF
> >>
> >>#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >>wait till OS boots, note BARs programmed for ivshmem
> >>  in my case it was
> >>01:01.0 0,0xfe80+0x100
> >>then execute script and watch pci_update_mappings* trace events
> >>
> 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-11 Thread Igor Mammedov
On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > So the linker interface solves this rather neatly:
> > > bios allocates memory, bios passes memory map to guest.
> > > Served us well for several years without need for extensions,
> > > and it does solve the VM GEN ID problem, even though
> > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > The linker interface is fine for some read-only data, like ACPI tables,
> > especially fixed tables; not so for AML ones if one wants to patch them.
> > 
> > However, now that you want to use it for other purposes, you start
> > adding extensions and other guest->QEMU channels to communicate
> > patching info back.
> > It also steals the guest's memory, which is not nice and doesn't scale well.
> 
> This is an argument I don't get. memory is memory. call it guest memory
> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> but much slower.
> 
> ...
It matters for the user, however: he pays for a guest with XXX RAM but gets
less than that. And that will keep getting worse as the number of such
devices increases.

> > > OK fine, but returning PCI BAR address to guest is wrong.
> > > How about reading it from ACPI then? Is it really
> > > broken unless there's *also* a driver?  
> > I don't get the question; the MS spec requires an address (the ADDR
> > method), and it's read by ACPI (AML).
> 
> You were unhappy about DMA into guest memory.
> As a replacement for DMA, we could have AML read from
> e.g. PCI and write into RAM.
> This way we don't need to pass address to QEMU.
That sounds better, as it saves us from allocating an IO port,
and QEMU doesn't need to write into guest memory. The only question is
whether a PCI_Config OpRegion would work with a driver-less PCI device.

And it's still pretty much not testable, since it would require
a fully running OSPM to execute the AML side.
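
(One can at least inspect, if not execute, the generated AML from inside a
Linux guest; a sketch assuming the ACPICA userspace tools are installed:

# acpidump -b
# iasl -d *.dat

The first command dumps the DSDT/SSDTs to binary .dat files, the second
disassembles them, which makes it easy to eyeball the device's ADDR method
and any OperationRegion declared over PCI_Config without a full OSPM.)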

> 
> > As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > but I wouldn't be surprised if it doesn't, taking into account that
> > the MS-introduced _DSM doesn't.
> >   
> > > 
> > >   
> > > > > >Just compare with a graphics card design, where on device memory
> > > > > >is mapped directly at some GPA not wasting RAM that guest could
> > > > > >use for other tasks.  
> > > > > 
> > > > > This might have been true 20 years ago.  Most modern cards do DMA.
> > > > 
> > > > Modern cards with their own RAM map their VRAM into the address space
> > > > directly and allow users to use it (GEM API), so they do not waste
> > > > conventional RAM.
> > > > For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> > > > series (even the PCI class id is the same).
> > > 
> > > Don't know enough about graphics really, I'm not sure how these are
> > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > mostly use guest RAM, not on card RAM.
> > >   
> > > > > >VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > >instead of consuming guest's RAM they should be mapped at
> > > > > >some GPA and their memory accessed directly.  
> > > > > 
> > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > address. This breaks the straight-forward approach of using a
> > > > > rebalanceable PCI BAR.
> > > > 
> > > > For PCI rebalancing to work on Windows, one has to provide a working
> > > > PCI driver; otherwise the OS will ignore the device when rebalancing
> > > > happens and might map something else over the ignored BAR.
> > > 
> > > Does it disable the BAR then? Or just move it elsewhere?  
> > It doesn't; it just blindly ignores the BAR's existence and maps the BAR of
> > another device (one that has a driver) over it.
> 
> Interesting. On classical PCI this is a forbidden configuration.
> Maybe we do something that confuses windows?
> Could you tell me how to reproduce this behaviour?
#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
 -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
 -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
 -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

wait till OS boots, note BARs programmed for ivshmem
 in my case it was
   01:01.0 0,0xfe80+0x100
then execute script and watch pci_update_mappings* trace events

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i 
| nc -U /tmp/m; sleep 5; done;

hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
and then programs new BARs, where:
  pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe80+0x2
creates overlapping BAR with ivshmem 


> 
> > >   
> > > > > 
> > > > > >In that case NVDIMM could even map whole label area and
> > > > > 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-11 Thread Michael S. Tsirkin
On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> On Tue, 9 Feb 2016 14:17:44 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > So the linker interface solves this rather neatly:
> > > > bios allocates memory, bios passes memory map to guest.
> > > > Served us well for several years without need for extensions,
> > > > and it does solve the VM GEN ID problem, even though
> > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > > The linker interface is fine for some read-only data, like ACPI tables,
> > > especially fixed tables; not so for AML ones if one wants to patch them.
> > > 
> > > However now when you want to use it for other purposes you start
> > > adding extensions and other guest->QEMU channels to communicate
> > > patching info back.
> > > It steals guest's memory which is also not nice and doesn't scale well.  
> > 
> > This is an argument I don't get. memory is memory. call it guest memory
> > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > but much slower.
> > 
> > ...
> It however matters for user, he pays for guest with XXX RAM but gets less
> than that. And that will be getting worse as a number of such devices
> increases.
> 
> > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > How about reading it from ACPI then? Is it really
> > > > broken unless there's *also* a driver?  
> > > I don't get question, MS Spec requires address (ADDR method),
> > > and it's read by ACPI (AML).  
> > 
> > You were unhappy about DMA into guest memory.
> > As a replacement for DMA, we could have AML read from
> > e.g. PCI and write into RAM.
> > This way we don't need to pass address to QEMU.
> That sounds better as it saves us from allocation of IO port
> and QEMU don't need to write into guest memory, the only question is
> if PCI_Config opregion would work with driver-less PCI device.

Or PCI BAR for that reason. I don't know for sure.

> 
> And it's still pretty much not test-able since it would require
> fully running OSPM to execute AML side.

AML is not testable, but that's nothing new.
You can test reading from PCI.

> > 
> > > As for working PCI_Config OpRegion without driver, I haven't tried,
> > > but I wouldn't be surprised if it doesn't, taking in account that
> > > MS introduced _DSM doesn't.
> > >   
> > > > 
> > > >   
> > > > > > >Just compare with a graphics card design, where on device 
> > > > > > > memory
> > > > > > >is mapped directly at some GPA not wasting RAM that guest could
> > > > > > >use for other tasks.  
> > > > > > 
> > > > > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > > > > >   
> > > > > 
> > > > > Modern cards, with it's own RAM, map its VRAM in address space 
> > > > > directly
> > > > > and allow users use it (GEM API). So they do not waste conventional 
> > > > > RAM.
> > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way like in 
> > > > > this
> > > > > series (even PCI class id is the same)
> > > > 
> > > > Don't know enough about graphics really, I'm not sure how these are
> > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > mostly use guest RAM, not on card RAM.
> > > >   
> > > > > > >VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > > >instead of consuming guest's RAM they should be mapped at
> > > > > > >some GPA and their memory accessed directly.  
> > > > > > 
> > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > address. This breaks the straight-forward approach of using a
> > > > > > rebalanceable PCI BAR.
> > > > > 
> > > > > For PCI rebalance to work on Windows, one has to provide working PCI 
> > > > > driver
> > > > > otherwise OS will ignore it when rebalancing happens and
> > > > > might map something else over ignored BAR.
> > > > 
> > > > Does it disable the BAR then? Or just move it elsewhere?  
> > > it doesn't, it just blindly ignores BARs existence and maps BAR of
> > > another device with driver over it.  
> > 
> > Interesting. On classical PCI this is a forbidden configuration.
> > Maybe we do something that confuses windows?
> > Could you tell me how to reproduce this behaviour?
> #cat > t << EOF
> pci_update_mappings_del
> pci_update_mappings_add
> EOF
> 
> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> 
> wait till OS boots, note BARs programmed for ivshmem
>  in my case it was
>01:01.0 0,0xfe80+0x100
> then execute script and watch pci_update_mappings* trace events
> 
> # for i in $(seq 3 18); do printf -- 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-11 Thread Marcel Apfelbaum

On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:

On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin"  wrote:


On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:

So the linker interface solves this rather neatly:
bios allocates memory, bios passes memory map to guest.
Served us well for several years without need for extensions,
and it does solve the VM GEN ID problem, even though
1. it was never designed for huge areas like nvdimm seems to want to use
2. we might want to add a new 64 bit flag to avoid touching low memory

The linker interface is fine for some read-only data, like ACPI tables,
especially fixed tables; not so for AML ones if one wants to patch them.

However, now that you want to use it for other purposes, you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It also steals the guest's memory, which is not nice and doesn't scale well.


This is an argument I don't get. memory is memory. call it guest memory
or RAM backed PCI BAR - same thing. MMIO is cheaper of course
but much slower.

...

It does matter for the user, though: he pays for a guest with XXX RAM but gets less
than that. And that will keep getting worse as the number of such devices
increases.


OK fine, but returning PCI BAR address to guest is wrong.
How about reading it from ACPI then? Is it really
broken unless there's *also* a driver?

I don't get the question; the MS spec requires an address (the ADDR method),
and it's read by ACPI (AML).


You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass address to QEMU.

That sounds better, as it saves us from allocating an IO port
and QEMU doesn't need to write into guest memory; the only question is
whether a PCI_Config opregion would work with a driver-less PCI device.


Or PCI BAR for that reason. I don't know for sure.



And it's still pretty much not testable, since it would require a
fully running OSPM to execute the AML side.


AML is not testable, but that's nothing new.
You can test reading from PCI.




As for a working PCI_Config OpRegion without a driver, I haven't tried,
but I wouldn't be surprised if it doesn't work, taking into account that
the MS-introduced _DSM doesn't.





Just compare with a graphics card design, where on-device memory
is mapped directly at some GPA, not wasting RAM that the guest could
use for other tasks.


This might have been true 20 years ago.  Most modern cards do DMA.


Modern cards, with their own RAM, map their VRAM into the address space directly
and allow users to use it (GEM API). So they do not waste conventional RAM.
For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
series (even the PCI class id is the same).


Don't know enough about graphics really, I'm not sure how these are
relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
mostly use guest RAM, not on card RAM.


VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
instead of consuming guest's RAM they should be mapped at
some GPA and their memory accessed directly.


VMGENID is tied to a spec that rather arbitrarily asks for a fixed
address. This breaks the straight-forward approach of using a
rebalanceable PCI BAR.


For PCI rebalancing to work on Windows, one has to provide a working PCI driver;
otherwise the OS will ignore the device when rebalancing happens and
might map something else over the ignored BAR.


Does it disable the BAR then? Or just move it elsewhere?

it doesn't; it just blindly ignores the BAR's existence and maps the BAR of
another device (one with a driver) over it.


Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses windows?
Could you tell me how to reproduce this behaviour?

#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

wait till OS boots, note BARs programmed for ivshmem
  in my case it was
01:01.0 0,0xfe80+0x100
then execute script and watch pci_update_mappings* trace events

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i 
| nc -U /tmp/m; sleep 5; done;

hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
and then programs new BARs, where:
   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe80+0x2
creates overlapping BAR with ivshmem




Hi,

Let me see if I understand.
You say that in Windows, if a device does not have a driver installed,
its BAR ranges can be used after re-balancing by other devices, right?

If yes, in Windows we cannot use the device anyway, so we shouldn't care, right?

Our only problem remains 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-10 Thread Michael S. Tsirkin
On Wed, Feb 10, 2016 at 10:51:47AM +0200, Michael S. Tsirkin wrote:
> So maybe we should
> leave this alone, wait until we see an actual user - this way we can
> figure out the implementation constraints better.

What I'm definitely interested in seeing is improving the
bios_linker_loader API within QEMU.

Is the below going in the right direction, in your opinion?


diff --git a/include/hw/acpi/bios-linker-loader.h 
b/include/hw/acpi/bios-linker-loader.h
index 498c0af..78d9a16 100644
--- a/include/hw/acpi/bios-linker-loader.h
+++ b/include/hw/acpi/bios-linker-loader.h
@@ -17,6 +17,17 @@ void bios_linker_loader_add_checksum(GArray *linker, const 
char *file,
  void *start, unsigned size,
  uint8_t *checksum);
 
+/*
+ * bios_linker_loader_add_pointer: ask guest to append address of source file
+ * into destination file at the specified pointer.
+ *
+ * @linker: linker file array
+ * @dest_file: destination file that must be changed
+ * @src_file: source file whose address must be taken
+ * @table: destination file array
+ * @pointer: location of the pointer to be patched within destination file
+ * @pointer_size: size of pointer to be patched, in bytes
+ */
 void bios_linker_loader_add_pointer(GArray *linker,
 const char *dest_file,
 const char *src_file,
diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
index e04d60a..84be25a 100644
--- a/hw/acpi/bios-linker-loader.c
+++ b/hw/acpi/bios-linker-loader.c
@@ -142,7 +142,13 @@ void bios_linker_loader_add_pointer(GArray *linker,
 uint8_t pointer_size)
 {
 BiosLinkerLoaderEntry entry;
-size_t offset = (gchar *)pointer - table->data;
+size_t offset;
+
+assert((gchar *)pointer >= table->data);
+
+offset = (gchar *)pointer - table->data;
+
+assert(offset + pointer_size < table->len);
 
 memset(&entry, 0, sizeof entry);
 strncpy(entry.pointer.dest_file, dest_file,
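
For context, a hypothetical call site for this helper could look like the
following (the file names and variable names below are made up for
illustration; they are not taken from any existing caller):

    /* Sketch only: ask the guest-side linker to patch a 4-byte field
     * inside the ACPI tables blob with the address at which the guest
     * loaded "etc/some-blob".  'tables' is the GArray holding the blob
     * and 'addr_field' points somewhere into tables->data. */
    bios_linker_loader_add_pointer(linker,
                                   "etc/acpi/tables",  /* dest_file */
                                   "etc/some-blob",    /* src_file  */
                                   tables,             /* table     */
                                   addr_field,         /* pointer   */
                                   sizeof(uint32_t));  /* pointer_size */

With the added assertions, a 'pointer' that does not actually live inside
tables->data, or one whose patched bytes would run past the end of the blob,
trips an assert instead of silently producing a bogus linker entry.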

-- 
MST



Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-10 Thread Laszlo Ersek
On 02/10/16 10:28, Michael S. Tsirkin wrote:
> On Wed, Feb 10, 2016 at 10:51:47AM +0200, Michael S. Tsirkin wrote:
>> So maybe we should
>> leave this alone, wait until we see an actual user - this way we can
>> figure out the implementation constraints better.
> 
> What I'm definitely interested in seeing is improving the
> bios_linker_loader API within QEMU.
> 
> Is the below going in the right direction, in your opinion?
> 
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h 
> b/include/hw/acpi/bios-linker-loader.h
> index 498c0af..78d9a16 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -17,6 +17,17 @@ void bios_linker_loader_add_checksum(GArray *linker, const 
> char *file,
>   void *start, unsigned size,
>   uint8_t *checksum);
>  
> +/*
> + * bios_linker_loader_add_pointer: ask guest to append address of source file
> + * into destination file at the specified pointer.
> + *
> + * @linker: linker file array
> + * @dest_file: destination file that must be changed
> + * @src_file: source file whose address must be taken
> + * @table: destination file array
> + * @pointer: location of the pointer to be patched within destination file
> + * @pointer_size: size of pointer to be patched, in bytes
> + */
>  void bios_linker_loader_add_pointer(GArray *linker,
>  const char *dest_file,
>  const char *src_file,
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index e04d60a..84be25a 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -142,7 +142,13 @@ void bios_linker_loader_add_pointer(GArray *linker,
>  uint8_t pointer_size)
>  {
>  BiosLinkerLoaderEntry entry;
> -size_t offset = (gchar *)pointer - table->data;
> +size_t offset;
> +
> +assert((gchar *)pointer >= table->data);
> +
> +offset = (gchar *)pointer - table->data;
> +
> +assert(offset + pointer_size < table->len);
>  
>  memset(&entry, 0, sizeof entry);
>  strncpy(entry.pointer.dest_file, dest_file,
> 

I have two suggestions (independently of Igor's upcoming opinion):

(1) I propose to do all this arithmetic in uintptr_t, not (char*).

(2) In the last assertion, < should be <=. Both sides are exclusive, so
equality is valid.

(BTW the OVMF implementation of the linker-loader client is chock full
of verifications like the above, done in UINT64, so I can only agree
with the above safety measures, independently of vmgenid.)
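
For concreteness, a minimal sketch of the hunk with both suggestions applied
(illustrative only, not a tested patch):

    /* do the range checks in uintptr_t rather than through (gchar *) casts */
    uintptr_t start = (uintptr_t)table->data;
    uintptr_t ptr   = (uintptr_t)pointer;
    size_t offset;

    assert(ptr >= start);
    offset = ptr - start;
    /* '<=' rather than '<': both offset + pointer_size and table->len
     * are exclusive bounds, so equality is still a valid layout */
    assert(offset + pointer_size <= table->len);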

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-10 Thread Michael S. Tsirkin
On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > > 2. ACPI approach consumes guest usable RAM to allocate buffer
> > > > >and then makes device to DMA data in that RAM.
> > > > >That's a design point I don't agree with.
> > > > 
> > > > Blame the broken VM GEN ID spec.
> > > > 
> > > > For VM GEN ID, instead of DMA we could make ACPI code read PCI BAR and
> > > > copy data over, this would fix rebalancing but there is a problem with
> > > > this approach: it can not be done atomically (while VM is not yet
> > > > running and accessing RAM).  So you can have guest read a partially
> > > > corrupted ID from memory.  
> > > 
> > > Yep, VM GEN ID spec is broken and we can't do anything about it as
> > > it's absolutely impossible to guaranty atomic update as guest OS
> > > has address of the buffer and can read it any time. Nothing could be
> > > done here.  
> > 
> > Hmm I thought we can stop VM while we are changing the ID.
> > But of course VM could be accessing it at the same time.
> > So I take this back, ACPI code reading PCI BAR and
> > writing data out to the buffer would be fine
> > from this point of view.
> spec is broken regardless of who reads BAR or RAM as read
> isn't atomic and UUID could be updated in the middle.
> So MS will have to live with that, unless they have a secret
> way to tell guest stop reading from that ADDR, do update and
> signal guest that it can read it.

And really, that's the issue that's driving this up to v19.  There's no
good way to implement a bad spec.  It feels more like a failed
experiment than like a thought-through interface.  So maybe we should
leave this alone, wait until we see an actual user - this way we can
figure out the implementation constraints better.

-- 
MST



Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-09 Thread Igor Mammedov
On Tue, 2 Feb 2016 13:16:38 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Feb 02, 2016 at 10:59:53AM +0100, Igor Mammedov wrote:
> > On Sun, 31 Jan 2016 18:22:13 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:  
> > > > On Thu, 28 Jan 2016 14:59:25 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > > > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > >   
> > > > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > > > > > >   
> > > > > > > > Based on Microsoft's specifications (paper can be
> > > > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > > > add a PCI device with corresponding description in
> > > > > > > > SSDT ACPI table.
> > > > > > > > 
> > > > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > > > a corresponding HMP/QMP command.
> > > > > > > > 
> > > > > > > > Example of using vmgenid device:
> > > > > > > >  -device 
> > > > > > > > vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > > > 
> > > > > > > > 'vmgenid' device initialization flow is as following:
> > > > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > > > 
> > > > > > > > Note:
> > > > > > > > This implementation uses PCI class 0x0500 code for vmgenid 
> > > > > > > > device,
> > > > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > > > Testing various Windows versions showed that, OS
> > > > > > > > doesn't touch nor checks for resource conflicts
> > > > > > > > for such PCI devices.
> > > > > > > > There was concern that during PCI rebalancing, OS
> > > > > > > > could reprogram the BAR at other place, which would
> > > > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > > > address.
> > > > > > > > However testing showed that Windows does rebalancing
> > > > > > > > only for PCI device that have a driver attached
> > > > > > > > and completely ignores NO_DRV class of devices.
> > > > > > > > Which in turn creates a problem where OS could remap
> > > > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > > > a driver-less PCI device.
> > > > > > > > Statically declaring used memory range as VGEN._CRS
> > > > > > > > makes OS to honor resource reservation and an ignored
> > > > > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > > > > 
> > > > > > > > Signed-off-by: Gal Hammer 
> > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > > 
> > > > > > > It's an interesting hack, but this needs some thought. BIOS has 
> > > > > > > no idea
> > > > > > > this BAR is special and can not be rebalanced, so it might put 
> > > > > > > the BAR
> > > > > > > in the middle of the range, in effect fragmenting it.  
> > > > > > yep that's the only drawback in PCI approach.
> > > > > >   
> > > > > > > Really I think something like V12 just rewritten using the new 
> > > > > > > APIs
> > > > > > > (probably with something like build_append_named_dword that I 
> > > > > > > suggested)
> > > > > > > would be much a simpler way to implement this device, given
> > > > > > > the weird API limitations.  
> > > > > > We went over stating drawbacks of both approaches several times 
> > > > > > and that's where I strongly disagree with using v12 AML patching
> > > > > > approach for reasons stated in those discussions.  
> > > > > 
> > > > > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > > > > to host, and to have custom code to migrate the address.
> > > > allocating IO ports is fine by me but I'm against using bios_linker 
> > > > (ACPI)
> > > > approach for task at hand,
> > > > let me enumerate one more time the issues that make me dislike it so 
> > > > much
> > > > (in order where most disliked ones go the first):
> > > > 
> > > > 1. over-engineered for the task at hand, 
> > > >for device to become initialized guest OS has to execute AML,
> > > >so init chain looks like:
> > > >  QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) 
> > > > ->
> > > >  QEMU (update buf address)
> > > >it's hell to debug when something doesn't work right in this chain   
> > > >  
> > > 
> > > Well this is not very different from e.g. virtio.
> > > If it's just AML that worries you, we could teach 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-09 Thread Michael S. Tsirkin
On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > So the linker interface solves this rather neatly:
> > bios allocates memory, bios passes memory map to guest.
> > Served us well for several years without need for extensions,
> > and it does solve the VM GEN ID problem, even though
> > 1. it was never designed for huge areas like nvdimm seems to want to use
> > 2. we might want to add a new 64 bit flag to avoid touching low memory
> linker interface is fine for some readonly data, like ACPI tables
> especially fixed tables, not so for AML ones if one wants to patch it.
> 
> However now when you want to use it for other purposes you start
> adding extensions and other guest->QEMU channels to communicate
> patching info back.
> It steals guest's memory which is also not nice and doesn't scale well.

This is an argument I don't get. memory is memory. call it guest memory
or RAM backed PCI BAR - same thing. MMIO is cheaper of course
but much slower.

...

> > OK fine, but returning PCI BAR address to guest is wrong.
> > How about reading it from ACPI then? Is it really
> > broken unless there's *also* a driver?
> I don't get question, MS Spec requires address (ADDR method),
> and it's read by ACPI (AML).

You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass address to QEMU.

> As for working PCI_Config OpRegion without driver, I haven't tried,
> but I wouldn't be surprised if it doesn't, taking in account that
> MS introduced _DSM doesn't.
> 
> > 
> > 
> > > > >Just compare with a graphics card design, where on device memory
> > > > >is mapped directly at some GPA not wasting RAM that guest could
> > > > >use for other tasks.
> > > > 
> > > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > > 
> > > Modern cards, with it's own RAM, map its VRAM in address space directly
> > > and allow users use it (GEM API). So they do not waste conventional RAM.
> > > For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> > > series (even PCI class id is the same)  
> > 
> > Don't know enough about graphics really, I'm not sure how these are
> > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > mostly use guest RAM, not on card RAM.
> > 
> > > > >VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > >instead of consuming guest's RAM they should be mapped at
> > > > >some GPA and their memory accessed directly.
> > > > 
> > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > address. This breaks the straight-forward approach of using a
> > > > rebalanceable PCI BAR.  
> > > 
> > > For PCI rebalance to work on Windows, one has to provide working PCI 
> > > driver
> > > otherwise OS will ignore it when rebalancing happens and
> > > might map something else over ignored BAR.  
> > 
> > Does it disable the BAR then? Or just move it elsewhere?
> it doesn't, it just blindly ignores BARs existence and maps BAR of
> another device with driver over it.

Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses windows?
Could you tell me how to reproduce this behaviour?

> > 
> > > >   
> > > > >In that case NVDIMM could even map whole label area and
> > > > >significantly simplify QEMU<->OSPM protocol that currently
> > > > >serializes that data through a 4K page.
> > > > >There is also performance issue with buffer allocated in RAM,
> > > > >because DMA adds unnecessary copying step when data could
> > > > >be read/written directly of NVDIMM.
> > > > >It might be no very important for _DSM interface but when it
> > > > >comes to supporting block mode it can become an issue.
> > > > 
> > > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > > it's guaranteed to work across BAR rebalancing.
> > > > Would that address the performance issue?  
> > > 
> > > it would if rebalancing were to account for driverless PCI device BARs,
> > > but it doesn't hence such BARs need to be statically pinned
> > > at place where BIOS put them at start up.
> > > I'm also not sure that PCIConfig operation region would work
> > > on Windows without loaded driver (similar to _DSM case).
> > > 
> > >   
> > > > > Above points make ACPI patching approach not robust and fragile
> > > > > and hard to maintain.
> > > > 
> > > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > > get what appears your general dislike of the linker host/guest
> > > > interface.  
> > > Besides technical issues general dislike is just what I've written
> > > "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > > 
> > > to make it less fragile:
> > >  1. it should be impossible to corrupt memory or patch wrong address.
> > > current impl. silently relies on value 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-02 Thread Igor Mammedov
On Sun, 31 Jan 2016 18:22:13 +0200
"Michael S. Tsirkin"  wrote:

> On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> > On Thu, 28 Jan 2016 14:59:25 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:  
> > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > > > > > Based on Microsoft's specifications (paper can be
> > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > add a PCI device with corresponding description in
> > > > > > SSDT ACPI table.
> > > > > > 
> > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > a corresponding HMP/QMP command.
> > > > > > 
> > > > > > Example of using vmgenid device:
> > > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > 
> > > > > > 'vmgenid' device initialization flow is as following:
> > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > 
> > > > > > Note:
> > > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > Testing various Windows versions showed that, OS
> > > > > > doesn't touch nor checks for resource conflicts
> > > > > > for such PCI devices.
> > > > > > There was concern that during PCI rebalancing, OS
> > > > > > could reprogram the BAR at other place, which would
> > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > address.
> > > > > > However testing showed that Windows does rebalancing
> > > > > > only for PCI device that have a driver attached
> > > > > > and completely ignores NO_DRV class of devices.
> > > > > > Which in turn creates a problem where OS could remap
> > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > a driver-less PCI device.
> > > > > > Statically declaring used memory range as VGEN._CRS
> > > > > > makes OS to honor resource reservation and an ignored
> > > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > > 
> > > > > > Signed-off-by: Gal Hammer 
> > > > > > Signed-off-by: Igor Mammedov   
> > > > > 
> > > > > It's an interesting hack, but this needs some thought. BIOS has no 
> > > > > idea
> > > > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > > > in the middle of the range, in effect fragmenting it.
> > > > yep that's the only drawback in PCI approach.
> > > > 
> > > > > Really I think something like V12 just rewritten using the new APIs
> > > > > (probably with something like build_append_named_dword that I 
> > > > > suggested)
> > > > > would be much a simpler way to implement this device, given
> > > > > the weird API limitations.
> > > > We went over stating drawbacks of both approaches several times 
> > > > and that's where I strongly disagree with using v12 AML patching
> > > > approach for reasons stated in those discussions.
> > > 
> > > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > > to host, and to have custom code to migrate the address.
> > allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> > approach for task at hand,
> > let me enumerate one more time the issues that make me dislike it so much
> > (in order where most disliked ones go the first):
> > 
> > 1. over-engineered for the task at hand, 
> >for device to become initialized guest OS has to execute AML,
> >so init chain looks like:
> >  QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
> >  QEMU (update buf address)
> >it's hell to debug when something doesn't work right in this chain  
> 
> Well this is not very different from e.g. virtio.
> If it's just AML that worries you, we could teach BIOS/EFI a new command
> to give some addresses after linking back to QEMU. Would this address
> this issue?
it would make it marginally better (especially from tests pov)
though it won't fix other issues.

> 
> 
> >even if there isn't any memory corruption that incorrect AML patching
> >could introduce.
> >As result of complexity patches are hard to review since one has
> >to remember/relearn all details how bios_linker in QEMU and BIOS works,
> >hence chance of regression is very high.
> >Dynamically patched AML also introduces its own share of AML
> >code that has to deal with dynamic buff 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-02-02 Thread Michael S. Tsirkin
On Tue, Feb 02, 2016 at 10:59:53AM +0100, Igor Mammedov wrote:
> On Sun, 31 Jan 2016 18:22:13 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> > > On Thu, 28 Jan 2016 14:59:25 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > > > > > > Based on Microsoft's specifications (paper can be
> > > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > > add a PCI device with corresponding description in
> > > > > > > SSDT ACPI table.
> > > > > > > 
> > > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > > a corresponding HMP/QMP command.
> > > > > > > 
> > > > > > > Example of using vmgenid device:
> > > > > > >  -device 
> > > > > > > vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > > 
> > > > > > > 'vmgenid' device initialization flow is as following:
> > > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > > 
> > > > > > > Note:
> > > > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > > Testing various Windows versions showed that, OS
> > > > > > > doesn't touch nor checks for resource conflicts
> > > > > > > for such PCI devices.
> > > > > > > There was concern that during PCI rebalancing, OS
> > > > > > > could reprogram the BAR at other place, which would
> > > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > > address.
> > > > > > > However testing showed that Windows does rebalancing
> > > > > > > only for PCI device that have a driver attached
> > > > > > > and completely ignores NO_DRV class of devices.
> > > > > > > Which in turn creates a problem where OS could remap
> > > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > > a driver-less PCI device.
> > > > > > > Statically declaring used memory range as VGEN._CRS
> > > > > > > makes OS to honor resource reservation and an ignored
> > > > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > > > 
> > > > > > > Signed-off-by: Gal Hammer 
> > > > > > > Signed-off-by: Igor Mammedov   
> > > > > > 
> > > > > > It's an interesting hack, but this needs some thought. BIOS has no 
> > > > > > idea
> > > > > > this BAR is special and can not be rebalanced, so it might put the 
> > > > > > BAR
> > > > > > in the middle of the range, in effect fragmenting it.
> > > > > yep that's the only drawback in PCI approach.
> > > > > 
> > > > > > Really I think something like V12 just rewritten using the new APIs
> > > > > > (probably with something like build_append_named_dword that I 
> > > > > > suggested)
> > > > > > would be much a simpler way to implement this device, given
> > > > > > the weird API limitations.
> > > > > We went over stating drawbacks of both approaches several times 
> > > > > and that's where I strongly disagree with using v12 AML patching
> > > > > approach for reasons stated in those discussions.
> > > > 
> > > > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > > > to host, and to have custom code to migrate the address.
> > > allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> > > approach for task at hand,
> > > let me enumerate one more time the issues that make me dislike it so much
> > > (in order where most disliked ones go the first):
> > > 
> > > 1. over-engineered for the task at hand, 
> > >for device to become initialized guest OS has to execute AML,
> > >so init chain looks like:
> > >  QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
> > >  QEMU (update buf address)
> > >it's hell to debug when something doesn't work right in this chain  
> > 
> > Well this is not very different from e.g. virtio.
> > If it's just AML that worries you, we could teach BIOS/EFI a new command
> > to give some addresses after linking back to QEMU. Would this address
> > this issue?
> it would make it marginally better (especially from tests pov)
> though it won't fix other issues.
> 
> > 
> > 
> > >even if there isn't any memory corruption that incorrect AML patching
> > >could introduce.
> > >As result of complexity 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-31 Thread Michael S. Tsirkin
On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 14:59:25 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:  
> > > > > Based on Microsoft's specifications (paper can be
> > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > add a PCI device with corresponding description in
> > > > > SSDT ACPI table.
> > > > > 
> > > > > The GUID is set using "vmgenid.guid" property or
> > > > > a corresponding HMP/QMP command.
> > > > > 
> > > > > Example of using vmgenid device:
> > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > 
> > > > > 'vmgenid' device initialization flow is as following:
> > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > 
> > > > > Note:
> > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > Testing various Windows versions showed that, OS
> > > > > doesn't touch nor checks for resource conflicts
> > > > > for such PCI devices.
> > > > > There was concern that during PCI rebalancing, OS
> > > > > could reprogram the BAR at other place, which would
> > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > address.
> > > > > However testing showed that Windows does rebalancing
> > > > > only for PCI device that have a driver attached
> > > > > and completely ignores NO_DRV class of devices.
> > > > > Which in turn creates a problem where OS could remap
> > > > > one of PCI devices(with driver) over BAR used by
> > > > > a driver-less PCI device.
> > > > > Statically declaring used memory range as VGEN._CRS
> > > > > makes OS to honor resource reservation and an ignored
> > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > 
> > > > > Signed-off-by: Gal Hammer 
> > > > > Signed-off-by: Igor Mammedov 
> > > > 
> > > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > > in the middle of the range, in effect fragmenting it.  
> > > yep that's the only drawback in PCI approach.
> > >   
> > > > Really I think something like V12 just rewritten using the new APIs
> > > > (probably with something like build_append_named_dword that I suggested)
> > > > would be much a simpler way to implement this device, given
> > > > the weird API limitations.  
> > > We went over stating drawbacks of both approaches several times 
> > > and that's where I strongly disagree with using v12 AML patching
> > > approach for reasons stated in those discussions.  
> > 
> > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > to host, and to have custom code to migrate the address.
> allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> approach for task at hand,
> let me enumerate one more time the issues that make me dislike it so much
> (in order where most disliked ones go the first):
> 
> 1. over-engineered for the task at hand, 
>for device to become initialized guest OS has to execute AML,
>so init chain looks like:
>  QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
>  QEMU (update buf address)
>it's hell to debug when something doesn't work right in this chain

Well this is not very different from e.g. virtio.
If it's just AML that worries you, we could teach BIOS/EFI a new command
to give some addresses after linking back to QEMU. Would this address
this issue?
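
To make the idea concrete, such a command could be modelled on the existing
ADD_POINTER entry but flow in the opposite direction, along these lines
(entirely hypothetical; the member and constant names below only mirror the
existing entry and are not an actual proposal):

    /* Hypothetical "write pointer back" command: after the firmware has
     * allocated src_file, it writes the allocated address back to QEMU
     * (e.g. via a writeable fw_cfg entry named dest_file) at dst_offset,
     * so QEMU learns where the blob ended up without any AML involvement. */
    struct {
        char dest_file[BIOS_LINKER_LOADER_FILESZ]; /* same size as existing
                                                      file-name fields */
        char src_file[BIOS_LINKER_LOADER_FILESZ];
        uint32_t dst_offset;   /* offset of the pointer within dest_file */
        uint8_t  size;         /* pointer size in bytes: 1, 2, 4 or 8 */
    } wr_pointer;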


>even if there isn't any memory corruption that incorrect AML patching
>could introduce.
>As result of complexity patches are hard to review since one has
>to remember/relearn all details how bios_linker in QEMU and BIOS works,
>hence chance of regression is very high.
>Dynamically patched AML also introduces its own share of AML
>code that has to deal with dynamic buff address value.
>For an example:
>  "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
>27 liner patch could be just 5-6 lines if static (known in advance)
>buffer address were used to declare static _CRS variable.

Problem is with finding a fixed address, and fragmentation that this
causes.  Look at the mess we have with just allocating addresses for
RAM.  I think it's a 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-29 Thread Igor Mammedov
On Thu, 28 Jan 2016 14:59:25 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > On Thu, 28 Jan 2016 13:13:04 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:  
> > > > Based on Microsoft's specifications (paper can be
> > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > add a PCI device with corresponding description in
> > > > SSDT ACPI table.
> > > > 
> > > > The GUID is set using "vmgenid.guid" property or
> > > > a corresponding HMP/QMP command.
> > > > 
> > > > Example of using vmgenid device:
> > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > 
> > > > 'vmgenid' device initialization flow is as following:
> > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > > 
> > > > Note:
> > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > Testing various Windows versions showed that, OS
> > > > doesn't touch nor checks for resource conflicts
> > > > for such PCI devices.
> > > > There was concern that during PCI rebalancing, OS
> > > > could reprogram the BAR at other place, which would
> > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > address.
> > > > However testing showed that Windows does rebalancing
> > > > only for PCI device that have a driver attached
> > > > and completely ignores NO_DRV class of devices.
> > > > Which in turn creates a problem where OS could remap
> > > > one of PCI devices(with driver) over BAR used by
> > > > a driver-less PCI device.
> > > > Statically declaring used memory range as VGEN._CRS
> > > > makes OS to honor resource reservation and an ignored
> > > > BAR range is not longer touched during PCI rebalancing.
> > > > 
> > > > Signed-off-by: Gal Hammer 
> > > > Signed-off-by: Igor Mammedov 
> > > 
> > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > in the middle of the range, in effect fragmenting it.  
> > yep that's the only drawback in PCI approach.
> >   
> > > Really I think something like V12 just rewritten using the new APIs
> > > (probably with something like build_append_named_dword that I suggested)
> > > would be much a simpler way to implement this device, given
> > > the weird API limitations.  
> > We went over stating drawbacks of both approaches several times 
> > and that's where I strongly disagree with using v12 AML patching
> > approach for reasons stated in those discussions.  
> 
> Yes, IIRC you dislike the need to allocate an IO range to pass address
> to host, and to have custom code to migrate the address.
allocating IO ports is fine by me, but I'm against using the bios_linker (ACPI)
approach for the task at hand;
let me enumerate one more time the issues that make me dislike it so much
(in order, with the most disliked ones first):

1. over-engineered for the task at hand:
   for the device to become initialized the guest OS has to execute AML,
   so the init chain looks like:
 QEMU -> BIOS (patch AML) -> OS (AML writes buf address to IO port) ->
 QEMU (update buf address)
   it's hell to debug when something doesn't work right in this chain
   even if there isn't any memory corruption that incorrect AML patching
   could introduce.
   As a result of the complexity, patches are hard to review since one has
   to remember/relearn all the details of how bios_linker works in QEMU and the BIOS,
   hence the chance of regression is very high.
   Dynamically patched AML also introduces its own share of AML
   code that has to deal with the dynamic buffer address value.
   For example:
 "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
   the 27-line patch could be just 5-6 lines if a static (known in advance)
   buffer address were used to declare a static _CRS variable.

2. the ACPI approach consumes guest-usable RAM to allocate a buffer
   and then makes the device DMA data into that RAM.
   That's a design point I don't agree with.
   Just compare with a graphics card design, where on-device memory
   is mapped directly at some GPA, not wasting RAM that the guest could
   use for other tasks.
   The VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
   instead of consuming the guest's RAM they should be mapped at
   some GPA and their memory accessed directly.
   In that case NVDIMM could even map whole label area and
   significantly simplify QEMU<->OSPM protocol that 

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-28 Thread Igor Mammedov
On Thu, 28 Jan 2016 13:13:04 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > Based on Microsoft's specifications (paper can be
> > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > easily found by "Virtual Machine Generation ID" keywords),
> > add a PCI device with corresponding description in
> > SSDT ACPI table.
> > 
> > The GUID is set using "vmgenid.guid" property or
> > a corresponding HMP/QMP command.
> > 
> > Example of using vmgenid device:
> >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > 
> > 'vmgenid' device initialization flow is as following:
> >  1. vmgenid has RAM BAR registered with size of GUID buffer
> >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > are generated with \_SB.VMGI.ADDR constant pointing to
> > GPA where BIOS's mapped vmgenid's BAR earlier
> > 
> > Note:
> > This implementation uses PCI class 0x0500 code for vmgenid device,
> > that is marked as NO_DRV in Windows's machine.inf.
> > Testing various Windows versions showed that, OS
> > doesn't touch nor checks for resource conflicts
> > for such PCI devices.
> > There was concern that during PCI rebalancing, OS
> > could reprogram the BAR at other place, which would
> > leave VGEN.ADDR pointing to the old (no more valid)
> > address.
> > However testing showed that Windows does rebalancing
> > only for PCI device that have a driver attached
> > and completely ignores NO_DRV class of devices.
> > Which in turn creates a problem where OS could remap
> > one of PCI devices(with driver) over BAR used by
> > a driver-less PCI device.
> > Statically declaring used memory range as VGEN._CRS
> > makes OS to honor resource reservation and an ignored
> > BAR range is not longer touched during PCI rebalancing.
> > 
> > Signed-off-by: Gal Hammer 
> > Signed-off-by: Igor Mammedov   
> 
> It's an interesting hack, but this needs some thought. BIOS has no idea
> this BAR is special and can not be rebalanced, so it might put the BAR
> in the middle of the range, in effect fragmenting it.
yep that's the only drawback in PCI approach.

> Really I think something like V12 just rewritten using the new APIs
> (probably with something like build_append_named_dword that I suggested)
> would be much a simpler way to implement this device, given
> the weird API limitations.
We went over stating drawbacks of both approaches several times 
and that's where I strongly disagree with using v12 AML patching
approach for reasons stated in those discussions.

> And hey, if you want to use a pci device to pass the physical
> address guest to host, instead of reserving
> a couple of IO addresses, sure, stick it in pci config in
> a vendor-specific capability, this way it'll get migrated
> automatically.
Could you elaborate more on this suggestion?

> 
> 
> > ---
> > changes since 17:
> >   - small fixups suggested in v14 review by Michael S. Tsirkin" 
> > 
> >   - make BAR prefetchable to make region cached as per MS spec
> >   - s/uuid/guid/ to match spec
> > changes since 14:
> >   - reserve BAR resources so that Windows won't touch it
> > during PCI rebalancing - "Michael S. Tsirkin" 
> >   - ACPI: split VGEN device of PCI device descriptor
> > and place it at PCI0 scope, so that won't be need trace its
> > location on PCI buses. - "Michael S. Tsirkin" 
> >   - permit only one vmgenid to be created
> >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > ---
> >  default-configs/i386-softmmu.mak   |   1 +
> >  default-configs/x86_64-softmmu.mak |   1 +
> >  docs/specs/pci-ids.txt |   1 +
> >  hw/i386/acpi-build.c   |  56 +-
> >  hw/misc/Makefile.objs  |   1 +
> >  hw/misc/vmgenid.c  | 154 
> > +
> >  include/hw/misc/vmgenid.h  |  27 +++
> >  include/hw/pci/pci.h   |   1 +
> >  8 files changed, 240 insertions(+), 2 deletions(-)
> >  create mode 100644 hw/misc/vmgenid.c
> >  create mode 100644 include/hw/misc/vmgenid.h
> > 
> > diff --git a/default-configs/i386-softmmu.mak 
> > b/default-configs/i386-softmmu.mak
> > index b177e52..6402439 100644
> > --- a/default-configs/i386-softmmu.mak
> > +++ b/default-configs/i386-softmmu.mak
> > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> >  CONFIG_IOAPIC=y
> >  CONFIG_PVPANIC=y
> >  CONFIG_MEM_HOTPLUG=y
> > +CONFIG_VMGENID=y
> >  CONFIG_NVDIMM=y
> >  CONFIG_ACPI_NVDIMM=y
> >  CONFIG_XIO3130=y
> > diff --git a/default-configs/x86_64-softmmu.mak 
> > b/default-configs/x86_64-softmmu.mak
> > index 6e3b312..fdac18f 100644
> > --- a/default-configs/x86_64-softmmu.mak
> > +++ b/default-configs/x86_64-softmmu.mak
> > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> >  CONFIG_IOAPIC=y
> >  CONFIG_PVPANIC=y

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-28 Thread Michael S. Tsirkin
On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 13:13:04 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > > Based on Microsoft's specifications (paper can be
> > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > easily found by "Virtual Machine Generation ID" keywords),
> > > add a PCI device with corresponding description in
> > > SSDT ACPI table.
> > > 
> > > The GUID is set using "vmgenid.guid" property or
> > > a corresponding HMP/QMP command.
> > > 
> > > Example of using vmgenid device:
> > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > 
> > > 'vmgenid' device initialization flow is as following:
> > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > are generated with \_SB.VMGI.ADDR constant pointing to
> > > GPA where BIOS's mapped vmgenid's BAR earlier
> > > 
> > > Note:
> > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > that is marked as NO_DRV in Windows's machine.inf.
> > > Testing various Windows versions showed that, OS
> > > doesn't touch nor checks for resource conflicts
> > > for such PCI devices.
> > > There was concern that during PCI rebalancing, OS
> > > could reprogram the BAR at other place, which would
> > > leave VGEN.ADDR pointing to the old (no more valid)
> > > address.
> > > However testing showed that Windows does rebalancing
> > > only for PCI device that have a driver attached
> > > and completely ignores NO_DRV class of devices.
> > > Which in turn creates a problem where OS could remap
> > > one of PCI devices(with driver) over BAR used by
> > > a driver-less PCI device.
> > > Statically declaring used memory range as VGEN._CRS
> > > makes OS to honor resource reservation and an ignored
> > > BAR range is not longer touched during PCI rebalancing.
> > > 
> > > Signed-off-by: Gal Hammer 
> > > Signed-off-by: Igor Mammedov   
> > 
> > It's an interesting hack, but this needs some thought. BIOS has no idea
> > this BAR is special and can not be rebalanced, so it might put the BAR
> > in the middle of the range, in effect fragmenting it.
> yep that's the only drawback in PCI approach.
> 
> > Really I think something like V12 just rewritten using the new APIs
> > (probably with something like build_append_named_dword that I suggested)
> > would be much a simpler way to implement this device, given
> > the weird API limitations.
> We went over stating drawbacks of both approaches several times 
> and that's where I strongly disagree with using v12 AML patching
> approach for reasons stated in those discussions.

Yes, IIRC you dislike the need to allocate an IO range to pass address
to host, and to have custom code to migrate the address.

> > And hey, if you want to use a pci device to pass the physical
> > address guest to host, instead of reserving
> > a couple of IO addresses, sure, stick it in pci config in
> > a vendor-specific capability, this way it'll get migrated
> > automatically.
> Could you elaborate more on this suggestion?

I really just mean using PCI_Config operation region.
If you wish, I'll try to post a prototype next week.

> > 
> > 
> > > ---
> > > changes since 17:
> > >   - small fixups suggested in v14 review by Michael S. Tsirkin" 
> > > 
> > >   - make BAR prefetchable to make region cached as per MS spec
> > >   - s/uuid/guid/ to match spec
> > > changes since 14:
> > >   - reserve BAR resources so that Windows won't touch it
> > > during PCI rebalancing - "Michael S. Tsirkin" 
> > >   - ACPI: split VGEN device of PCI device descriptor
> > > and place it at PCI0 scope, so that won't be need trace its
> > > location on PCI buses. - "Michael S. Tsirkin" 
> > >   - permit only one vmgenid to be created
> > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > ---
> > >  default-configs/i386-softmmu.mak   |   1 +
> > >  default-configs/x86_64-softmmu.mak |   1 +
> > >  docs/specs/pci-ids.txt |   1 +
> > >  hw/i386/acpi-build.c   |  56 +-
> > >  hw/misc/Makefile.objs  |   1 +
> > >  hw/misc/vmgenid.c  | 154 
> > > +
> > >  include/hw/misc/vmgenid.h  |  27 +++
> > >  include/hw/pci/pci.h   |   1 +
> > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > >  create mode 100644 hw/misc/vmgenid.c
> > >  create mode 100644 include/hw/misc/vmgenid.h
> > > 
> > > diff --git a/default-configs/i386-softmmu.mak 
> > > b/default-configs/i386-softmmu.mak
> > > index b177e52..6402439 100644
> > > --- a/default-configs/i386-softmmu.mak
> > > +++ 

[Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-28 Thread Igor Mammedov
Based on Microsoft's specifications (paper can be
downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
easily found by "Virtual Machine Generation ID" keywords),
add a PCI device with corresponding description in
SSDT ACPI table.

The GUID is set using "vmgenid.guid" property or
a corresponding HMP/QMP command.

Example of using vmgenid device:
 -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"

'vmgenid' device initialization flow is as follows:
 1. vmgenid has a RAM BAR registered with the size of the GUID buffer
 2. BIOS initializes PCI devices and maps the BAR in the PCI hole
 3. BIOS reads ACPI tables from QEMU; at that moment the tables
are generated with the \_SB.VMGI.ADDR constant pointing to the
GPA where the BIOS mapped vmgenid's BAR earlier

Note:
This implementation uses PCI class 0x0500 code for vmgenid device,
that is marked as NO_DRV in Windows's machine.inf.
Testing various Windows versions showed that the OS
doesn't touch, nor check for resource conflicts on,
such PCI devices.
There was concern that during PCI rebalancing the OS
could reprogram the BAR to another place, which would
leave VGEN.ADDR pointing to the old (no longer valid)
address.
However, testing showed that Windows does rebalancing
only for PCI devices that have a driver attached
and completely ignores the NO_DRV class of devices.
This in turn creates a problem where the OS could remap
one of the PCI devices (with a driver) over a BAR used by
a driver-less PCI device.
Statically declaring the used memory range as VGEN._CRS
makes the OS honor the resource reservation, and an ignored
BAR range is no longer touched during PCI rebalancing.

Signed-off-by: Gal Hammer 
Signed-off-by: Igor Mammedov 
---
changes since 17:
  - small fixups suggested in v14 review by Michael S. Tsirkin" 

  - make BAR prefetchable to make region cached as per MS spec
  - s/uuid/guid/ to match spec
changes since 14:
  - reserve BAR resources so that Windows won't touch it
during PCI rebalancing - "Michael S. Tsirkin" 
  - ACPI: split the VGEN device off the PCI device descriptor
and place it at PCI0 scope, so that there's no need to trace its
location on PCI buses. - "Michael S. Tsirkin" 
  - permit only one vmgenid to be created
  - enable the BAR to be mapped above 4Gb if it can't be mapped at low mem
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 docs/specs/pci-ids.txt |   1 +
 hw/i386/acpi-build.c   |  56 +-
 hw/misc/Makefile.objs  |   1 +
 hw/misc/vmgenid.c  | 154 +
 include/hw/misc/vmgenid.h  |  27 +++
 include/hw/pci/pci.h   |   1 +
 8 files changed, 240 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/vmgenid.c
 create mode 100644 include/hw/misc/vmgenid.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index b177e52..6402439 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_VMGENID=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 6e3b312..fdac18f 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_VMGENID=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index 0adcb89..e65ecf9 100644
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -47,6 +47,7 @@ PCI devices (other than virtio):
 1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
 1b36:0006  PCI Rocker Ethernet switch device
 1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
+1b36:0009  PCI VM-Generation device
 1b36:000a  PCI-PCI bridge (multiseat)
 
 All these devices are documented in docs/specs.
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 78758e2..0187262 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -44,6 +44,7 @@
 #include "hw/acpi/tpm.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/timer/mc146818rtc_regs.h"
+#include "hw/misc/vmgenid.h"
 
 /* Supported chipsets: */
 #include "hw/acpi/piix4.h"
@@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
 info->applesmc_io_base = applesmc_port();
 }
 
+static Aml *build_vmgenid_device(uint64_t buf_paddr)
+{
+Aml *dev, *pkg, *crs;
+
+dev = aml_device("VGEN");
+aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
+aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
+aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
+
+pkg = aml_package(2);
+/* low 32 bits of UUID buffer addr */
+

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-28 Thread Michael S. Tsirkin
On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> Based on Microsoft's specifications (paper can be
> downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> easily found by "Virtual Machine Generation ID" keywords),
> add a PCI device with corresponding description in
> SSDT ACPI table.
> 
> The GUID is set using "vmgenid.guid" property or
> a corresponding HMP/QMP command.
> 
> Example of using vmgenid device:
>  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> 
> 'vmgenid' device initialization flow is as following:
>  1. vmgenid has RAM BAR registered with size of GUID buffer
>  2. BIOS initializes PCI devices and it maps BAR in PCI hole
>  3. BIOS reads ACPI tables from QEMU, at that moment tables
> are generated with \_SB.VMGI.ADDR constant pointing to
> GPA where BIOS's mapped vmgenid's BAR earlier
> 
> Note:
> This implementation uses PCI class 0x0500 code for vmgenid device,
> that is marked as NO_DRV in Windows's machine.inf.
> Testing various Windows versions showed that, OS
> doesn't touch nor checks for resource conflicts
> for such PCI devices.
> There was concern that during PCI rebalancing, OS
> could reprogram the BAR at other place, which would
> leave VGEN.ADDR pointing to the old (no more valid)
> address.
> However testing showed that Windows does rebalancing
> only for PCI device that have a driver attached
> and completely ignores NO_DRV class of devices.
> Which in turn creates a problem where OS could remap
> one of PCI devices(with driver) over BAR used by
> a driver-less PCI device.
> Statically declaring used memory range as VGEN._CRS
> makes OS to honor resource reservation and an ignored
> BAR range is not longer touched during PCI rebalancing.
> 
> Signed-off-by: Gal Hammer 
> Signed-off-by: Igor Mammedov 

It's an interesting hack, but this needs some thought. BIOS has no idea
this BAR is special and can not be rebalanced, so it might put the BAR
in the middle of the range, in effect fragmenting it.

Really I think something like V12 just rewritten using the new APIs
(probably with something like build_append_named_dword that I suggested)
would be a much simpler way to implement this device, given
the weird API limitations.

And hey, if you want to use a pci device to pass the physical
address guest to host, instead of reserving
a couple of IO addresses, sure, stick it in pci config in
a vendor-specific capability, this way it'll get migrated
automatically.
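A rough, hypothetical sketch of that alternative, assuming the standard
vendor-specific capability layout (cap ID, next pointer, length, then private
data) and QEMU's pci_add_capability() and stq_le_p() helpers as of this
series; the VMGENID_CAP_* names and the vmgenid_add_vendor_cap helper are made
up for illustration.

    /* Sketch of the vendor-specific-capability alternative suggested above:
     * publish the GUID buffer's guest-physical address in PCI config space
     * so it is migrated together with the device.  Offsets follow the
     * standard layout: byte 0 = cap ID, 1 = next pointer, 2 = length. */
    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"
    #include "qemu/bswap.h"

    #define VMGENID_CAP_LEN   (3 + 8)   /* header + 64-bit buffer address */
    #define VMGENID_CAP_ADDR  3         /* offset of the address field */

    static int vmgenid_add_vendor_cap(PCIDevice *pdev, uint64_t buf_paddr)
    {
        int off = pci_add_capability(pdev, PCI_CAP_ID_VNDR, 0, VMGENID_CAP_LEN);

        if (off < 0) {
            return off;                     /* no room left in config space */
        }
        pdev->config[off + 2] = VMGENID_CAP_LEN;         /* capability length */
        stq_le_p(pdev->config + off + VMGENID_CAP_ADDR, buf_paddr);
        return off;
    }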


> ---
> changes since 17:
>   - small fixups suggested in v14 review by "Michael S. Tsirkin" 
> 
>   - make the BAR prefetchable so the region is cached, as per the MS spec
>   - s/uuid/guid/ to match the spec
> changes since 14:
>   - reserve BAR resources so that Windows won't touch the BAR
> during PCI rebalancing - "Michael S. Tsirkin" 
>   - ACPI: split the VGEN device off the PCI device descriptor
> and place it at PCI0 scope, so that there is no need to trace its
> location on PCI buses. - "Michael S. Tsirkin" 
>   - permit only one vmgenid to be created
>   - allow the BAR to be mapped above 4 GB if it can't be mapped in low mem
> ---
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  docs/specs/pci-ids.txt |   1 +
>  hw/i386/acpi-build.c   |  56 +-
>  hw/misc/Makefile.objs  |   1 +
>  hw/misc/vmgenid.c  | 154 
> +
>  include/hw/misc/vmgenid.h  |  27 +++
>  include/hw/pci/pci.h   |   1 +
>  8 files changed, 240 insertions(+), 2 deletions(-)
>  create mode 100644 hw/misc/vmgenid.c
>  create mode 100644 include/hw/misc/vmgenid.h
> 
> diff --git a/default-configs/i386-softmmu.mak 
> b/default-configs/i386-softmmu.mak
> index b177e52..6402439 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
>  CONFIG_MEM_HOTPLUG=y
> +CONFIG_VMGENID=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_XIO3130=y
> diff --git a/default-configs/x86_64-softmmu.mak 
> b/default-configs/x86_64-softmmu.mak
> index 6e3b312..fdac18f 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
>  CONFIG_MEM_HOTPLUG=y
> +CONFIG_VMGENID=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_XIO3130=y
> diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> index 0adcb89..e65ecf9 100644
> --- a/docs/specs/pci-ids.txt
> +++ b/docs/specs/pci-ids.txt
> @@ -47,6 +47,7 @@ PCI devices (other than virtio):
>  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
>  1b36:0006  PCI Rocker Ethernet switch device
>  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> +1b36:0009  PCI VM-Generation device
>  1b36:000a  PCI-PCI bridge (multiseat)
>  

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device

2016-01-28 Thread Laszlo Ersek
On 01/28/16 12:13, Michael S. Tsirkin wrote:
> On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
>> Based on Microsoft's specifications (the paper can be
>> downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
>> easily found with the "Virtual Machine Generation ID" keywords),
>> add a PCI device with a corresponding description in the
>> SSDT ACPI table.
>>
>> The GUID is set using the "vmgenid.guid" property or
>> a corresponding HMP/QMP command.
>>
>> Example of using the vmgenid device:
>>  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>>
>> The 'vmgenid' device initialization flow is as follows:
>>  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
>>  2. the BIOS initializes PCI devices and maps the BAR in the PCI hole
>>  3. the BIOS reads the ACPI tables from QEMU; at that moment the tables
>> are generated with the \_SB.VMGI.ADDR constant pointing to the
>> GPA where the BIOS mapped vmgenid's BAR earlier
>>
>> Note:
>> This implementation uses the PCI class code 0x0500 for the vmgenid device,
>> which is marked as NO_DRV in Windows's machine.inf.
>> Testing various Windows versions showed that the OS
>> neither touches nor checks for resource conflicts
>> for such PCI devices.
>> There was concern that during PCI rebalancing the OS
>> could reprogram the BAR to another place, which would
>> leave VGEN.ADDR pointing to the old (no longer valid)
>> address.
>> However, testing showed that Windows does rebalancing
>> only for PCI devices that have a driver attached
>> and completely ignores the NO_DRV class of devices.
>> This in turn creates a problem where the OS could remap
>> one of the PCI devices (with a driver) over a BAR used by
>> a driver-less PCI device.
>> Statically declaring the used memory range in VGEN._CRS
>> makes the OS honor the resource reservation, so the ignored
>> BAR range is no longer touched during PCI rebalancing.
>>
>> Signed-off-by: Gal Hammer 
>> Signed-off-by: Igor Mammedov 
> 
> It's an interesting hack, but this needs some thought. The BIOS has no idea
> this BAR is special and cannot be rebalanced, so it might put the BAR
> in the middle of the range, in effect fragmenting it.
> 
> Really, I think something like V12 just rewritten using the new APIs
> (probably with something like build_append_named_dword that I suggested)
> would be a much simpler way to implement this device, given
> the weird API limitations.
> 
> And hey, if you want to use a PCI device to pass the physical
> address from guest to host, then instead of reserving
> a couple of I/O addresses, sure, stick it in PCI config space in
> a vendor-specific capability; this way it'll get migrated
> automatically.

Looks like the bottleneck for this feature is that we can't agree on the
design.

I thought that the ACPI approach was a nice idea; the DataTableRegion
trick made me enthusiastic. Since DataTableRegion won't work, I feel
that the ACPI approach has devolved into too many layers of manual
indirection. (If I recall right, I posted one of the recent updates on
that, with some diagrams).

I think that this investigation has been beaten to death; now we should
just do the simplest thing that works. I think Igor's PCI approach is
simpler, and may be more easily debuggable, than the one ACPI approach
that could *actually* be made to work.

I believe simplicity wasn't my original priority for this feature, but at
v19, simplicity of design & implementation looks like the most important
trait. At this point even the fixed MMIO range near the LAPIC range
(whose relocation we don't support anyway, BTW) could be preferable.

Anyway, I've said my piece, I'm outta here. I hope you guys can reach an
agreement.

Thanks
Laszlo

> 
> 
>> ---
>> changes since 17:
>>   - small fixups suggested in v14 review by "Michael S. Tsirkin" 
>> 
>>   - make the BAR prefetchable so the region is cached, as per the MS spec
>>   - s/uuid/guid/ to match the spec
>> changes since 14:
>>   - reserve BAR resources so that Windows won't touch the BAR
>> during PCI rebalancing - "Michael S. Tsirkin" 
>>   - ACPI: split the VGEN device off the PCI device descriptor
>> and place it at PCI0 scope, so that there is no need to trace its
>> location on PCI buses. - "Michael S. Tsirkin" 
>>   - permit only one vmgenid to be created
>>   - allow the BAR to be mapped above 4 GB if it can't be mapped in low mem
>> ---
>>  default-configs/i386-softmmu.mak   |   1 +
>>  default-configs/x86_64-softmmu.mak |   1 +
>>  docs/specs/pci-ids.txt |   1 +
>>  hw/i386/acpi-build.c   |  56 +-
>>  hw/misc/Makefile.objs  |   1 +
>>  hw/misc/vmgenid.c  | 154 
>> +
>>  include/hw/misc/vmgenid.h  |  27 +++
>>  include/hw/pci/pci.h   |   1 +
>>  8 files changed, 240 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/misc/vmgenid.c
>>  create mode 100644 include/hw/misc/vmgenid.h
>>
>> diff --git a/default-configs/i386-softmmu.mak 
>>