Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Avi Kivity

On 01/18/2011 04:28 PM, Jan Kiszka wrote:


  So we can either infect the whole device tree with kvm (or maybe a
  more generic accelerator structure that also deals with Xen) or we need
  to pull the reference inside the device's init function from some global
  service (kvm_get_state).

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.


I'm one of them, but I don't have anything better to suggest than adding 
kvm_state attribute to qdev, which seems mighty artificial.  So I'm in 
favour of eliminating it now.



It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.


I'm biased in the other direction, but I agree.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Avi Kivity

On 01/18/2011 05:50 PM, Anthony Liguori wrote:

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.


The bus topology reflects how I/O flows in and out of a device.  We do 
not model a perfect PC bus architecture and I don't think we ever 
intend to.  Instead, we model a functional architecture.


A KVM bus is far from a function architecture.  It simply departs from 
both what a real PC looks like, and what a KVM PC looks like to a guest, 
for no reason except to force a particular object model.


It's completely artificial.  If kvm were not behind the kernel/user 
interface, and an unchangeable ABI, we'd refactor it.  However, we can't 
refactor it, and you're trying to warp the device model to adapt to this 
design problem instead of working around it.  You're elevating a kink 
into an architectural feature.




I/O from an assigned device does not flow through the emulated PCI 
bus.  Therefore, it does not belong on the emulated PCI bus.


Yes it does.  Config space and some mmio flows through qemu and the 
emulated PCI bus.  So do things like hotplug/hotunplug.





Assigned devices need to interact with the emulated PCI bus, but they 
shouldn't be children of it.




What's the difference, from the guest point of view, from an assigned 
RTL8139 card, and an emulated RTL8139 card?


If you believe there is no difference, what better way to model this 
than implement them the same way using the same interfaces?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Avi Kivity

On 01/19/2011 06:57 PM, Anthony Liguori wrote:

On 01/19/2011 07:15 AM, Markus Armbruster wrote:

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?


It's almost arbitrary, but I would say it's the direction that I/Os flow.



In the case of kvm, things are somewhat misleading.  I/O still flows 
through the (virtual) PCI bus, it's just short-circuited to a real 
device.  Similarly when attaching an ioeventfd to a virtio kick 
register, things still logically from the same way as without ioeventfd; 
we simply add a fast path for the operation.  But it doesn't change the 
logical view of things.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Avi Kivity

On 01/20/2011 11:22 PM, Jan Kiszka wrote:

On 2011-01-20 20:27, Blue Swirl wrote:
  On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszkajan.kis...@siemens.com  wrote:
  On 2011-01-19 20:32, Blue Swirl wrote:
  On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
  aligu...@linux.vnet.ibm.com  wrote:
  On 01/19/2011 07:15 AM, Markus Armbruster wrote:

  So they interact with KVM (need kvm_state), and they interact with the
  emulated PCI bus.  Could you elaborate on the fundamental difference
  between the two interactions that makes you choose the (hypothetical)
  KVM bus over the PCI bus as device parent?


  It's almost arbitrary, but I would say it's the direction that I/Os flow.

  But if the underlying observation is that the device tree is not really a
  tree, you're 100% correct.  This is part of why a factory interface that
  just takes a parent bus is too simplistic.

  I think we ought to introduce a -pci-device option that is specifically 
for
  creating PCI devices that doesn't require a parent bus argument but 
provides
  a way to specify stable addressing (for instancing, using a linear index).

  I think kvm_state should not be a property of any device or bus. It
  should be split to more logical pieces.

  Some parts of it could remain in CPUState, because they are associated
  with a VCPU.

  Also, for example irqfd could be considered to be similar object to
  char or block devices provided by QEMU to devices. Would it make sense
  to introduce new host types for passing parts of kvm_state to devices?

  I'd also make coalesced MMIO stuff part of memory object. We are not
  passing any state references when using cpu_physical_memory_rw(), but
  that could be changed.

  There are currently no VCPU-specific bits remaining in kvm_state.

  I think fields vcpu_events, robust_singlestep, debugregs,
  kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
  same for all VCPUs but still they are sort of CPU properties. I'm not
  sure about fd field.

They are all properties of the currently loaded KVM subsystem in the
host kernel. They can't change while KVM's root fd is opened.
Replicating this static information into each and every VCPU state would
be crazy.


Perhaps they should be renamed to have_xsave or features.xsave, and be 
made bools, to improve readability.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Anthony Liguori

On 01/25/2011 04:27 AM, Avi Kivity wrote:

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.


I'm biased in the other direction, but I agree.


Just #include kvm.h and reference the global kvm_state once in the 
initfn.  We don't have to solve this problem yet.  References to the 
global kvm_state become placeholders of where things need to be fixed up.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-25 Thread Anthony Liguori

On 01/25/2011 05:06 AM, Avi Kivity wrote:

On 01/19/2011 06:57 PM, Anthony Liguori wrote:

On 01/19/2011 07:15 AM, Markus Armbruster wrote:

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?


It's almost arbitrary, but I would say it's the direction that I/Os 
flow.




In the case of kvm, things are somewhat misleading.  I/O still flows 
through the (virtual) PCI bus, it's just short-circuited to a real device.


It doesn't.  If we have a PCI bus that transforms I/O or remaps I/O via 
an IOMMU, that device doesn't participate in it.


But this whole discussion is way off track.

We don't have to solve any of these problems today.  Just don't remove 
kvm_state and grab a global reference to it when we need to (which is 
*at best* one place in the code today) and let's move on with our lives.


Regards,

Anthony Liguori

  Similarly when attaching an ioeventfd to a virtio kick register, 
things still logically from the same way as without ioeventfd; we 
simply add a fast path for the operation.  But it doesn't change the 
logical view of things.




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-24 Thread Gleb Natapov
On Tue, Jan 18, 2011 at 11:09:01AM -0600, Anthony Liguori wrote:
 But we also need to provide a compatible interface to management tools.
 Exposing the device model topology as a compatible interface
 artificially limits us.  It's far better to provide higher level
 supported interfaces to give us the flexibility to change the device
 model as we need to.
 How do you want to change qdev to keep the guest and management tool
 view stable while branching off kvm sub-buses?
 
 The qdev device model is not a stable interface.  I think that's
 been clear from the very beginning.
 

And what was the reason it was declared not stable? May be because we
were not sure we will do it right from the start, so change will be
needed later. But changes should bring qdev closer to reflect what guest
expect device topology to look like. This will bring us to stable state
as close possible.  We need this knowledge and stability in qdev for
device path creation.  Both kind of device paths: OF and the one we use
for migration. To create OF device path we need to know topology as seen
by a guest (and guest does not care how isa bus is implemented internally
inside south bridge), to create device path used for migration we need
stability, otherwise change in qdev topology will break migration. All
this artificial buses you propose to add move us in opposite direction
and make qdev useless for anything but  well for anything.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-24 Thread Jan Kiszka
On 2011-01-21 19:49, Blue Swirl wrote:
 I'd add fourth possible class:
  - device, CPU and machine configuration, like nographic,
 win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
 irqchip_in_kernel could fit here, though it obviously depends on a
 host capability too.

 I would count everything that cannot be assigned to a concrete device
 upfront to the dynamic state of a machine, thus class 2. The point is,
 (potentially) every device of that machine requires access to it, just
 like (indirectly, via the KVM core services) to some KVM VM state bits.
 
 The machine class should not be a catch-all, it would be like
 QEMUState or KVMState then. Perhaps each field or variable should be
 listed and given more thought.

Let's start with what is most urgent:

 - vmfd: file descriptor required for any KVM request that has VM scope
   (in-kernel device creation, device state synchronizations, IRQ
   routing etc.)
 - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
   (some devices will have to adjust their behavior depending on this)

pit_in_kernel would be analogue to irqchip, but it's also conceptually
x86-only (irqchips is only used by x86, but not tied to it) and it's not
mandatory for a first round of KVM devices for upstream.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-24 Thread Blue Swirl
On Mon, Jan 24, 2011 at 2:08 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 19:49, Blue Swirl wrote:
 I'd add fourth possible class:
  - device, CPU and machine configuration, like nographic,
 win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
 irqchip_in_kernel could fit here, though it obviously depends on a
 host capability too.

 I would count everything that cannot be assigned to a concrete device
 upfront to the dynamic state of a machine, thus class 2. The point is,
 (potentially) every device of that machine requires access to it, just
 like (indirectly, via the KVM core services) to some KVM VM state bits.

 The machine class should not be a catch-all, it would be like
 QEMUState or KVMState then. Perhaps each field or variable should be
 listed and given more thought.

 Let's start with what is most urgent:

  - vmfd: file descriptor required for any KVM request that has VM scope
   (in-kernel device creation, device state synchronizations, IRQ
   routing etc.)

I'd say VM state.

  - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
   (some devices will have to adjust their behavior depending on this)

Since QEMU version is useless, I peeked at qemu-kvm version.

There are a lot of lines like:
if (kvm_enabled()  !kvm_irqchip_in_kernel())
kvm_just_do_it();

Perhaps these would be cleaner with stub functions.

The device cases are obvious: the devices need a flag, passed to them
by pc.c, which combines kvm_enabled  kvm_irqchip_in_kernel(). This
gets stored in device state.

But exec.c case, where kvm_update_interrupt_request() is called, is
more interesting. CPU init could set up function pointer to either
stub/NULL or kvm_update_interrupt_request().

I didn't look at kvm*.c, qemu-kvm*.c or stuff in kvm/.

So I'd eliminate kvm_irqchip_in_kernel() from outside of KVM and pc.c.
The information could be stored in a MachineState, where pc.c could
grab it for device and CPU setup.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-24 Thread Jan Kiszka
On 2011-01-24 22:35, Blue Swirl wrote:
 On Mon, Jan 24, 2011 at 2:08 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 19:49, Blue Swirl wrote:
 I'd add fourth possible class:
  - device, CPU and machine configuration, like nographic,
 win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
 irqchip_in_kernel could fit here, though it obviously depends on a
 host capability too.

 I would count everything that cannot be assigned to a concrete device
 upfront to the dynamic state of a machine, thus class 2. The point is,
 (potentially) every device of that machine requires access to it, just
 like (indirectly, via the KVM core services) to some KVM VM state bits.

 The machine class should not be a catch-all, it would be like
 QEMUState or KVMState then. Perhaps each field or variable should be
 listed and given more thought.

 Let's start with what is most urgent:

  - vmfd: file descriptor required for any KVM request that has VM scope
   (in-kernel device creation, device state synchronizations, IRQ
   routing etc.)
 
 I'd say VM state.

Good. That's +1 for introducing and distributing it.

 
  - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
   (some devices will have to adjust their behavior depending on this)
 
 Since QEMU version is useless, I peeked at qemu-kvm version.
 
 There are a lot of lines like:
 if (kvm_enabled()  !kvm_irqchip_in_kernel())
 kvm_just_do_it();
 
 Perhaps these would be cleaner with stub functions.

Probably. I guess there is quite some room left for cleanups in this area.

 
 The device cases are obvious: the devices need a flag, passed to them
 by pc.c, which combines kvm_enabled  kvm_irqchip_in_kernel(). This
 gets stored in device state.

Not all devices are only instantiated by the machine init code. Even if
we are lucky that all those we need on x86 are created that way, we
shouldn't rely on this for future use case, including other KVM archs.

 
 But exec.c case, where kvm_update_interrupt_request() is called, is
 more interesting. CPU init could set up function pointer to either
 stub/NULL or kvm_update_interrupt_request().
 

Yes, callbacks are the way to go long term. Here we could also define
one for VCPU interrupt handling and set it according to the VCPU mode.

 I didn't look at kvm*.c, qemu-kvm*.c or stuff in kvm/.
 
 So I'd eliminate kvm_irqchip_in_kernel() from outside of KVM and pc.c.
 The information could be stored in a MachineState, where pc.c could
 grab it for device and CPU setup.

I still don't see how we can distribute the information to all
interested devices. It's basically the same issue as with current kvm_state.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Gerd Hoffmann

On 01/20/11 20:39, Anthony Liguori wrote:

On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:

Hi,


For (2), you cannot use bus=X,addr=Y because it makes assumptions about
the PCI topology which may change in newer -M pc's.


Why should the PCI topology for 'pc' ever change?

We'll probably get q35 support some day, but when this lands I expect
we'll see a new machine type 'q35', so '-m q35' will pick the ich9
chipset (which will have a different pci topology of course) and '-m
pc' will pick the existing piix chipset (which will continue to look
like it looks today).


But then what's the default machine type? When I say -M pc, I really
mean the default machine.


I'd tend to leave pc as default for a release cycle or two so we can 
hash out issues with q35, then flip the default once it got broader 
testing and runs stable.



At some point, qemu-system-x86_64 -device virtio-net-pci,addr=2.0

Is not going to be a reliable way to invoke qemu because there's no way
we can guarantee that slot 2 isn't occupied by a chipset device or some
other default device.


Indeed.  But qemu -M pc should continue to work though.  'pc' would 
better named 'piix3', but renaming it now is probably not worth the trouble.


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Gerd Hoffmann

  Hi,


By the way, we don't have a QEMUState but instead use globals.


/me wants to underline this.

IMO it is absolutely pointless to worry about ways to pass around 
kvm_state.  There never ever will be a serious need for that.


We can stick with the current model of keeping global state in global 
variables.  And just do the same with kvm_state.


Or we can move to have all state in a QEMUState struct which we'll pass 
around basically everywhere.  Then we can simply embed or reference 
kvm_state there.


I'd tend to stick with the global variables as I don't see the point in 
having a QEMUstate.  I doubt we'll ever see two virtual machines driven 
by a single qemu process.  YMMV.


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Markus Armbruster
Gerd Hoffmann kra...@redhat.com writes:

 On 01/20/11 20:39, Anthony Liguori wrote:
 On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
 Hi,

 For (2), you cannot use bus=X,addr=Y because it makes assumptions about
 the PCI topology which may change in newer -M pc's.

 Why should the PCI topology for 'pc' ever change?

 We'll probably get q35 support some day, but when this lands I expect
 we'll see a new machine type 'q35', so '-m q35' will pick the ich9
 chipset (which will have a different pci topology of course) and '-m
 pc' will pick the existing piix chipset (which will continue to look
 like it looks today).

 But then what's the default machine type? When I say -M pc, I really
 mean the default machine.

 I'd tend to leave pc as default for a release cycle or two so we can
 hash out issues with q35, then flip the default once it got broader
 testing and runs stable.

 At some point, qemu-system-x86_64 -device virtio-net-pci,addr=2.0

 Is not going to be a reliable way to invoke qemu because there's no way
 we can guarantee that slot 2 isn't occupied by a chipset device or some
 other default device.

 Indeed.  But qemu -M pc should continue to work though.  'pc' would
 better named 'piix3', but renaming it now is probably not worth the
 trouble.

We mustn't change pc-0.14  friends.  We routinely change pc, but
whether an upgrade to q35 qualifies as routine change is debatable.

If you don't want PCI topology (and more) to change across QEMU updates,
consider using the versioned machine types.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Markus Armbruster
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around
 kvm_state.  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll
 pass around basically everywhere.  Then we can simply embed or
 reference kvm_state there.

 I'd tend to stick with the global variables as I don't see the point
 in having a QEMUstate.  I doubt we'll ever see two virtual machines
 driven by a single qemu process.  YMMV.

/me grabs the fat magic marker and underlines some more.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl
On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann kra...@redhat.com wrote:
  Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around kvm_state.
  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll pass
 around basically everywhere.  Then we can simply embed or reference
 kvm_state there.

 I'd tend to stick with the global variables as I don't see the point in
 having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
 single qemu process.  YMMV.

Global variables are signs of a poor design. QEMUState would not help
that, instead more specific structures should be designed, much like
what I've proposed for KVMState. Some of these new structures should
be even passed around when it makes sense.

But I'd not start kvm_state redesign around global variables or QEMUState.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Jan Kiszka
On 2011-01-21 17:37, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann kra...@redhat.com wrote:
  Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around kvm_state.
  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll pass
 around basically everywhere.  Then we can simply embed or reference
 kvm_state there.

 I'd tend to stick with the global variables as I don't see the point in
 having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
 single qemu process.  YMMV.
 
 Global variables are signs of a poor design.

s/are/can be/.

 QEMUState would not help
 that, instead more specific structures should be designed, much like
 what I've proposed for KVMState. Some of these new structures should
 be even passed around when it makes sense.
 
 But I'd not start kvm_state redesign around global variables or QEMUState.

We do not need to move individual fields yet, but we need to define
classes of fields and strategies how to deal with them long-term. Then
we can move forward, and that already in the right direction.

Obvious classes are
 - static host capabilities and means for the KVM core to query them
 - per-VM fields
 - fields related to memory management

And we now need at least a plan for the second class to proceed with the
actual job.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl
On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 17:37, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann kra...@redhat.com wrote:
  Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around kvm_state.
  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll pass
 around basically everywhere.  Then we can simply embed or reference
 kvm_state there.

 I'd tend to stick with the global variables as I don't see the point in
 having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
 single qemu process.  YMMV.

 Global variables are signs of a poor design.

 s/are/can be/.

 QEMUState would not help
 that, instead more specific structures should be designed, much like
 what I've proposed for KVMState. Some of these new structures should
 be even passed around when it makes sense.

 But I'd not start kvm_state redesign around global variables or QEMUState.

 We do not need to move individual fields yet, but we need to define
 classes of fields and strategies how to deal with them long-term. Then
 we can move forward, and that already in the right direction.

Excellent plan.

 Obvious classes are
  - static host capabilities and means for the KVM core to query them

OK. There could be other host capabilities here in the future too,
like Xen. I don't think there are any Xen capabilities ATM though but
IIRC some recently sent patches had something like those.

  - per-VM fields

What is per-VM which is not machine or CPU architecture specific?

  - fields related to memory management

OK.

I'd add fourth possible class:
 - device, CPU and machine configuration, like nographic,
win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
irqchip_in_kernel could fit here, though it obviously depends on a
host capability too.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Jan Kiszka
On 2011-01-21 19:04, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 17:37, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann kra...@redhat.com wrote:
  Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around 
 kvm_state.
  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll pass
 around basically everywhere.  Then we can simply embed or reference
 kvm_state there.

 I'd tend to stick with the global variables as I don't see the point in
 having a QEMUstate.  I doubt we'll ever see two virtual machines driven by 
 a
 single qemu process.  YMMV.

 Global variables are signs of a poor design.

 s/are/can be/.

 QEMUState would not help
 that, instead more specific structures should be designed, much like
 what I've proposed for KVMState. Some of these new structures should
 be even passed around when it makes sense.

 But I'd not start kvm_state redesign around global variables or QEMUState.

 We do not need to move individual fields yet, but we need to define
 classes of fields and strategies how to deal with them long-term. Then
 we can move forward, and that already in the right direction.
 
 Excellent plan.
 
 Obvious classes are
  - static host capabilities and means for the KVM core to query them
 
 OK. There could be other host capabilities here in the future too,
 like Xen. I don't think there are any Xen capabilities ATM though but
 IIRC some recently sent patches had something like those.
 
  - per-VM fields
 
 What is per-VM which is not machine or CPU architecture specific?

I think it would suffice for a first step to consider all per-VM fields
as independent of CPU architecture or machine type.

 
  - fields related to memory management
 
 OK.
 
 I'd add fourth possible class:
  - device, CPU and machine configuration, like nographic,
 win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
 irqchip_in_kernel could fit here, though it obviously depends on a
 host capability too.

I would count everything that cannot be assigned to a concrete device
upfront to the dynamic state of a machine, thus class 2. The point is,
(potentially) every device of that machine requires access to it, just
like (indirectly, via the KVM core services) to some KVM VM state bits.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-21 Thread Blue Swirl
On Fri, Jan 21, 2011 at 6:17 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 19:04, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-21 17:37, Blue Swirl wrote:
 On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann kra...@redhat.com wrote:
  Hi,

 By the way, we don't have a QEMUState but instead use globals.

 /me wants to underline this.

 IMO it is absolutely pointless to worry about ways to pass around 
 kvm_state.
  There never ever will be a serious need for that.

 We can stick with the current model of keeping global state in global
 variables.  And just do the same with kvm_state.

 Or we can move to have all state in a QEMUState struct which we'll pass
 around basically everywhere.  Then we can simply embed or reference
 kvm_state there.

 I'd tend to stick with the global variables as I don't see the point in
 having a QEMUstate.  I doubt we'll ever see two virtual machines driven 
 by a
 single qemu process.  YMMV.

 Global variables are signs of a poor design.

 s/are/can be/.

 QEMUState would not help
 that, instead more specific structures should be designed, much like
 what I've proposed for KVMState. Some of these new structures should
 be even passed around when it makes sense.

 But I'd not start kvm_state redesign around global variables or QEMUState.

 We do not need to move individual fields yet, but we need to define
 classes of fields and strategies how to deal with them long-term. Then
 we can move forward, and that already in the right direction.

 Excellent plan.

 Obvious classes are
  - static host capabilities and means for the KVM core to query them

 OK. There could be other host capabilities here in the future too,
 like Xen. I don't think there are any Xen capabilities ATM though but
 IIRC some recently sent patches had something like those.

  - per-VM fields

 What is per-VM which is not machine or CPU architecture specific?

 I think it would suffice for a first step to consider all per-VM fields
 as independent of CPU architecture or machine type.

I'm afraid that would not be progress.

  - fields related to memory management

 OK.

 I'd add fourth possible class:
  - device, CPU and machine configuration, like nographic,
 win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
 irqchip_in_kernel could fit here, though it obviously depends on a
 host capability too.

 I would count everything that cannot be assigned to a concrete device
 upfront to the dynamic state of a machine, thus class 2. The point is,
 (potentially) every device of that machine requires access to it, just
 like (indirectly, via the KVM core services) to some KVM VM state bits.

The machine class should not be a catch-all, it would be like
QEMUState or KVMState then. Perhaps each field or variable should be
listed and given more thought.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Gerd Hoffmann

  Hi,


For (2), you cannot use bus=X,addr=Y because it makes assumptions about
the PCI topology which may change in newer -M pc's.


Why should the PCI topology for 'pc' ever change?

We'll probably get q35 support some day, but when this lands I expect 
we'll see a new machine type 'q35', so '-m q35' will pick the ich9 
chipset (which will have a different pci topology of course) and '-m pc' 
will pick the existing piix chipset (which will continue to look like it 
looks today).


cheers,
  Gerd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Jan Kiszka
On 2011-01-19 20:32, Blue Swirl wrote:
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically for
 creating PCI devices that doesn't require a parent bus argument but provides
 a way to specify stable addressing (for instancing, using a linear index).
 
 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.
 
 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.
 
 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?
 
 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.

There are currently no VCPU-specific bits remaining in kvm_state. It may
be a good idea to introduce an arch-specific kvm_state and move related
bits over. It may also once be feasible to carve out memory management
related fields if we have proper abstractions for that, but I'm not
completely sure here.

Anyway, all these things are secondary. The primary topic here is how to
deal with kvm_state and its fields that have VM-global scope.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Daniel P. Berrange
On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
   Hi,
 
 For (2), you cannot use bus=X,addr=Y because it makes assumptions about
 the PCI topology which may change in newer -M pc's.
 
 Why should the PCI topology for 'pc' ever change?
 
 We'll probably get q35 support some day, but when this lands I
 expect we'll see a new machine type 'q35', so '-m q35' will pick the
 ich9 chipset (which will have a different pci topology of course)
 and '-m pc' will pick the existing piix chipset (which will continue
 to look like it looks today).

If the topology does ever change (eg in the kind of way anthony
suggests, first bus only has the graphics card), I think libvirt
is going to need a little work to adapt to the new topology,
regardless of whether we currently specify a bus= arg to -device
or not. I'm not sure there's anything we could do to future proof
us to that kind of change.

Regards,
Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl
On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-19 20:32, Blue Swirl wrote:
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically for
 creating PCI devices that doesn't require a parent bus argument but provides
 a way to specify stable addressing (for instancing, using a linear index).

 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.

 There are currently no VCPU-specific bits remaining in kvm_state.

I think fields vcpu_events, robust_singlestep, debugregs,
kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
same for all VCPUs but still they are sort of CPU properties. I'm not
sure about fd field.

 It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over.

This should probably contain only irqchip_in_kernel, pit_in_kernel and
many_ioeventfds, maybe fd.

 It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.

I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
migration_log into the memory object.

 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.

If it is an opaque blob which contains various unrelated stuff, no
clear place will be found.

By the way, we don't have a QEMUState but instead use globals. Perhaps
this should be reorganized as well. For fd field, maybe even using a
global variable could be justified, since it is used for direct access
to kernel, not unlike a system call.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Anthony Liguori

On 01/20/2011 03:33 AM, Jan Kiszka wrote:

On 2011-01-19 20:32, Blue Swirl wrote:
   

On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com  wrote:
 

On 01/19/2011 07:15 AM, Markus Armbruster wrote:
   

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?

 

It's almost arbitrary, but I would say it's the direction that I/Os flow.

But if the underlying observation is that the device tree is not really a
tree, you're 100% correct.  This is part of why a factory interface that
just takes a parent bus is too simplistic.

I think we ought to introduce a -pci-device option that is specifically for
creating PCI devices that doesn't require a parent bus argument but provides
a way to specify stable addressing (for instancing, using a linear index).
   

I think kvm_state should not be a property of any device or bus. It
should be split to more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, for example irqfd could be considered to be similar object to
char or block devices provided by QEMU to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make coalesced MMIO stuff part of memory object. We are not
passing any state references when using cpu_physical_memory_rw(), but
that could be changed.
 

There are currently no VCPU-specific bits remaining in kvm_state. It may
be a good idea to introduce an arch-specific kvm_state and move related
bits over. It may also once be feasible to carve out memory management
related fields if we have proper abstractions for that, but I'm not
completely sure here.

Anyway, all these things are secondary. The primary topic here is how to
deal with kvm_state and its fields that have VM-global scope.
   


The debate is really:

1) should we remove all passing of kvm_state and just assume it's static

2) deal with a couple places in the code where we need to figure out how 
to get at kvm_state


I think we've only identified 1 real instance of (2) and it's resulted 
in some good discussions about how to model KVM devices vs. emulated 
devices.  Honestly, (1) just stinks.  I see absolutely no advantage to 
it at all.   In the very worst case scenario, the thing we need to do is 
just reference an extern variable in a few places.  That completely 
avoids all of the modelling discussions for now (while leaving for 
placeholder FIXMEs so the problem can be tackled later).


I don't understand the resistance here.

Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Anthony Liguori

On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:

  Hi,


For (2), you cannot use bus=X,addr=Y because it makes assumptions about
the PCI topology which may change in newer -M pc's.


Why should the PCI topology for 'pc' ever change?

We'll probably get q35 support some day, but when this lands I expect 
we'll see a new machine type 'q35', so '-m q35' will pick the ich9 
chipset (which will have a different pci topology of course) and '-m 
pc' will pick the existing piix chipset (which will continue to look 
like it looks today).


But then what's the default machine type?  When I say -M pc, I really 
mean the default machine.


At some point, qemu-system-x86_64 -device virtio-net-pci,addr=2.0

Is not going to be a reliable way to invoke qemu because there's no way 
we can guarantee that slot 2 isn't occupied by a chipset device or some 
other default device.


Regards,

Anthony Liguori


cheers,
  Gerd


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Anthony Liguori

On 01/20/2011 04:33 AM, Daniel P. Berrange wrote:

On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
   

   Hi,

 

For (2), you cannot use bus=X,addr=Y because it makes assumptions about
the PCI topology which may change in newer -M pc's.
   

Why should the PCI topology for 'pc' ever change?

We'll probably get q35 support some day, but when this lands I
expect we'll see a new machine type 'q35', so '-m q35' will pick the
ich9 chipset (which will have a different pci topology of course)
and '-m pc' will pick the existing piix chipset (which will continue
to look like it looks today).
 

If the topology does ever change (eg in the kind of way anthony
suggests, first bus only has the graphics card), I think libvirt
is going to need a little work to adapt to the new topology,
regardless of whether we currently specify a bus= arg to -device
or not. I'm not sure there's anything we could do to future proof
us to that kind of change.
   


I assume that libvirt today assumes that it can use a set of PCI slots 
in bus 0, correct?  Probably in the range 3-31?  Such assumptions are 
very likely to break.


Regards,

Anthony Liguori


Regards,
Daniel
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl
On Thu, Jan 20, 2011 at 7:37 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
 On 01/20/2011 03:33 AM, Jan Kiszka wrote:

 On 2011-01-19 20:32, Blue Swirl wrote:


 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com  wrote:


 On 01/19/2011 07:15 AM, Markus Armbruster wrote:


 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?



 It's almost arbitrary, but I would say it's the direction that I/Os
 flow.

 But if the underlying observation is that the device tree is not really
 a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically
 for
 creating PCI devices that doesn't require a parent bus argument but
 provides
 a way to specify stable addressing (for instancing, using a linear
 index).


 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.


 There are currently no VCPU-specific bits remaining in kvm_state. It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over. It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.

 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.


 The debate is really:

 1) should we remove all passing of kvm_state and just assume it's static

 2) deal with a couple places in the code where we need to figure out how to
 get at kvm_state

 I think we've only identified 1 real instance of (2) and it's resulted in
 some good discussions about how to model KVM devices vs. emulated devices.
  Honestly, (1) just stinks.  I see absolutely no advantage to it at all.

Fully agree.

 In the very worst case scenario, the thing we need to do is just reference
 an extern variable in a few places.  That completely avoids all of the
 modelling discussions for now (while leaving for placeholder FIXMEs so the
 problem can be tackled later).

I think KVMState was designed to match KVM ioctl interface: all stuff
that is needed for talking to KVM or received from KVM are there. But
I think this shouldn't be a design driver.

If the only pieces of kvm_state that are needed by the devices are
irqchip_in_kernel, pit_in_kernel and many_ioeventfds, the problem of
passing kvm_state to devices becomes very different. Each of these are
just single bits, affecting only a few devices. Perhaps they could be
device properties which the board level sets when KVM is used?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Jan Kiszka
On 2011-01-20 20:27, Blue Swirl wrote:
 On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-19 20:32, Blue Swirl wrote:
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically for
 creating PCI devices that doesn't require a parent bus argument but 
 provides
 a way to specify stable addressing (for instancing, using a linear index).

 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.

 There are currently no VCPU-specific bits remaining in kvm_state.
 
 I think fields vcpu_events, robust_singlestep, debugregs,
 kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
 same for all VCPUs but still they are sort of CPU properties. I'm not
 sure about fd field.

They are all properties of the currently loaded KVM subsystem in the
host kernel. They can't change while KVM's root fd is opened.
Replicating this static information into each and every VCPU state would
be crazy.

In fact, services like kvm_has_vcpu_events() already encode this: they
are static functions without any kvm_state reference that simply return
the content of those fields. Totally inconsistent to this, we force the
caller of kvm_check_extension to pass a handle. This is part of my
problem with the current situation and any halfhearted steps in this
context. Either we work towards eliminating static KVMState *kvm_state
in kvm-all.c or eliminating KVMState.

 
 It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over.
 
 This should probably contain only irqchip_in_kernel, pit_in_kernel and
 many_ioeventfds, maybe fd.

fd is that root file descriptor you need for a few KVM services that are
not bound to a specific VM - e.g. feature queries. It's not arch
specific. Arch specific are e.g. robust_singlestep or xsave feature states.

 
 It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.
 
 I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
 migration_log into the memory object.

vmfd is the VM-scope file descriptor you need at machine-level. The rest
logically belongs to a memory object, but I haven't looked at technical
details yet.

 
 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.
 
 If it is an opaque blob which contains various unrelated stuff, no
 clear place will be found.

We aren't moving fields yet (and we shouldn't). We first of all need to
establish the handle distribution (which apparently requires quite some
work in areas beyond KVM).

 
 By the way, we don't have a QEMUState but instead use globals. Perhaps
 this should be reorganized as well. For fd field, maybe even using a
 global variable could be justified, since it is used for direct access
 to kernel, not unlike a system call.

The fd field is part of this discussion. Making it global (but local to
the kvm subsystem) was an implicit part of my original suggestion.

I've no problem with something like a QEMUState, or better a
MachineState that would also include a few KVM-specific fields like the
vmfd - just like we already do for CPUstate (or should we better
introduce a KVM CPU bus... ;) ).

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Jan Kiszka
On 2011-01-20 20:37, Anthony Liguori wrote:
 On 01/20/2011 03:33 AM, Jan Kiszka wrote:
 On 2011-01-19 20:32, Blue Swirl wrote:
   
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com  wrote:
 
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:
   
 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?

  
 It's almost arbitrary, but I would say it's the direction that I/Os
 flow.

 But if the underlying observation is that the device tree is not
 really a
 tree, you're 100% correct.  This is part of why a factory interface
 that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is
 specifically for
 creating PCI devices that doesn't require a parent bus argument but
 provides
 a way to specify stable addressing (for instancing, using a linear
 index).

 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.
  
 There are currently no VCPU-specific bits remaining in kvm_state. It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over. It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.

 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.

 
 The debate is really:
 
 1) should we remove all passing of kvm_state and just assume it's static
 
 2) deal with a couple places in the code where we need to figure out how
 to get at kvm_state
 
 I think we've only identified 1 real instance of (2) and it's resulted
 in some good discussions about how to model KVM devices vs. emulated
 devices.  Honestly, (1) just stinks.  I see absolutely no advantage to
 it at all.   In the very worst case scenario, the thing we need to do is
 just reference an extern variable in a few places.  That completely
 avoids all of the modelling discussions for now (while leaving for
 placeholder FIXMEs so the problem can be tackled later).

The PCI bus discussion is surely an interesting outcome, but now almost
completely off-topic to the original, way less critical issue (as we
were discussing internals).

 
 I don't understand the resistance here.

IMHO, most suggestions on the table are still over-designed (like a
KVMBus that only passes a kvm_state - or do you have more features for
it in mind?). The idea I love most so far is establishing a machine
state that also carries those few KVM bits which correspond to the KVM
extension of CPUState.

But in the end I want an implementable consensus that helps moving
forward with main topic: the overdue KVM upstream merge. I just do not
have a clear picture yet.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Blue Swirl
On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2011-01-20 20:27, Blue Swirl wrote:
 On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-19 20:32, Blue Swirl wrote:
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically 
 for
 creating PCI devices that doesn't require a parent bus argument but 
 provides
 a way to specify stable addressing (for instancing, using a linear index).

 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.

 There are currently no VCPU-specific bits remaining in kvm_state.

 I think fields vcpu_events, robust_singlestep, debugregs,
 kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
 same for all VCPUs but still they are sort of CPU properties. I'm not
 sure about fd field.

 They are all properties of the currently loaded KVM subsystem in the
 host kernel. They can't change while KVM's root fd is opened.
 Replicating this static information into each and every VCPU state would
 be crazy.

Then each CPUX86State could have a pointer to common structure.

 In fact, services like kvm_has_vcpu_events() already encode this: they
 are static functions without any kvm_state reference that simply return
 the content of those fields. Totally inconsistent to this, we force the
 caller of kvm_check_extension to pass a handle. This is part of my
 problem with the current situation and any halfhearted steps in this
 context. Either we work towards eliminating static KVMState *kvm_state
 in kvm-all.c or eliminating KVMState.

If the CPU related fields are accessible through CPUState, the handle
should be available.

 It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over.

 This should probably contain only irqchip_in_kernel, pit_in_kernel and
 many_ioeventfds, maybe fd.

 fd is that root file descriptor you need for a few KVM services that are
 not bound to a specific VM - e.g. feature queries. It's not arch
 specific. Arch specific are e.g. robust_singlestep or xsave feature states.

By arch you mean guest CPU architecture? They are not machine features.


 It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.

 I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
 migration_log into the memory object.

 vmfd is the VM-scope file descriptor you need at machine-level. The rest
 logically belongs to a memory object, but I haven't looked at technical
 details yet.


 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.

 If it is an opaque blob which contains various unrelated stuff, no
 clear place will be found.

 We aren't moving fields yet (and we shouldn't). We first of all need to
 establish the handle distribution (which apparently requires quite some
 work in areas beyond KVM).

But I think this is exactly  the problem. If the handle is for the
current KVMState, you'll indeed need it in various places and passing
it around will be cumbersome. By moving the fields around, the
information should  be available more naturally.

 By the way, we don't have a QEMUState but instead use globals. Perhaps
 this should be reorganized as well. For fd field, maybe even using a
 global variable could be justified, since it is used for direct access
 to kernel, not unlike a system call.

 The fd field is part of this discussion. Making it global (but local to
 the kvm subsystem) was an implicit part of my original suggestion.

 I've no problem with something like a QEMUState, or better a
 MachineState that would also include a few KVM-specific fields like the
 vmfd - just like we already do for CPUstate (or should we 

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Jan Kiszka
On 2011-01-20 21:02, Blue Swirl wrote:
 I think KVMState was designed to match KVM ioctl interface: all stuff
 that is needed for talking to KVM or received from KVM are there. But
 I think this shouldn't be a design driver.

Agreed. The nice cleanup would probably be the complete assimilation of
KVMState by something bigger of comparable scope.

If a machine was brought up with KVM support, every device that refers
to this machine (as it is supposed to become part of it) should be able
to use KVM services in order to accelerate its model.

 
 If the only pieces of kvm_state that are needed by the devices are
 irqchip_in_kernel, pit_in_kernel and many_ioeventfds, the problem of
 passing kvm_state to devices becomes very different. Each of these are
 just single bits, affecting only a few devices. Perhaps they could be
 device properties which the board level sets when KVM is used?

Forget about the static capabilities for now. The core of kvm_state are
handles that enable you to use KVM services and maybe state fields that
have machine scope (unless we find more local homes like a memory
object). Those need to be accessible by the kvm layer when servicing
requests of components that are related to that very same machine.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Jan Kiszka
On 2011-01-20 22:40, Blue Swirl wrote:
 On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2011-01-20 20:27, Blue Swirl wrote:
 On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2011-01-19 20:32, Blue Swirl wrote:
 On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically 
 for
 creating PCI devices that doesn't require a parent bus argument but 
 provides
 a way to specify stable addressing (for instancing, using a linear 
 index).

 I think kvm_state should not be a property of any device or bus. It
 should be split to more logical pieces.

 Some parts of it could remain in CPUState, because they are associated
 with a VCPU.

 Also, for example irqfd could be considered to be similar object to
 char or block devices provided by QEMU to devices. Would it make sense
 to introduce new host types for passing parts of kvm_state to devices?

 I'd also make coalesced MMIO stuff part of memory object. We are not
 passing any state references when using cpu_physical_memory_rw(), but
 that could be changed.

 There are currently no VCPU-specific bits remaining in kvm_state.

 I think fields vcpu_events, robust_singlestep, debugregs,
 kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
 same for all VCPUs but still they are sort of CPU properties. I'm not
 sure about fd field.

 They are all properties of the currently loaded KVM subsystem in the
 host kernel. They can't change while KVM's root fd is opened.
 Replicating this static information into each and every VCPU state would
 be crazy.
 
 Then each CPUX86State could have a pointer to common structure.

That already exists.

 
 In fact, services like kvm_has_vcpu_events() already encode this: they
 are static functions without any kvm_state reference that simply return
 the content of those fields. Totally inconsistent to this, we force the
 caller of kvm_check_extension to pass a handle. This is part of my
 problem with the current situation and any halfhearted steps in this
 context. Either we work towards eliminating static KVMState *kvm_state
 in kvm-all.c or eliminating KVMState.
 
 If the CPU related fields are accessible through CPUState, the handle
 should be available.
 
 It may
 be a good idea to introduce an arch-specific kvm_state and move related
 bits over.

 This should probably contain only irqchip_in_kernel, pit_in_kernel and
 many_ioeventfds, maybe fd.

 fd is that root file descriptor you need for a few KVM services that are
 not bound to a specific VM - e.g. feature queries. It's not arch
 specific. Arch specific are e.g. robust_singlestep or xsave feature states.
 
 By arch you mean guest CPU architecture? They are not machine features.

No, they are practically static host features.

 

 It may also once be feasible to carve out memory management
 related fields if we have proper abstractions for that, but I'm not
 completely sure here.

 I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
 migration_log into the memory object.

 vmfd is the VM-scope file descriptor you need at machine-level. The rest
 logically belongs to a memory object, but I haven't looked at technical
 details yet.


 Anyway, all these things are secondary. The primary topic here is how to
 deal with kvm_state and its fields that have VM-global scope.

 If it is an opaque blob which contains various unrelated stuff, no
 clear place will be found.

 We aren't moving fields yet (and we shouldn't). We first of all need to
 establish the handle distribution (which apparently requires quite some
 work in areas beyond KVM).
 
 But I think this is exactly  the problem. If the handle is for the
 current KVMState, you'll indeed need it in various places and passing
 it around will be cumbersome. By moving the fields around, the
 information should  be available more naturally.

Yeah, if we had a MachineState or if we could agree on introducing it,
I'm with you again. Improving the currently cumbersome KVM API
interaction was the main motivation for my original patch.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Gerd Hoffmann

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't work 
with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want attach the device to via 
bus=name.  This is true for *every* device, including all pci devices. 
 If unspecified qdev uses the first bus it finds.


As long as there is a single pci bus only there is simply no need to 
specify it, thats why nobody does that today.  Once q35 finally arrives 
this will change of course.


cheers,
  Gerd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori aligu...@linux.vnet.ibm.com writes:

 On 01/18/2011 10:56 AM, Jan Kiszka wrote:

 The device model topology is 100% a hidden architectural detail.
  
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.


 But we also don't do PCI passthrough so we really haven't even
 explored how that maps in qdev.  I don't know if qemu-kvm has
 attempted to qdev-ify it.

 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.

 We need to provide a compatible interface to the guest.  If you agree
 with my above statements, then you'll also agree that we can do this
 without keeping the device model topology stable.

 But we also need to provide a compatible interface to management tools.
 Exposing the device model topology as a compatible interface
 artificially limits us.  It's far better to provide higher level
 supported interfaces to give us the flexibility to change the device
 model as we need to.
  
 How do you want to change qdev to keep the guest and management tool
 view stable while branching off kvm sub-buses?

 The qdev device model is not a stable interface.  I think that's been
 clear from the very beginning.

   Please propose such
 extensions so that they can be discussed. IIUC, that would be second
 relation between qdev and qbus instances besides the physical topology.
 What further use cases (besides passing kvm_state around) do you have in
 mind?


 The -device interface is a stable interface.  Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device.  It's implied in the interface.

Now I'm confused.  Isn't -device FOO,bus=pci.0 specifying the PCI bus?

[...]
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Gerd Hoffmann kra...@redhat.com writes:

 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:

 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.

 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.

 It is qdev-ified.  It is a normal pci device from qdev's point of view.

 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?

 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.

 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices. If unspecified qdev uses the first bus it finds.

 As long as there is a single pci bus only there is simply no need to
 specify it, thats why nobody does that today.  Once q35 finally
 arrives this will change of course.

As far as I know, libvirt does it already.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori aligu...@linux.vnet.ibm.com writes:

 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.


 The bus topology reflects how I/O flows in and out of a device.  We do
 not model a perfect PC bus architecture and I don't think we ever
 intend to.  Instead, we model a functional architecture.

 I/O from an assigned device does not flow through the emulated PCI
 bus.  Therefore, it does not belong on the emulated PCI bus.

 Assigned devices need to interact with the emulated PCI bus, but they
 shouldn't be children of it.

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't 
work with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want attach the device to via 
bus=name.  This is true for *every* device, including all pci 
devices.  If unspecified qdev uses the first bus it finds.


As long as there is a single pci bus only there is simply no need to 
specify it, thats why nobody does that today.


Right.  In terms of specifying bus=, what are we promising re: 
compatibility?  Will there always be a pci.0?  If we add some PCI-to-PCI 
bridges in order to support more devices, is libvirt support to parse 
the hierarchy and figure out which bus to put the device on?


Regards,

Anthony Liguori


  Once q35 finally arrives this will change of course.

cheers,
  Gerd


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
 support to parse the hierarchy and figure out which bus to put the
 device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.

Regards,
Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:11 AM, Markus Armbruster wrote:

Gerd Hoffmannkra...@redhat.com  writes:

   

On 01/18/11 18:09, Anthony Liguori wrote:
 

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
   
 

The device model topology is 100% a hidden architectural detail.
   

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
 

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
   

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

 

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
   

Wrong.  You can specify the bus you want attach the device to via
bus=name.  This is true for *every* device, including all pci
devices. If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need to
specify it, thats why nobody does that today.  Once q35 finally
arrives this will change of course.
 

As far as I know, libvirt does it already.
   


I think that's a bad idea from a forward compatibility perspective.

Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:15 AM, Markus Armbruster wrote:

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?
   


It's almost arbitrary, but I would say it's the direction that I/Os flow.

But if the underlying observation is that the device tree is not really 
a tree, you're 100% correct.  This is part of why a factory interface 
that just takes a parent bus is too simplistic.


I think we ought to introduce a -pci-device option that is specifically 
for creating PCI devices that doesn't require a parent bus argument but 
provides a way to specify stable addressing (for instancing, using a 
linear index).


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
 On 01/19/2011 07:11 AM, Markus Armbruster wrote:
 Gerd Hoffmannkra...@redhat.com  writes:
 
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices. If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need to
 specify it, thats why nobody does that today.  Once q35 finally
 arrives this will change of course.
 As far as I know, libvirt does it already.
 
 I think that's a bad idea from a forward compatibility perspective.

In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices  their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)

Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Jan Kiszka
On 2011-01-19 17:57, Anthony Liguori wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:
 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?

 
 It's almost arbitrary, but I would say it's the direction that I/Os flow.

We need both if we want KVM buses. They are useless for enumerating the
device on that bus the guest sees it on.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
 support to parse the hierarchy and figure out which bus to put the
 device on?

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
i explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses by appear by magic

Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:

On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
   

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 

On 01/18/11 18:09, Anthony Liguori wrote:
   

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
   

The device model topology is 100% a hidden architectural detail.
 

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
   

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
 

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

   

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
 

Wrong.  You can specify the bus you want attach the device to via
bus=name.  This is true for *every* device, including all pci
devices.  If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need
to specify it, thats why nobody does that today.
   

Right.  In terms of specifying bus=, what are we promising re:
compatibility?  Will there always be a pci.0?  If we add some
PCI-to-PCI bridges in order to support more devices, is libvirt
support to parse the hierarchy and figure out which bus to put the
device on?
 

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
i explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses by appear by magic
   


Yeah, we can't promise that.  If you use -M pc, you aren't guaranteed a 
stable PCI bus topology even with -nodefconfig/-nodefaults.


Regards,

Anthony Liguori


Daniel
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:19 AM, Daniel P. Berrange wrote:


In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices  their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)
   


There are two distinct things here:

1) creating exactly the same virtual machine (like for migration) given 
a newer version of QEMU


2) creating a reasonably similar virtual machine given a newer version 
of QEMU


For (1), you cannot use -M pc.  You should use things like bus=X,addr=Y 
much better is for QEMU to dump a device file and to just reuse that 
instead of guessing what you need.


For (2), you cannot use bus=X,addr=Y because it makes assumptions about 
the PCI topology which may change in newer -M pc's.


I think libvirt needs to treat this two scenarios differently to support 
forwards compatibility.


Regards,

Anthony Liguori


Daniel
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:


The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.
   


Yeah, but replacing the main chipset will certainly change the PCI 
topology such that if you're specifying bus=X and addr=X and then also 
using -M pc, unless you're parsing the default topology to come up with 
the addressing, it will break in the future.


That's why I think something simpler like a linear index that QEMU maps 
to a static location in the topology is probably the best future proof 
interface.


Regards,

Anthony Liguori


Regards,
Daniel
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
 
 The reason we specify 'bus' is that we wanted to be flexible wrt
 upgrades of libvirt, without needing restarts of QEMU instances
 it manages. That way we can introduce new functionality into
 libvirt that relies on it having previously set 'bus' on all
 active QEMUs.
 
 If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
 be adding the extra bridges. I'd expect that QEMU provided just
 the first bridge and then libvirt would specify how many more
 bridges to create at boot or hotplug them later. So it wouldn't
 ever need to parse topology.
 
 Yeah, but replacing the main chipset will certainly change the PCI
 topology such that if you're specifying bus=X and addr=X and then
 also using -M pc, unless you're parsing the default topology to come
 up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem

Regards,
Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
 On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
 support to parse the hierarchy and figure out which bus to put the
 device on?
 The answer to your questions probably differ depending on
 whether '-nodefconfig' and '-nodefaults' are set on the
 command line.  If they are set, then I'd expect to only
 ever see one PCI bus with name pci.0 forever more, unless
 i explicitly ask for more. If they are not set, then you
 might expect to see multiple PCI buses by appear by magic
 
 Yeah, we can't promise that.  If you use -M pc, you aren't
 guaranteed a stable PCI bus topology even with
 -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.

Regards,
Daniel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
   

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
 

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.
   

Yeah, but replacing the main chipset will certainly change the PCI
topology such that if you're specifying bus=X and addr=X and then
also using -M pc, unless you're parsing the default topology to come
up with the addressing, it will break in the future.
 

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem
   


Right, but you expose bus addressing via the XML, no?  That means that 
if a user specifies something like '1.0', and you translate that to 
bus='pci.0',addr='1.0', then when pc-0.50 comes out and slot 1.0 is used 
for the integrated 3D VGA graphics adapter, the guest creation will fail.


If you expose topological configuration to the user, the guest will not 
continue working down the road unless you come up with a scheme where 
you map addresses to a different address range for newer pcs.


Regards,

Anthony Liguori


Regards,
Daniel
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Blue Swirl
On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
 On 01/19/2011 07:15 AM, Markus Armbruster wrote:

 So they interact with KVM (need kvm_state), and they interact with the
 emulated PCI bus.  Could you elaborate on the fundamental difference
 between the two interactions that makes you choose the (hypothetical)
 KVM bus over the PCI bus as device parent?


 It's almost arbitrary, but I would say it's the direction that I/Os flow.

 But if the underlying observation is that the device tree is not really a
 tree, you're 100% correct.  This is part of why a factory interface that
 just takes a parent bus is too simplistic.

 I think we ought to introduce a -pci-device option that is specifically for
 creating PCI devices that doesn't require a parent bus argument but provides
 a way to specify stable addressing (for instancing, using a linear index).

I think kvm_state should not be a property of any device or bus. It
should be split to more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, for example irqfd could be considered to be similar object to
char or block devices provided by QEMU to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make coalesced MMIO stuff part of memory object. We are not
passing any state references when using cpu_physical_memory_rw(), but
that could be changed.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 17:37, Anthony Liguori wrote:
 On 01/18/2011 10:17 AM, Jan Kiszka wrote:
 On 2011-01-18 17:04, Anthony Liguori wrote:

 A KVM device should sit on a KVM specific bus that hangs off of
 sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.



  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.



 The bus topology reflects how I/O flows in and out of a device.  We do
 not model a perfect PC bus architecture and I don't think we ever intend
 to.  Instead, we model a functional architecture.

 I/O from an assigned device does not flow through the emulated PCI bus.
 Therefore, it does not belong on the emulated PCI bus.

 Assigned devices need to interact with the emulated PCI bus, but they
 shouldn't be children of it.

  
 You should be able to find assigned devices on some PCI bus, so you
 either have to hack up the existing bus to host devices that are, on the
 other side, not part of it or branch off a pci-kvm sub-bus, just like
 you would have to create a sysbus-kvm.

 Management tools should never transverse the device tree to find
 devices.  This is a recipe for disaster in the long term because the
 device tree will not remain stable.

 So yes, a management tool should be able to enumerate assigned devices
 as they would enumerate any other PCI device but that has almost nothing
 to do with what the tree layout is.
  
 I'm probably misunderstanding you, but if the bus topology as the guest
 sees it is not properly reflected in an object tree on the qemu side, we
 are creating hacks again.

 
 There is no such thing as the bus topology as the guest sees it.
 
 The guest just sees a bunch of devices.  The guest can only infer things 
 like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
 RTC, etc.  Whether those devices are on an ISA bus, and LPC bus, or all 
 in a SuperI/O chip that's part of the southbridge is all invisible to 
 the guest.
 
 The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.

 
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.
 
 We need to provide a compatible interface to the guest.  If you agree 
 with my above statements, then you'll also agree that we can do this 
 without keeping the device model topology stable.
 
 But we also need to provide a compatible interface to management tools.  
 Exposing the device model topology as a compatible interface 
 artificially limits us.  It's far better to provide higher level 
 supported interfaces to give us the flexibility to change the device 
 model as we need to.

How do you want to change qdev to keep the guest and management tool
view stable while branching off kvm sub-buses? Please propose such
extensions so that they can be discussed. IIUC, that would be second
relation between qdev and qbus instances besides the physical topology.
What further use cases (besides passing kvm_state around) do you have in
mind?

 
   If they create a device on bus X, it
 must never end up on bus Y just because it happens to be KVM-assisted or
 has some other property.
 
 Nope.  This is exactly what should happen.
 
 90% of the devices in the device model are not created by management 
 tools.  They're part of a chipset.  The chipset has well defined 
 extension points and we provide management interfaces to create devices 
 on those extension points.  That is, interfaces to create PCI devices.
 

Creating kvm irqchips via the management tool would be one simple way
(not the only one, though) to enable/disable kvm assistance for those
devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Alex Williamson
On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
 On 2011-01-18 16:48, Anthony Liguori wrote:
  On 01/18/2011 09:43 AM, Jan Kiszka wrote:
  On 2011-01-18 16:04, Anthony Liguori wrote:
 
  On 01/18/2011 08:28 AM, Jan Kiszka wrote:
   
  On 2011-01-12 11:31, Jan Kiszka wrote:
 
 
  Am 12.01.2011 11:22, Avi Kivity wrote:
 
   
  On 01/11/2011 03:54 PM, Anthony Liguori wrote:
 
 
  Right, we should introduce a KVMBus that KVM devices are created on.
  The devices can get at KVMState through the BusState.
 
   
  There is no kvm bus in a PC (I looked).  We're bending the device model
  here because a device is implemented in the kernel and not in
  userspace.  An implementation detail is magnified beyond all 
  proportions.
 
  An ioapic that is implemented by kvm lives in exactly the same place
  that the qemu ioapic lives in.  An assigned pci device lives on the PCI
  bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
  it elsewhere, not through creating imaginary buses that don't exist.
 
 
 
  Exactly.
 
  So we can either infect the whole device tree with kvm (or maybe a
  more generic accelerator structure that also deals with Xen) or we need
  to pull the reference inside the device's init function from some global
  service (kvm_get_state).
 
   
  Note that this topic is still waiting for good suggestions, specifically
  from those who believe in kvm_state references :). This is not only
  blocking kvmstate merge but will affect KVM irqchips as well.
 
  It boils down to how we reasonably pass a kvm_state reference from
  machine init code to a sysbus device. I'm probably biased, but I don't
  see any way that does not work against the idea of confining access to
  kvm_state or breaks device instantiation from the command line or a
  config file.
 
 
  A KVM device should sit on a KVM specific bus that hangs off of sysbus.
  It can get to kvm_state through that bus.
 
  That bus doesn't get instantiated through qdev so requiring a pointer
  argument should not be an issue.
 
   
  This design is in conflict with the requirement to attach KVM-assisted
  devices also to their home bus, e.g. an assigned PCI device to the PCI
  bus. We don't support multi-homed qdev devices.
 
  
  With vfio, would an assigned PCI device even need kvm_state?
 
 IIUC: Yes, for establishing the irqfd link.

We abstract this through the msi/msix layer though, so the vfio driver
doesn't directly know anything about kvm_state.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 18:02, Alex Williamson wrote:
 On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
 On 2011-01-18 16:48, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all 
 proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.


 With vfio, would an assigned PCI device even need kvm_state?

 IIUC: Yes, for establishing the irqfd link.
 
 We abstract this through the msi/msix layer though, so the vfio driver
 doesn't directly know anything about kvm_state.

Which version/tree are you referring to? It wasn't the case in the last
version I found on the list.

Does the msi layer use irqfd for every source in kvm mode then? Of
course, the key question will be how that layer will once obtain kvm_state.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.
 

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
   


But we also don't do PCI passthrough so we really haven't even explored 
how that maps in qdev.  I don't know if qemu-kvm has attempted to 
qdev-ify it.



Management and analysis tools must be able to traverse the system buses
and find guest devices this way.
   

We need to provide a compatible interface to the guest.  If you agree
with my above statements, then you'll also agree that we can do this
without keeping the device model topology stable.

But we also need to provide a compatible interface to management tools.
Exposing the device model topology as a compatible interface
artificially limits us.  It's far better to provide higher level
supported interfaces to give us the flexibility to change the device
model as we need to.
 

How do you want to change qdev to keep the guest and management tool
view stable while branching off kvm sub-buses?


The qdev device model is not a stable interface.  I think that's been 
clear from the very beginning.



  Please propose such
extensions so that they can be discussed. IIUC, that would be second
relation between qdev and qbus instances besides the physical topology.
What further use cases (besides passing kvm_state around) do you have in
mind?
   


The -device interface is a stable interface.  Right now, you don't 
specify any type of identifier of the pci bus when you create a PCI 
device.  It's implied in the interface.


   
 

   If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property.
   

Nope.  This is exactly what should happen.

90% of the devices in the device model are not created by management
tools.  They're part of a chipset.  The chipset has well defined
extension points and we provide management interfaces to create devices
on those extension points.  That is, interfaces to create PCI devices.

 

Creating kvm irqchips via the management tool would be one simple way
(not the only one, though) to enable/disable kvm assistance for those
devices.
   


It's automatically created as part of the CPUs or as part of the 
chipset.  How to enable/disable kvm assistance is a property of the CPU 
and/or chipset.


Regards,

Anthony Liguori



Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:

 The device model topology is 100% a hidden architectural detail.
  
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.

 
 But we also don't do PCI passthrough so we really haven't even explored 
 how that maps in qdev.  I don't know if qemu-kvm has attempted to 
 qdev-ify it.

It is. And even if it weren't or the current version in qemu-kvm was not
perfect, we need to consider those uses cases now as we are about to
define a generic model for kvm device integration. That's the point of
this discussion.

 
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.

 We need to provide a compatible interface to the guest.  If you agree
 with my above statements, then you'll also agree that we can do this
 without keeping the device model topology stable.

 But we also need to provide a compatible interface to management tools.
 Exposing the device model topology as a compatible interface
 artificially limits us.  It's far better to provide higher level
 supported interfaces to give us the flexibility to change the device
 model as we need to.
  
 How do you want to change qdev to keep the guest and management tool
 view stable while branching off kvm sub-buses?
 
 The qdev device model is not a stable interface.  I think that's been 
 clear from the very beginning.

Internals aren't stable, but they should only be changed for a good
reason, specifically when the change may impact the whole set of device
models.

 
   Please propose such
 extensions so that they can be discussed. IIUC, that would be second
 relation between qdev and qbus instances besides the physical topology.
 What further use cases (besides passing kvm_state around) do you have in
 mind?

 
 The -device interface is a stable interface.  Right now, you don't 
 specify any type of identifier of the pci bus when you create a PCI 
 device.  It's implied in the interface.

Which only works as along as we expose a single bus. You don't need to
be an oracle to predict that this is not a stable interface.

 

  
If they create a device on bus X, it
 must never end up on bus Y just because it happens to be KVM-assisted or
 has some other property.

 Nope.  This is exactly what should happen.

 90% of the devices in the device model are not created by management
 tools.  They're part of a chipset.  The chipset has well defined
 extension points and we provide management interfaces to create devices
 on those extension points.  That is, interfaces to create PCI devices.

  
 Creating kvm irqchips via the management tool would be one simple way
 (not the only one, though) to enable/disable kvm assistance for those
 devices.

 
 It's automatically created as part of the CPUs or as part of the 
 chipset.  How to enable/disable kvm assistance is a property of the CPU 
 and/or chipset.

If we exclude creation via command line / config files, we could also
pass the kvm_state directly from the machine or chipset setup code and
save us at least the kvm system buses.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 11:20 AM, Jan Kiszka wrote:


Which only works as along as we expose a single bus. You don't need to
be an oracle to predict that this is not a stable interface.
   


Today we only have a very low level factory interface--device creation 
and deletion.


I think we should move to higher level bus factory interfaces.  An 
interface to create a PCI device and to delete PCI devices.  This is the 
only sane way to do hot plug.


This also makes supporting multiple busses a lot more reasonable since 
this factory interface could be a method of the controller.



It's automatically created as part of the CPUs or as part of the
chipset.  How to enable/disable kvm assistance is a property of the CPU
and/or chipset.
 

If we exclude creation via command line / config files, we could also
pass the kvm_state directly from the machine or chipset setup code and
save us at least the kvm system buses.
   


Which is fine in the short term.  This is exactly why we don't want the 
device model to be an ABI.   It gives us the ability to make changes as 
they make sense instead of trying to be perfect from the start (which we 
never will be).


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Alex Williamson
On Tue, 2011-01-18 at 18:08 +0100, Jan Kiszka wrote:
 On 2011-01-18 18:02, Alex Williamson wrote:
  On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
  On 2011-01-18 16:48, Anthony Liguori wrote:
  On 01/18/2011 09:43 AM, Jan Kiszka wrote:
  On 2011-01-18 16:04, Anthony Liguori wrote:
 
  On 01/18/2011 08:28 AM, Jan Kiszka wrote:
   
  On 2011-01-12 11:31, Jan Kiszka wrote:
 
 
  Am 12.01.2011 11:22, Avi Kivity wrote:
 
   
  On 01/11/2011 03:54 PM, Anthony Liguori wrote:
 
 
  Right, we should introduce a KVMBus that KVM devices are created on.
  The devices can get at KVMState through the BusState.
 
   
  There is no kvm bus in a PC (I looked).  We're bending the device 
  model
  here because a device is implemented in the kernel and not in
  userspace.  An implementation detail is magnified beyond all 
  proportions.
 
  An ioapic that is implemented by kvm lives in exactly the same place
  that the qemu ioapic lives in.  An assigned pci device lives on the 
  PCI
  bus, not a KVMBus.  If we need a pointer to KVMState, then we must 
  find
  it elsewhere, not through creating imaginary buses that don't exist.
 
 
 
  Exactly.
 
  So we can either infect the whole device tree with kvm (or maybe a
  more generic accelerator structure that also deals with Xen) or we 
  need
  to pull the reference inside the device's init function from some 
  global
  service (kvm_get_state).
 
   
  Note that this topic is still waiting for good suggestions, 
  specifically
  from those who believe in kvm_state references :). This is not only
  blocking kvmstate merge but will affect KVM irqchips as well.
 
  It boils down to how we reasonably pass a kvm_state reference from
  machine init code to a sysbus device. I'm probably biased, but I don't
  see any way that does not work against the idea of confining access to
  kvm_state or breaks device instantiation from the command line or a
  config file.
 
 
  A KVM device should sit on a KVM specific bus that hangs off of sysbus.
  It can get to kvm_state through that bus.
 
  That bus doesn't get instantiated through qdev so requiring a pointer
  argument should not be an issue.
 
   
  This design is in conflict with the requirement to attach KVM-assisted
  devices also to their home bus, e.g. an assigned PCI device to the PCI
  bus. We don't support multi-homed qdev devices.
 
 
  With vfio, would an assigned PCI device even need kvm_state?
 
  IIUC: Yes, for establishing the irqfd link.
  
  We abstract this through the msi/msix layer though, so the vfio driver
  doesn't directly know anything about kvm_state.
 
 Which version/tree are you referring to? It wasn't the case in the last
 version I found on the list.
 
 Does the msi layer use irqfd for every source in kvm mode then? Of
 course, the key question will be how that layer will once obtain kvm_state.

Looking at [RFC PATCH v2] VFIO based device assignment sent on Nov
5th, I guess we do call kvm_set_irqfd.  Maybe I'm just wishing that the
msi layer abstracted it better.  I'd like to be able to pass in a
userspace interrupt handler function pointer and an event notifier fd
and let the interrupt layers worry about how it's hooked up.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 18:31, Anthony Liguori wrote:
 It's automatically created as part of the CPUs or as part of the
 chipset.  How to enable/disable kvm assistance is a property of the CPU
 and/or chipset.
  
 If we exclude creation via command line / config files, we could also
 pass the kvm_state directly from the machine or chipset setup code and
 save us at least the kvm system buses.

 
 Which is fine in the short term.

Unless we want to abuse the pointer property for this, and there was
some resistance, we would have to change the sysbus init function
signature. I don't want to propose this for a short-term workaround, we
need a clearer vision and roadmap to avoid multiple invasive changes to
the device model.

  This is exactly why we don't want the 
 device model to be an ABI.   It gives us the ability to make changes as 
 they make sense instead of trying to be perfect from the start (which we 
 never will be).

The device model will always consist of a stable part, the guest and
management visible topology. That beast needs to be modeled as well,
likely via some new bus objects. If that's the way to go, starting now
is probably the right time as we have an urgent use case, right?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-12 11:31, Jan Kiszka wrote:
 Am 12.01.2011 11:22, Avi Kivity wrote:
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on. 
 The devices can get at KVMState through the BusState.

 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.

 
 Exactly.
 
 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 08:28 AM, Jan Kiszka wrote:

On 2011-01-12 11:31, Jan Kiszka wrote:
   

Am 12.01.2011 11:22, Avi Kivity wrote:
 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:
   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.
 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.

   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).
 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.
   


A KVM device should sit on a KVM specific bus that hangs off of sysbus.  
It can get to kvm_state through that bus.


That bus doesn't get instantiated through qdev so requiring a pointer 
argument should not be an issue.


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 16:04, Anthony Liguori wrote:
 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 On 2011-01-12 11:31, Jan Kiszka wrote:

 Am 12.01.2011 11:22, Avi Kivity wrote:
  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.
  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.


 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).
  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.

 
 A KVM device should sit on a KVM specific bus that hangs off of sysbus.  
 It can get to kvm_state through that bus.
 
 That bus doesn't get instantiated through qdev so requiring a pointer 
 argument should not be an issue.
 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 09:43 AM, Jan Kiszka wrote:

On 2011-01-18 16:04, Anthony Liguori wrote:
   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 

On 2011-01-12 11:31, Jan Kiszka wrote:

   

Am 12.01.2011 11:22, Avi Kivity wrote:

 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:

   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.

 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.


   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).

 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.

 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.
   


With vfio, would an assigned PCI device even need kvm_state?

Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 09:43 AM, Jan Kiszka wrote:

On 2011-01-18 16:04, Anthony Liguori wrote:
   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 

On 2011-01-12 11:31, Jan Kiszka wrote:

   

Am 12.01.2011 11:22, Avi Kivity wrote:

 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:

   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.

 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.


   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).

 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.

 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.
   


The bus topology reflects how I/O flows in and out of a device.  We do 
not model a perfect PC bus architecture and I don't think we ever intend 
to.  Instead, we model a functional architecture.


I/O from an assigned device does not flow through the emulated PCI bus.  
Therefore, it does not belong on the emulated PCI bus.


Assigned devices need to interact with the emulated PCI bus, but they 
shouldn't be children of it.


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 16:48, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.

 
 With vfio, would an assigned PCI device even need kvm_state?

IIUC: Yes, for establishing the irqfd link.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 16:50, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.

 
 The bus topology reflects how I/O flows in and out of a device.  We do 
 not model a perfect PC bus architecture and I don't think we ever intend 
 to.  Instead, we model a functional architecture.
 
 I/O from an assigned device does not flow through the emulated PCI bus.  
 Therefore, it does not belong on the emulated PCI bus.
 
 Assigned devices need to interact with the emulated PCI bus, but they 
 shouldn't be children of it.

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm. I guess, if at all, we want the
latter.

Is that acceptable for everyone?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 10:01 AM, Jan Kiszka wrote:

On 2011-01-18 16:50, Anthony Liguori wrote:
   

On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 

On 2011-01-18 16:04, Anthony Liguori wrote:

   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:

 

On 2011-01-12 11:31, Jan Kiszka wrote:


   

Am 12.01.2011 11:22, Avi Kivity wrote:


 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:


   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.


 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.



   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).


 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.


   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.


 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.

   

The bus topology reflects how I/O flows in and out of a device.  We do
not model a perfect PC bus architecture and I don't think we ever intend
to.  Instead, we model a functional architecture.

I/O from an assigned device does not flow through the emulated PCI bus.
Therefore, it does not belong on the emulated PCI bus.

Assigned devices need to interact with the emulated PCI bus, but they
shouldn't be children of it.
 

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm.


Management tools should never transverse the device tree to find 
devices.  This is a recipe for disaster in the long term because the 
device tree will not remain stable.


So yes, a management tool should be able to enumerate assigned devices 
as they would enumerate any other PCI device but that has almost nothing 
to do with what the tree layout is.


Regards,

Anthony Liguori


  I guess, if at all, we want the
latter.

Is that acceptable for everyone?

Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka
On 2011-01-18 17:04, Anthony Liguori wrote:
 A KVM device should sit on a KVM specific bus that hangs off of
 sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.


  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.


 The bus topology reflects how I/O flows in and out of a device.  We do
 not model a perfect PC bus architecture and I don't think we ever intend
 to.  Instead, we model a functional architecture.

 I/O from an assigned device does not flow through the emulated PCI bus.
 Therefore, it does not belong on the emulated PCI bus.

 Assigned devices need to interact with the emulated PCI bus, but they
 shouldn't be children of it.
  
 You should be able to find assigned devices on some PCI bus, so you
 either have to hack up the existing bus to host devices that are, on the
 other side, not part of it or branch off a pci-kvm sub-bus, just like
 you would have to create a sysbus-kvm.
 
 Management tools should never transverse the device tree to find
 devices.  This is a recipe for disaster in the long term because the
 device tree will not remain stable.
 
 So yes, a management tool should be able to enumerate assigned devices
 as they would enumerate any other PCI device but that has almost nothing
 to do with what the tree layout is.

I'm probably misunderstanding you, but if the bus topology as the guest
sees it is not properly reflected in an object tree on the qemu side, we
are creating hacks again.

Management and analysis tools must be able to traverse the system buses
and find guest devices this way. If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property. On the other hand, trying to hide this
dependency will likely cause severe damage to the qdev design.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori

On 01/18/2011 10:17 AM, Jan Kiszka wrote:

On 2011-01-18 17:04, Anthony Liguori wrote:
   

A KVM device should sit on a KVM specific bus that hangs off of
sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.



 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.


   

The bus topology reflects how I/O flows in and out of a device.  We do
not model a perfect PC bus architecture and I don't think we ever intend
to.  Instead, we model a functional architecture.

I/O from an assigned device does not flow through the emulated PCI bus.
Therefore, it does not belong on the emulated PCI bus.

Assigned devices need to interact with the emulated PCI bus, but they
shouldn't be children of it.

 

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm.
   

Management tools should never transverse the device tree to find
devices.  This is a recipe for disaster in the long term because the
device tree will not remain stable.

So yes, a management tool should be able to enumerate assigned devices
as they would enumerate any other PCI device but that has almost nothing
to do with what the tree layout is.
 

I'm probably misunderstanding you, but if the bus topology as the guest
sees it is not properly reflected in an object tree on the qemu side, we
are creating hacks again.
   


There is no such thing as the bus topology as the guest sees it.

The guest just sees a bunch of devices.  The guest can only infer things 
like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
RTC, etc.  Whether those devices are on an ISA bus, and LPC bus, or all 
in a SuperI/O chip that's part of the southbridge is all invisible to 
the guest.


The device model topology is 100% a hidden architectural detail.


Management and analysis tools must be able to traverse the system buses
and find guest devices this way.


We need to provide a compatible interface to the guest.  If you agree 
with my above statements, then you'll also agree that we can do this 
without keeping the device model topology stable.


But we also need to provide a compatible interface to management tools.  
Exposing the device model topology as a compatible interface 
artificially limits us.  It's far better to provide higher level 
supported interfaces to give us the flexibility to change the device 
model as we need to.



  If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property.


Nope.  This is exactly what should happen.

90% of the devices in the device model are not created by management 
tools.  They're part of a chipset.  The chipset has well defined 
extension points and we provide management interfaces to create devices 
on those extension points.  That is, interfaces to create PCI devices.


Regards,

Anthony Liguori


  On the other hand, trying to hide this
dependency will likely cause severe damage to the qdev design.

Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-12 Thread Avi Kivity

On 01/11/2011 03:54 PM, Anthony Liguori wrote:


Right, we should introduce a KVMBus that KVM devices are created on.  
The devices can get at KVMState through the BusState.


There is no kvm bus in a PC (I looked).  We're bending the device model 
here because a device is implemented in the kernel and not in 
userspace.  An implementation detail is magnified beyond all proportions.


An ioapic that is implemented by kvm lives in exactly the same place 
that the qemu ioapic lives in.  An assigned pci device lives on the PCI 
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find 
it elsewhere, not through creating imaginary buses that don't exist.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-12 Thread Jan Kiszka
Am 12.01.2011 11:22, Avi Kivity wrote:
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on. 
 The devices can get at KVMState through the BusState.
 
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.
 
 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.
 

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).

Jan

PS: I started refreshing my whole patch series with the two
controversial patches removed. Will send out later so that we can
proceed with merging the critical bits, specifically all the bug fixes.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-12 Thread Markus Armbruster
Avi Kivity a...@redhat.com writes:

 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

 There is no kvm bus in a PC (I looked).  We're bending the device
 model here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all
 proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.

Exactly.  And that place is a bus.

What if the device interfaces in bus-specific ways with its parent bus?
Then we can't simply replace the parent bus by a KVM bus.  We'd need
*two* parent buses, as Jan pointed out upthread.

 An assigned pci device lives on the
 PCI bus, not a KVMBus.  If we need a pointer to KVMState, then we must
 find it elsewhere, not through creating imaginary buses that don't
 exist.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-11 Thread Gerd Hoffmann

  Hi,


Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.


It is considered bad/hackish style as you can't create that kind of 
devices using the -device command line switch (or from a machine 
description config file some day in the future).  So we should not add 
more uses of this, especially not in patches which are supposed to 
cleanup things ;)


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-11 Thread Markus Armbruster
Jan Kiszka jan.kis...@web.de writes:

 Am 10.01.2011 22:06, Jan Kiszka wrote:
  kvmclock should be created with
 kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
 reference.   Taking a global reference to kvm_state in machine_init is
 not a bad thing, obviously the machine initialization function needs
 access to the kvm_state.
 
 This would also require changing sysbus interfaces for the sake of KVM's
 abstraction. If this is the only way forward, I could look into this.

 Actually, there is already a channel to pass pointers to qdev devices:
 the pointer property hack. I'm not sure we should contribute to its user
 base

We shouldn't.

  or take the chance for a cleanup, but we are not alone with this
 requirement. Point below remains valid, though.

 
 Still, I do not see any benefit for the affected code. You then either
 need to steal a kvm_state reference from the first cpu or introduce a
 marvelous interface like kvm_get_state() to make this work from outside
 of the KVM core.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-11 Thread Anthony Liguori

On 01/11/2011 03:31 AM, Markus Armbruster wrote:

Jan Kiszkajan.kis...@web.de  writes:

   

Am 10.01.2011 22:06, Jan Kiszka wrote:
 

  kvmclock should be created with
kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
reference.   Taking a global reference to kvm_state in machine_init is
not a bad thing, obviously the machine initialization function needs
access to the kvm_state.
 

This would also require changing sysbus interfaces for the sake of KVM's
abstraction. If this is the only way forward, I could look into this.
   

Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base
 

We shouldn't.
   


Right, we should introduce a KVMBus that KVM devices are created on.  
The devices can get at KVMState through the BusState.


Regards,

Anthony Liguori


  or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.

 

Still, I do not see any benefit for the affected code. You then either
need to steal a kvm_state reference from the first cpu or introduce a
marvelous interface like kvm_get_state() to make this work from outside
of the KVM core.
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-11 Thread Jan Kiszka
Am 11.01.2011 09:53, Gerd Hoffmann wrote:
   Hi,
 
 Actually, there is already a channel to pass pointers to qdev devices:
 the pointer property hack. I'm not sure we should contribute to its user
 base or take the chance for a cleanup, but we are not alone with this
 requirement. Point below remains valid, though.
 
 It is considered bad/hackish style as you can't create that kind of
 devices using the -device command line switch (or from a machine
 description config file some day in the future).

That kind of instantiation wouldn't be possible for device models that
require someone actively passing kvm_state to them...

  So we should not add
 more uses of this, especially not in patches which are supposed to
 cleanup things ;)

You won't see me disagree.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Anthony Liguori

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:

From: Jan Kiszkajan.kis...@siemens.com

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but this will also allow to visualize it
one day.

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
CC: Glauber Costaglom...@redhat.com
Signed-off-by: Marcelo Tosattimtosa...@redhat.com
---
  target-i386/kvm.c |   92 -
  1 files changed, 91 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 69b8234..47cb22b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@
  #include hw/apic.h
  #include ioport.h
  #include kvm_x86.h
+#include hw/sysbus.h

  #ifdef CONFIG_KVM_PARA
  #includelinux/kvm_para.h
@@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t 
status,
  #endif
  }

+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+typedef struct KVMClockState {
+SysBusDevice busdev;
+uint64_t clock;
+bool clock_valid;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+int ret;
+
+if (s-clock_valid) {
+return;
+}
+ret = kvm_vm_ioctl(KVM_GET_CLOCK,data);
+if (ret  0) {
+fprintf(stderr, KVM_GET_CLOCK failed: %s\n, strerror(ret));
+data.clock = 0;
+}
+s-clock = data.clock;
+/*
+ * If the VM is stopped, declare the clock state valid to avoid re-reading
+ * it on next vmsave (which would return a different value). Will be reset
+ * when the VM is continued.
+ */
+s-clock_valid = !vm_running;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+
+data.clock = s-clock;
+data.flags = 0;
+return kvm_vm_ioctl(KVM_SET_CLOCK,data);
+}
+
+static void kvmclock_vm_state_change(void *opaque, int running, int reason)
+{
+KVMClockState *s = opaque;
+
+if (running) {
+s-clock_valid = false;
+}
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
+
+qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd= {
+.name = kvmclock,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.pre_save = kvmclock_pre_save,
+.post_load = kvmclock_post_load,
+.fields = (VMStateField []) {
+VMSTATE_UINT64(clock, KVMClockState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+.qdev.name = kvmclock,
+.qdev.size = sizeof(KVMClockState),
+.qdev.vmsd =kvmclock_vmsd,
+.qdev.no_user = 1,
+.init = kvmclock_init,
+};
+#endif /* CONFIG_KVM_PARA  KVM_CAP_ADJUST_CLOCK */
+
  int kvm_arch_init_vcpu(CPUState *env)
  {
  struct {
@@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
  env-cpuid_svm_features= kvm_x86_get_supported_cpuid(0x800A,
  0, R_EDX);

-
  cpuid_i = 0;

  #ifdef CONFIG_KVM_PARA
@@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
  }
  #endif

+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+if (cpu_is_bsp(env)
+(env-cpuid_kvm_features  (1ULL  KVM_FEATURE_CLOCKSOURCE))) {
+sysbus_create_simple(kvmclock, -1, NULL);
+}
+#endif
+
  return kvm_vcpu_ioctl(env, KVM_SET_CPUID2,cpuid_data);
  }

@@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
  int ret;
  struct utsname utsname;

+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+sysbus_register_withprop(kvmclock_info);
+#endif
+
  ret = kvm_get_supported_msrs();
  if (ret  0) {
  return ret;
   


There are a couple things wrong with this patch.  It breaks 
compatibility because it does not allow kvmclock to be created or 
initiated in machines.  Older machines didn't expose kvmclock but now 
they do.  It also makes it impossible to pass parameters to kvmclock in 
the future because the device creation is hidden deep in other code 
paths.  Calling any qdev creation function in anything but pc.c (or the 
equivalent) should be a big red flag.


The solution is simple, introduce as kvm_has_clocksource().  Within the 
machine init, create the the kvm clock device after CPU creation wrapped 
in a if (kvm_has_clocksource()) call.  kvmclock should be created with 
kvm_state as a parameter and kvm_vm_ioctl() is passed the stored 
reference.   Taking a global reference to kvm_state in machine_init is 
not a bad thing, obviously the machine initialization function needs 
access to the kvm_state.


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to 

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Jan Kiszka
Am 10.01.2011 21:31, Anthony Liguori wrote:
 On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
 From: Jan Kiszkajan.kis...@siemens.com

 If kvmclock is used, which implies the kernel supports it, register a
 kvmclock device with the sysbus. Its main purpose is to save and restore
 the kernel state on migration, but this will also allow to visualize it
 one day.

 Signed-off-by: Jan Kiszkajan.kis...@siemens.com
 CC: Glauber Costaglom...@redhat.com
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com
 ---
   target-i386/kvm.c |   92
 -
   1 files changed, 91 insertions(+), 1 deletions(-)

 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 69b8234..47cb22b 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -29,6 +29,7 @@
   #include hw/apic.h
   #include ioport.h
   #include kvm_x86.h
 +#include hw/sysbus.h

   #ifdef CONFIG_KVM_PARA
   #includelinux/kvm_para.h
 @@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank,
 uint64_t status,
   #endif
   }

 +#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
 +typedef struct KVMClockState {
 +SysBusDevice busdev;
 +uint64_t clock;
 +bool clock_valid;
 +} KVMClockState;
 +
 +static void kvmclock_pre_save(void *opaque)
 +{
 +KVMClockState *s = opaque;
 +struct kvm_clock_data data;
 +int ret;
 +
 +if (s-clock_valid) {
 +return;
 +}
 +ret = kvm_vm_ioctl(KVM_GET_CLOCK,data);
 +if (ret  0) {
 +fprintf(stderr, KVM_GET_CLOCK failed: %s\n, strerror(ret));
 +data.clock = 0;
 +}
 +s-clock = data.clock;
 +/*
 + * If the VM is stopped, declare the clock state valid to avoid
 re-reading
 + * it on next vmsave (which would return a different value). Will
 be reset
 + * when the VM is continued.
 + */
 +s-clock_valid = !vm_running;
 +}
 +
 +static int kvmclock_post_load(void *opaque, int version_id)
 +{
 +KVMClockState *s = opaque;
 +struct kvm_clock_data data;
 +
 +data.clock = s-clock;
 +data.flags = 0;
 +return kvm_vm_ioctl(KVM_SET_CLOCK,data);
 +}
 +
 +static void kvmclock_vm_state_change(void *opaque, int running, int
 reason)
 +{
 +KVMClockState *s = opaque;
 +
 +if (running) {
 +s-clock_valid = false;
 +}
 +}
 +
 +static int kvmclock_init(SysBusDevice *dev)
 +{
 +KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
 +
 +qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 +return 0;
 +}
 +
 +static const VMStateDescription kvmclock_vmsd= {
 +.name = kvmclock,
 +.version_id = 1,
 +.minimum_version_id = 1,
 +.minimum_version_id_old = 1,
 +.pre_save = kvmclock_pre_save,
 +.post_load = kvmclock_post_load,
 +.fields = (VMStateField []) {
 +VMSTATE_UINT64(clock, KVMClockState),
 +VMSTATE_END_OF_LIST()
 +}
 +};
 +
 +static SysBusDeviceInfo kvmclock_info = {
 +.qdev.name = kvmclock,
 +.qdev.size = sizeof(KVMClockState),
 +.qdev.vmsd =kvmclock_vmsd,
 +.qdev.no_user = 1,
 +.init = kvmclock_init,
 +};
 +#endif /* CONFIG_KVM_PARA  KVM_CAP_ADJUST_CLOCK */
 +
   int kvm_arch_init_vcpu(CPUState *env)
   {
   struct {
 @@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
   env-cpuid_svm_features= kvm_x86_get_supported_cpuid(0x800A,
   0, R_EDX);

 -
   cpuid_i = 0;

   #ifdef CONFIG_KVM_PARA
 @@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
   }
   #endif

 +#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
 +if (cpu_is_bsp(env)
 +(env-cpuid_kvm_features  (1ULL  KVM_FEATURE_CLOCKSOURCE))) {
 +sysbus_create_simple(kvmclock, -1, NULL);
 +}
 +#endif
 +
   return kvm_vcpu_ioctl(env, KVM_SET_CPUID2,cpuid_data);
   }

 @@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
   int ret;
   struct utsname utsname;

 +#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
 +sysbus_register_withprop(kvmclock_info);
 +#endif
 +
   ret = kvm_get_supported_msrs();
   if (ret  0) {
   return ret;

 
 There are a couple things wrong with this patch.  It breaks
 compatibility because it does not allow kvmclock to be created or
 initiated in machines.  Older machines didn't expose kvmclock but now
 they do.  It also makes it impossible to pass parameters to kvmclock in
 the future because the device creation is hidden deep in other code
 paths.

Device parameters should get passed as properties. Would already work
today if we had any.

  Calling any qdev creation function in anything but pc.c (or the
 equivalent) should be a big red flag.
 
 The solution is simple, introduce as kvm_has_clocksource().  Within the
 machine init, create the the kvm clock device after CPU creation wrapped
 in a if (kvm_has_clocksource()) call.

No problem with moving sysbus_create_simple to machine initialization,
though.

  kvmclock should be created 

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Jan Kiszka
Am 10.01.2011 22:06, Jan Kiszka wrote:
  kvmclock should be created with
 kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
 reference.   Taking a global reference to kvm_state in machine_init is
 not a bad thing, obviously the machine initialization function needs
 access to the kvm_state.
 
 This would also require changing sysbus interfaces for the sake of KVM's
 abstraction. If this is the only way forward, I could look into this.

Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.

 
 Still, I do not see any benefit for the affected code. You then either
 need to steal a kvm_state reference from the first cpu or introduce a
 marvelous interface like kvm_get_state() to make this work from outside
 of the KVM core.
 

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Anthony Liguori

On 01/10/2011 04:21 PM, Jan Kiszka wrote:

Am 10.01.2011 22:06, Jan Kiszka wrote:
   

  kvmclock should be created with
kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
reference.   Taking a global reference to kvm_state in machine_init is
not a bad thing, obviously the machine initialization function needs
access to the kvm_state.
   

This would also require changing sysbus interfaces for the sake of KVM's
abstraction. If this is the only way forward, I could look into this.
 

Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.
   


It probably makes sense to have a KVMBus and not pass it as a property 
but rather have it access it from the KvmBusState.


Regards,

Anthony Liguori

   

Still, I do not see any benefit for the affected code. You then either
need to steal a kvm_state reference from the first cpu or introduce a
marvelous interface like kvm_get_state() to make this work from outside
of the KVM core.

 

Jan
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Anthony Liguori

On 01/10/2011 03:06 PM, Jan Kiszka wrote:

Am 10.01.2011 21:31, Anthony Liguori wrote:
   

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
 

From: Jan Kiszkajan.kis...@siemens.com

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but this will also allow to visualize it
one day.

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
CC: Glauber Costaglom...@redhat.com
Signed-off-by: Marcelo Tosattimtosa...@redhat.com
---
   target-i386/kvm.c |   92
-
   1 files changed, 91 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 69b8234..47cb22b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@
   #include hw/apic.h
   #include ioport.h
   #include kvm_x86.h
+#include hw/sysbus.h

   #ifdef CONFIG_KVM_PARA
   #includelinux/kvm_para.h
@@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank,
uint64_t status,
   #endif
   }

+#if defined(CONFIG_KVM_PARA)   defined(KVM_CAP_ADJUST_CLOCK)
+typedef struct KVMClockState {
+SysBusDevice busdev;
+uint64_t clock;
+bool clock_valid;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+int ret;
+
+if (s-clock_valid) {
+return;
+}
+ret = kvm_vm_ioctl(KVM_GET_CLOCK,data);
+if (ret   0) {
+fprintf(stderr, KVM_GET_CLOCK failed: %s\n, strerror(ret));
+data.clock = 0;
+}
+s-clock = data.clock;
+/*
+ * If the VM is stopped, declare the clock state valid to avoid
re-reading
+ * it on next vmsave (which would return a different value). Will
be reset
+ * when the VM is continued.
+ */
+s-clock_valid = !vm_running;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+
+data.clock = s-clock;
+data.flags = 0;
+return kvm_vm_ioctl(KVM_SET_CLOCK,data);
+}
+
+static void kvmclock_vm_state_change(void *opaque, int running, int
reason)
+{
+KVMClockState *s = opaque;
+
+if (running) {
+s-clock_valid = false;
+}
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
+
+qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd= {
+.name = kvmclock,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.pre_save = kvmclock_pre_save,
+.post_load = kvmclock_post_load,
+.fields = (VMStateField []) {
+VMSTATE_UINT64(clock, KVMClockState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+.qdev.name = kvmclock,
+.qdev.size = sizeof(KVMClockState),
+.qdev.vmsd =kvmclock_vmsd,
+.qdev.no_user = 1,
+.init = kvmclock_init,
+};
+#endif /* CONFIG_KVM_PARA   KVM_CAP_ADJUST_CLOCK */
+
   int kvm_arch_init_vcpu(CPUState *env)
   {
   struct {
@@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
   env-cpuid_svm_features= kvm_x86_get_supported_cpuid(0x800A,
   0, R_EDX);

-
   cpuid_i = 0;

   #ifdef CONFIG_KVM_PARA
@@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
   }
   #endif

+#if defined(CONFIG_KVM_PARA)   defined(KVM_CAP_ADJUST_CLOCK)
+if (cpu_is_bsp(env)
+(env-cpuid_kvm_features   (1ULL   KVM_FEATURE_CLOCKSOURCE))) {
+sysbus_create_simple(kvmclock, -1, NULL);
+}
+#endif
+
   return kvm_vcpu_ioctl(env, KVM_SET_CPUID2,cpuid_data);
   }

@@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
   int ret;
   struct utsname utsname;

+#if defined(CONFIG_KVM_PARA)   defined(KVM_CAP_ADJUST_CLOCK)
+sysbus_register_withprop(kvmclock_info);
+#endif
+
   ret = kvm_get_supported_msrs();
   if (ret   0) {
   return ret;

   

There are a couple things wrong with this patch.  It breaks
compatibility because it does not allow kvmclock to be created or
initiated in machines.  Older machines didn't expose kvmclock but now
they do.  It also makes it impossible to pass parameters to kvmclock in
the future because the device creation is hidden deep in other code
paths.
 

Device parameters should get passed as properties. Would already work
today if we had any.

   

  Calling any qdev creation function in anything but pc.c (or the
equivalent) should be a big red flag.

The solution is simple, introduce as kvm_has_clocksource().  Within the
machine init, create the the kvm clock device after CPU creation wrapped
in a if (kvm_has_clocksource()) call.
 

No problem with moving sysbus_create_simple to machine initialization,
though.

   

  kvmclock should be created with
kvm_state as a parameter and 

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Jan Kiszka
Am 11.01.2011 00:02, Anthony Liguori wrote:
 On 01/10/2011 04:21 PM, Jan Kiszka wrote:
 Am 10.01.2011 22:06, Jan Kiszka wrote:
   
   kvmclock should be created with
 kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
 reference.   Taking a global reference to kvm_state in machine_init is
 not a bad thing, obviously the machine initialization function needs
 access to the kvm_state.

 This would also require changing sysbus interfaces for the sake of KVM's
 abstraction. If this is the only way forward, I could look into this.
  
 Actually, there is already a channel to pass pointers to qdev devices:
 the pointer property hack. I'm not sure we should contribute to its user
 base or take the chance for a cleanup, but we are not alone with this
 requirement. Point below remains valid, though.

 
 It probably makes sense to have a KVMBus and not pass it as a property
 but rather have it access it from the KvmBusState.

KVM is orthogonal to the qtree. Some devices like vga (kvm_coalesce_*,
kvm_log_start/stop*) would actually have to be attached to multiple
buses in this model.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Jan Kiszka
Am 11.01.2011 00:04, Anthony Liguori wrote:
   kvmclock should be created with
 kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
 reference.   Taking a global reference to kvm_state in machine_init is
 not a bad thing, obviously the machine initialization function needs
 access to the kvm_state.
  
 This would also require changing sysbus interfaces for the sake of KVM's
 abstraction. If this is the only way forward, I could look into this.

 Still, I do not see any benefit for the affected code. You then either
 need to steal a kvm_state reference from the first cpu or introduce a
 marvelous interface like kvm_get_state() to make this work from outside
 of the KVM core.

 
 Or move kvm_init() to pc_init() and then pc_init() has the kvm_state
 reference.

Or pass the reference to the machine_init service to avoid duplicating
kvm_init logic for every KVM arch.

That alone would still be consistent. But as long as we do not pass a
kvm_state to each and every memory registration (for
kvm_setup_guest_memory), this all is like putting a fence around half of
your yard and only declaring it closed.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-10 Thread Paolo Bonzini

On 01/10/2011 11:21 PM, Jan Kiszka wrote:

Am 10.01.2011 22:06, Jan Kiszka wrote:

  kvmclock should be created with
kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
reference.   Taking a global reference to kvm_state in machine_init is
not a bad thing, obviously the machine initialization function needs
access to the kvm_state.


This would also require changing sysbus interfaces for the sake of KVM's
abstraction. If this is the only way forward, I could look into this.


Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.


The pointer property uses, at least last time I checked, were all:

1) for a CPU (apic, etrax interrupt controller)

2) for a device (sparc IIRC)

3) useless (smbus_eeprom.c)

So adding a fourth kind is not really a good idea, I think.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-06 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but this will also allow to visualize it
one day.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
CC: Glauber Costa glom...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 target-i386/kvm.c |   92 -
 1 files changed, 91 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 69b8234..47cb22b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@
 #include hw/apic.h
 #include ioport.h
 #include kvm_x86.h
+#include hw/sysbus.h
 
 #ifdef CONFIG_KVM_PARA
 #include linux/kvm_para.h
@@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t 
status,
 #endif
 }
 
+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+typedef struct KVMClockState {
+SysBusDevice busdev;
+uint64_t clock;
+bool clock_valid;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+int ret;
+
+if (s-clock_valid) {
+return;
+}
+ret = kvm_vm_ioctl(KVM_GET_CLOCK, data);
+if (ret  0) {
+fprintf(stderr, KVM_GET_CLOCK failed: %s\n, strerror(ret));
+data.clock = 0;
+}
+s-clock = data.clock;
+/*
+ * If the VM is stopped, declare the clock state valid to avoid re-reading
+ * it on next vmsave (which would return a different value). Will be reset
+ * when the VM is continued.
+ */
+s-clock_valid = !vm_running;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+KVMClockState *s = opaque;
+struct kvm_clock_data data;
+
+data.clock = s-clock;
+data.flags = 0;
+return kvm_vm_ioctl(KVM_SET_CLOCK, data);
+}
+
+static void kvmclock_vm_state_change(void *opaque, int running, int reason)
+{
+KVMClockState *s = opaque;
+
+if (running) {
+s-clock_valid = false;
+}
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
+
+qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd= {
+.name = kvmclock,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.pre_save = kvmclock_pre_save,
+.post_load = kvmclock_post_load,
+.fields = (VMStateField []) {
+VMSTATE_UINT64(clock, KVMClockState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+.qdev.name = kvmclock,
+.qdev.size = sizeof(KVMClockState),
+.qdev.vmsd = kvmclock_vmsd,
+.qdev.no_user = 1,
+.init = kvmclock_init,
+};
+#endif /* CONFIG_KVM_PARA  KVM_CAP_ADJUST_CLOCK */
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
 struct {
@@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 env-cpuid_svm_features  = kvm_x86_get_supported_cpuid(0x800A,
 0, R_EDX);
 
-
 cpuid_i = 0;
 
 #ifdef CONFIG_KVM_PARA
@@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
 }
 #endif
 
+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+if (cpu_is_bsp(env) 
+(env-cpuid_kvm_features  (1ULL  KVM_FEATURE_CLOCKSOURCE))) {
+sysbus_create_simple(kvmclock, -1, NULL);
+}
+#endif
+
 return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid_data);
 }
 
@@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
 int ret;
 struct utsname utsname;
 
+#if defined(CONFIG_KVM_PARA)  defined(KVM_CAP_ADJUST_CLOCK)
+sysbus_register_withprop(kvmclock_info);
+#endif
+
 ret = kvm_get_supported_msrs();
 if (ret  0) {
 return ret;
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html