Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/10/2011 10:23 PM, Anthony Liguori wrote:

I don't see how ioapic, pit, or pic have a system scope.

They are not bound to any CPU like the APIC which you may have in mind.


And none of the above interact with KVM.


They're implemented by kvm.  What deeper interaction do you have in mind?



They may be replaced by KVM but if you look at the PIT, this is done 
by having two distinct devices.  The KVM specific device can (and 
should) be instantiated with kvm_state.


The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The 
kernel devices are separate devices and that should be reflected in 
the device tree.


I don't see why.  Those are just two different implementations for the 
same guest visible device.  It's like saying IDE should be seen 
differently if it's backed by qcow2 or qed.


The device tree is about the guest view of devices.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/10/2011 10:11 PM, Anthony Liguori wrote:

On 01/08/2011 02:47 AM, Jan Kiszka wrote:

OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)


I often reply too quickly without explaining myself.  Let me use 
ioeventfd as an example to highlight why KVMState is a good thing.


In real life, PIO and MMIO are never directly communicated to the 
device from the processor.  Instead, they go through a series of other 
devices.  In the case of something like an ISA device, a PIO first 
goes to the chipset into the PCI complex, it will then go through a 
PCI-to-ISA bridge via subtractive decoding, and then forward over the 
ISA device where it will be interpreted by some device.


The path to the chipset may be shared among different processors but 
it may also be unique.  The APIC is the best example as there are 
historic APICs that hung directly off of the CPUs such that the same 
MMIO access across different CPUs did not go to the same device.  This 
is why the APIC emulation in QEMU is so weird because we don't model 
this behavior correctly.


This means that a PIO operation needs to flow from a CPUState to a 
DeviceState.  It can then flow through to another DeviceState until 
it's finally handled.


The first problem with ioeventfd is that it's a per-VM operation.  It 
should be per VCPU.


Just consider ioeventfd as something that hooks the system bus, not the 
processor-chipset link, and the problem goes away.  In practice, any 
per-cpu io port (for SMM or power management) would need synchronous 
handling; an eventfd isn't a suitable way to communicate it.




But even if this were the case, the path that a PIO operation takes 
should not be impacted by ioeventfd.  IOW, a device shouldn't be 
allocating an eventfd() and handing it to a magical KVM call.  
Instead, a device should register a callback for a particular port in 
the same way it always does.  *As an optimization*, we should have 
another interface that says that these values are only valid for this 
IO port.  That would let us create eventfds and register things behind 
the scenes.


The semantics are different.  The normal callbacks are synchronous; the 
vcpu is halted until the callback is serviced.  For most callbacks, this 
is critical, not just per-cpu things like vmport (example: cirrus back 
switching).


I agree it shouldn't be done by a kvm-specific kvm call, but instead by 
an API, but that API needs to be explicitly asynchronous.  When we 
further thread qemu, we'd also need to specify which thread is to 
execute the callback (and the implementation would add the eventfd to 
the thread's fd poll list).




That means we can handle TCG, older KVM kernels, and newer KVM kernels 
without any special support in the device model.  It also means that 
the device models never have to worry about KVMState because there's 
an entirely different piece of code that's consulting the set of 
special ports and then deciding how to handle it.  The result is 
better, more portable code that doesn't have KVM-isms.


Yes.



If passing state around seems to be ugly, it's probably because we're 
not abstracting things correctly.  Removing the state and making it 
implicit is the wrong solution. 


I agree with the general sentiment that utilizing the fact that a 
variable is global to make it implicit is bad from a software 
engineering point of view.  By restricting access to variables and 
functions, you can enforce modularity on the code.  Much like the 
private: specifier in C++ and other languages.


Fixing the abstraction is the right solution (or living with the 
ugliness until someone else is motivated to fix it properly).


And with that too, especially the parenthesized statement.  We have 
qemu-kvm that is overly pragmatic and trie[sd] not to avoid modifying 
qemu as much as possible.  We have the qemu.git kvm implementation that 
tries a perfectionist approach that failed because most of the users 
need the featured and tested pragmatic approach.  The two mix like oil 
and water.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 03:01 AM, Avi Kivity wrote:

On 01/10/2011 10:23 PM, Anthony Liguori wrote:

I don't see how ioapic, pit, or pic have a system scope.

They are not bound to any CPU like the APIC which you may have in mind.


And none of the above interact with KVM.


They're implemented by kvm.  What deeper interaction do you have in mind?


The emulated ioapic/pit/pic do not interact with KVM at all.

The KVM versions should be completely separate devices.





They may be replaced by KVM but if you look at the PIT, this is done 
by having two distinct devices.  The KVM specific device can (and 
should) be instantiated with kvm_state.


The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The 
kernel devices are separate devices and that should be reflected in 
the device tree.


I don't see why.  Those are just two different implementations for the 
same guest visible device.


Right, they should appear the same to the guest but the fact that 
they're two different implementations should be reflected in the device 
tree.


  It's like saying IDE should be seen differently if it's backed by 
qcow2 or qed.


No, it's not at all.

Advantages of separating KVM devices:

1) it becomes very clear what functionality is handled in the kernel 
verses in userspace (you can actually look at the code and tell)


2) a user can explicitly create either the emulated version of the 
device or the in-kernel version of the device (no need for -no-kvm-irqchip)


3) a user can pass parameters directly to the in-kernel version of the 
device that are different from the userspace version (like selecting 
different interrupt catch-up methods)


Regards,

Anthony Liguori


The device tree is about the guest view of devices.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Alexander Graf

On 11.01.2011, at 15:00, Anthony Liguori wrote:

 On 01/11/2011 03:01 AM, Avi Kivity wrote:
 On 01/10/2011 10:23 PM, Anthony Liguori wrote:
 I don't see how ioapic, pit, or pic have a system scope.
 They are not bound to any CPU like the APIC which you may have in mind.
 
 And none of the above interact with KVM.
 
 They're implemented by kvm.  What deeper interaction do you have in mind?
 
 The emulated ioapic/pit/pic do not interact with KVM at all.
 
 The KVM versions should be completely separate devices.
 
 
 
 They may be replaced by KVM but if you look at the PIT, this is done by 
 having two distinct devices.  The KVM specific device can (and should) be 
 instantiated with kvm_state.
 
 The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel 
 devices are separate devices and that should be reflected in the device 
 tree.
 
 I don't see why.  Those are just two different implementations for the same 
 guest visible device.
 
 Right, they should appear the same to the guest but the fact that they're two 
 different implementations should be reflected in the device tree.
 
  It's like saying IDE should be seen differently if it's backed by qcow2 or 
 qed.
 
 No, it's not at all.
 
 Advantages of separating KVM devices:
 
 1) it becomes very clear what functionality is handled in the kernel verses 
 in userspace (you can actually look at the code and tell)
 
 2) a user can explicitly create either the emulated version of the device or 
 the in-kernel version of the device (no need for -no-kvm-irqchip)
 
 3) a user can pass parameters directly to the in-kernel version of the device 
 that are different from the userspace version (like selecting different 
 interrupt catch-up methods)

Disadvantages:

1) you lose migration / savevm between KVM and non-KVM VMs

I'm not saying this is unsolvable, but it's certainly something that bothers me 
:). Some sort of meta-device for KVM implemented devices and emulated devices 
would be nice. That device would then be the one state gets saved/restored from.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 08:06 AM, Alexander Graf wrote:

On 11.01.2011, at 15:00, Anthony Liguori wrote:

   

On 01/11/2011 03:01 AM, Avi Kivity wrote:
 

On 01/10/2011 10:23 PM, Anthony Liguori wrote:
   

I don't see how ioapic, pit, or pic have a system scope.
 

They are not bound to any CPU like the APIC which you may have in mind.
   

And none of the above interact with KVM.
 

They're implemented by kvm.  What deeper interaction do you have in mind?
   

The emulated ioapic/pit/pic do not interact with KVM at all.

The KVM versions should be completely separate devices.

 
   

They may be replaced by KVM but if you look at the PIT, this is done by having 
two distinct devices.  The KVM specific device can (and should) be instantiated 
with kvm_state.

The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel 
devices are separate devices and that should be reflected in the device tree.
 

I don't see why.  Those are just two different implementations for the same 
guest visible device.
   

Right, they should appear the same to the guest but the fact that they're two 
different implementations should be reflected in the device tree.

 

  It's like saying IDE should be seen differently if it's backed by qcow2 or 
qed.
   

No, it's not at all.

Advantages of separating KVM devices:

1) it becomes very clear what functionality is handled in the kernel verses in 
userspace (you can actually look at the code and tell)

2) a user can explicitly create either the emulated version of the device or 
the in-kernel version of the device (no need for -no-kvm-irqchip)

3) a user can pass parameters directly to the in-kernel version of the device 
that are different from the userspace version (like selecting different 
interrupt catch-up methods)
 

Disadvantages:

1) you lose migration / savevm between KVM and non-KVM VMs
   


This doesn't work today and it's never worked.  KVM exposes things that 
TCG cannot emulate (like pvclock).


Even as two devices, nothing prevents it from working.  Both devices 
just have to support each other's savevm format.  If they use the same 
code, it makes it very easy.  Take a look at how the KVM PIT is 
implemented for an example of this.


Regards,

Anthony Liguori


I'm not saying this is unsolvable, but it's certainly something that bothers me 
:). Some sort of meta-device for KVM implemented devices and emulated devices 
would be nice. That device would then be the one state gets saved/restored from.


Alex

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 04:00 PM, Anthony Liguori wrote:

On 01/11/2011 03:01 AM, Avi Kivity wrote:

On 01/10/2011 10:23 PM, Anthony Liguori wrote:

I don't see how ioapic, pit, or pic have a system scope.
They are not bound to any CPU like the APIC which you may have in 
mind.


And none of the above interact with KVM.


They're implemented by kvm.  What deeper interaction do you have in 
mind?


The emulated ioapic/pit/pic do not interact with KVM at all.


How can they not interact with kvm if they're implemented by kvm?

I really don't follow here.



The KVM versions should be completely separate devices.



Why?

I don't see why.  Those are just two different implementations for 
the same guest visible device.


Right, they should appear the same to the guest but the fact that 
they're two different implementations should be reflected in the 
device tree.


Why?

To move beyond single-word questions, what is the purpose of the device 
tree?  In my mind, it reflects the virtual hardware.  What's important 
is that we have a PIC, virtio network adapter, and IDE disk.  Not that 
they're backed by kvm, vhost-net, and qcow2.




  It's like saying IDE should be seen differently if it's backed by 
qcow2 or qed.


No, it's not at all.

Advantages of separating KVM devices:

1) it becomes very clear what functionality is handled in the kernel 
verses in userspace (you can actually look at the code and tell)


How something is implemented is not important, certainly not important 
enough to expose to the user as an monitor or live migration ABI.




2) a user can explicitly create either the emulated version of the 
device or the in-kernel version of the device (no need for 
-no-kvm-irqchip)


-device ioapic,model=kernel vs. -device kvm-ioapic?

Is it really important to do that? 110% of the time we want the kernel 
irqchips.  The remaining -10% are only used for testing.




3) a user can pass parameters directly to the in-kernel version of the 
device that are different from the userspace version (like selecting 
different interrupt catch-up methods)


-device pit,model=qemu,catchup=slew

error: catchup=slew not supported in this model

I'm not overly concerned about the implementation part.  Though I think 
it's better to have a single implementation with kvm acting as an 
accelerator, having it the other way is no big deal.  What I am worried 
about is exposing it as a monitor and migration ABI.  IMO the only 
important thing is the spec that the device implements, not what piece 
of code implements it.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 04:09 PM, Anthony Liguori wrote:

Disadvantages:

1) you lose migration / savevm between KVM and non-KVM VMs


This doesn't work today and it's never worked.  KVM exposes things 
that TCG cannot emulate (like pvclock).


If you run kvm without pvclock, or implement pvclock in qemu, it works 
fine.  It should work fine for the PIT, PIC, and IOAPIC (never tried it 
myself).


If we decide to have a kernel hpet implementation, for example, it would 
be good to be able to live migrate from a version without kernel hpet, 
to a version with kernel hpet, and have the kernel hpet enabled.




Even as two devices, nothing prevents it from working.  Both devices 
just have to support each other's savevm format.  If they use the same 
code, it makes it very easy.  Take a look at how the KVM PIT is 
implemented for an example of this.


They need to use the same device id then.  And if they share code, that 
indicates that they need to be the same device even more.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Alexander Graf

On 11.01.2011, at 15:09, Anthony Liguori wrote:

 On 01/11/2011 08:06 AM, Alexander Graf wrote:
 On 11.01.2011, at 15:00, Anthony Liguori wrote:
 
   
 On 01/11/2011 03:01 AM, Avi Kivity wrote:
 
 On 01/10/2011 10:23 PM, Anthony Liguori wrote:
   
 I don't see how ioapic, pit, or pic have a system scope.
 
 They are not bound to any CPU like the APIC which you may have in mind.
   
 And none of the above interact with KVM.
 
 They're implemented by kvm.  What deeper interaction do you have in mind?
   
 The emulated ioapic/pit/pic do not interact with KVM at all.
 
 The KVM versions should be completely separate devices.
 
 
   
 They may be replaced by KVM but if you look at the PIT, this is done by 
 having two distinct devices.  The KVM specific device can (and should) be 
 instantiated with kvm_state.
 
 The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel 
 devices are separate devices and that should be reflected in the device 
 tree.
 
 I don't see why.  Those are just two different implementations for the 
 same guest visible device.
   
 Right, they should appear the same to the guest but the fact that they're 
 two different implementations should be reflected in the device tree.
 
 
  It's like saying IDE should be seen differently if it's backed by qcow2 
 or qed.
   
 No, it's not at all.
 
 Advantages of separating KVM devices:
 
 1) it becomes very clear what functionality is handled in the kernel verses 
 in userspace (you can actually look at the code and tell)
 
 2) a user can explicitly create either the emulated version of the device 
 or the in-kernel version of the device (no need for -no-kvm-irqchip)
 
 3) a user can pass parameters directly to the in-kernel version of the 
 device that are different from the userspace version (like selecting 
 different interrupt catch-up methods)
 
 Disadvantages:
 
 1) you lose migration / savevm between KVM and non-KVM VMs
   
 
 This doesn't work today and it's never worked.  KVM exposes things that TCG 
 cannot emulate (like pvclock).

Those cases simply shouldn't exist and hurt us (or at least me). I had exactly 
the pvclock issue with xenner. Xenner can't do proper timekeeping in emulation 
mode. So implementing an emulated pvclock implementation is (pretty low) on my 
todo list.

 Even as two devices, nothing prevents it from working.  Both devices just 
 have to support each other's savevm format.  If they use the same code, it 
 makes it very easy.  Take a look at how the KVM PIT is implemented for an 
 example of this.

If that's all it takes, fine. It makes it pretty hard to enforce, but I guess 
we can get away with that :).

Making devices separate basically hurts abstraction. I don't see any use case 
where we should have a KVM device without emulation equivalent. For the CPU we 
also think of KVM as an accelerator instead of a separate device, no? :)


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 08:18 AM, Avi Kivity wrote:

On 01/11/2011 04:00 PM, Anthony Liguori wrote:

On 01/11/2011 03:01 AM, Avi Kivity wrote:

On 01/10/2011 10:23 PM, Anthony Liguori wrote:

I don't see how ioapic, pit, or pic have a system scope.
They are not bound to any CPU like the APIC which you may have in 
mind.


And none of the above interact with KVM.


They're implemented by kvm.  What deeper interaction do you have in 
mind?


The emulated ioapic/pit/pic do not interact with KVM at all.


How can they not interact with kvm if they're implemented by kvm?

I really don't follow here.


emulated ioapic/pit/pic == versions implemented in QEMU.  That's what 
I'm trying to say.  When not using the KVM versions of the devices, 
there are no interactions with KVM.




The KVM versions should be completely separate devices.



Why?


Because the KVM versions are replacements.

I don't see why.  Those are just two different implementations for 
the same guest visible device.


Right, they should appear the same to the guest but the fact that 
they're two different implementations should be reflected in the 
device tree.


Why?

To move beyond single-word questions, what is the purpose of the 
device tree?  In my mind, it reflects the virtual hardware.  What's 
important is that we have a PIC, virtio network adapter, and IDE 
disk.  Not that they're backed by kvm, vhost-net, and qcow2.


Let me give a very concrete example to illustrate my point.

One thing I have on my TODO is to implement catch-up support for the 
emulated devices.  I want to implement three modes of catch-up support: 
drop, fast, and gradual.  Gradual is the best policy IMHO but fast is 
necessary on older kernels without highres timers.  Drop is necessary to 
maintain compatibility with what we have today.


The kernel PIT only implements one mode and even if the other two were 
added, even the newest version of QEMU needs to deal with the fact that 
there's old kernels out there with PIT's that only do fast.


So how does this get exposed to management tools?  Do you check for 
drift-mode=fast and transparently enable the KVM pit?  Do you fail if 
anything but drift-mode=fast is specified?


We need to have the following mechanisms:

1) the ability to select an in-kernel PIT vs. a userspace PIT

2) an independent mechanism to configure the userspace PIT

3) an independent mechanism to configure the in-kernel PIT.

The best way to do this is to make the in-kernel PIT a separate device.  
Then we get all of this for free.




2) a user can explicitly create either the emulated version of the 
device or the in-kernel version of the device (no need for 
-no-kvm-irqchip)


-device ioapic,model=kernel vs. -device kvm-ioapic?

Is it really important to do that? 110% of the time we want the kernel 
irqchips.  The remaining -10% are only used for testing.


If model=kernel makes the support options different, then you end up 
introduce another layer of option validation.  By using the later form, 
you get to leverage the option validation of qdev plus it makes it much 
clearer to users what options are supported in what model because now 
the documentation is explicit about it.




3) a user can pass parameters directly to the in-kernel version of 
the device that are different from the userspace version (like 
selecting different interrupt catch-up methods)


-device pit,model=qemu,catchup=slew

error: catchup=slew not supported in this model

I'm not overly concerned about the implementation part.  Though I 
think it's better to have a single implementation with kvm acting as 
an accelerator, having it the other way is no big deal.  What I am 
worried about is exposing it as a monitor and migration ABI.  IMO the 
only important thing is the spec that the device implements, not what 
piece of code implements it.


Just as we do in the PIT, there's nothing wrong with making the device's 
migration compatible.  I'm not entirely sure what your concerns about 
the monitor are but there's simply no way to hide the fact that a device 
is implemented in KVM at the monitor level.  But really, is this 
something that management tools want?  I doubt it.  I think they want to 
have ultimate control over what gets created with us providing a 
recommended set of defaults.


Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 08:22 AM, Avi Kivity wrote:

On 01/11/2011 04:09 PM, Anthony Liguori wrote:

Disadvantages:

1) you lose migration / savevm between KVM and non-KVM VMs


This doesn't work today and it's never worked.  KVM exposes things 
that TCG cannot emulate (like pvclock).


If you run kvm without pvclock, or implement pvclock in qemu, it works 
fine.  It should work fine for the PIT, PIC, and IOAPIC (never tried 
it myself).


If we decide to have a kernel hpet implementation, for example, it 
would be good to be able to live migrate from a version without kernel 
hpet, to a version with kernel hpet, and have the kernel hpet enabled.




Even as two devices, nothing prevents it from working.  Both devices 
just have to support each other's savevm format.  If they use the 
same code, it makes it very easy.  Take a look at how the KVM PIT is 
implemented for an example of this.


They need to use the same device id then.  And if they share code, 
that indicates that they need to be the same device even more.


No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of code.  
But that doesn't mean that we treat them as one device.


And BTW, there are guest visible differences between the KVM 
IOAPIC/PIC/PIT than the QEMU versions.  The only reason PIT live 
migration works today is because usually delivers all interrupts 
quickly.  But it actually does maintain state in the work queue that 
isn't saved.  If PIT tried to implement gradual catchup, there would be 
no way not to expose that state to userspace.




Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 04:28 PM, Anthony Liguori wrote:

On 01/11/2011 08:18 AM, Avi Kivity wrote:

On 01/11/2011 04:00 PM, Anthony Liguori wrote:

On 01/11/2011 03:01 AM, Avi Kivity wrote:

On 01/10/2011 10:23 PM, Anthony Liguori wrote:

I don't see how ioapic, pit, or pic have a system scope.
They are not bound to any CPU like the APIC which you may have in 
mind.


And none of the above interact with KVM.


They're implemented by kvm.  What deeper interaction do you have in 
mind?


The emulated ioapic/pit/pic do not interact with KVM at all.


How can they not interact with kvm if they're implemented by kvm?

I really don't follow here.


emulated ioapic/pit/pic == versions implemented in QEMU.  That's 
what I'm trying to say.  When not using the KVM versions of the 
devices, there are no interactions with KVM.


Okay.  Isn't that the same for the cpu?  Yet we use the same CPUState 
and are live-migration compatible (as long as cpuids match).






The KVM versions should be completely separate devices.



Why?


Because the KVM versions are replacements.


Only the implementation.  The guest doesn't see the replacement.  They 
have exactly the same state.




I don't see why.  Those are just two different implementations for 
the same guest visible device.


Right, they should appear the same to the guest but the fact that 
they're two different implementations should be reflected in the 
device tree.


Why?

To move beyond single-word questions, what is the purpose of the 
device tree?  In my mind, it reflects the virtual hardware.  What's 
important is that we have a PIC, virtio network adapter, and IDE 
disk.  Not that they're backed by kvm, vhost-net, and qcow2.


Let me give a very concrete example to illustrate my point.

One thing I have on my TODO is to implement catch-up support for the 
emulated devices.  I want to implement three modes of catch-up 
support: drop, fast, and gradual.  Gradual is the best policy IMHO but 
fast is necessary on older kernels without highres timers.  Drop is 
necessary to maintain compatibility with what we have today.


The kernel PIT only implements one mode and even if the other two were 
added, even the newest version of QEMU needs to deal with the fact 
that there's old kernels out there with PIT's that only do fast.


So how does this get exposed to management tools?  Do you check for 
drift-mode=fast and transparently enable the KVM pit?  Do you fail if 
anything but drift-mode=fast is specified?


We need to have the following mechanisms:

1) the ability to select an in-kernel PIT vs. a userspace PIT

2) an independent mechanism to configure the userspace PIT

3) an independent mechanism to configure the in-kernel PIT.

The best way to do this is to make the in-kernel PIT a separate 
device.  Then we get all of this for free.


And it buys us live migration and ABI issues for the same price.

Really, can't we do

class i8254 {
...
virtual void set_catchup_policy(std::string policy) = 0;
...
}

to deal with the differences?





2) a user can explicitly create either the emulated version of the 
device or the in-kernel version of the device (no need for 
-no-kvm-irqchip)


-device ioapic,model=kernel vs. -device kvm-ioapic?

Is it really important to do that? 110% of the time we want the 
kernel irqchips.  The remaining -10% are only used for testing.


If model=kernel makes the support options different, then you end up 
introduce another layer of option validation.  By using the later 
form, you get to leverage the option validation of qdev plus it makes 
it much clearer to users what options are supported in what model 
because now the documentation is explicit about it.


Option validation = internals.  ABI = ABI.  We can deal with the former 
in any number of ways, but exposing it to the ABI is forever.






3) a user can pass parameters directly to the in-kernel version of 
the device that are different from the userspace version (like 
selecting different interrupt catch-up methods)


-device pit,model=qemu,catchup=slew

error: catchup=slew not supported in this model

I'm not overly concerned about the implementation part.  Though I 
think it's better to have a single implementation with kvm acting as 
an accelerator, having it the other way is no big deal.  What I am 
worried about is exposing it as a monitor and migration ABI.  IMO the 
only important thing is the spec that the device implements, not what 
piece of code implements it.


Just as we do in the PIT, there's nothing wrong with making the 
device's migration compatible. 


Then the two devices have the same migration section id?  That's my 
biggest worry.  Not really worried about PIT and PIC (no one uses the 
user PIT now), but more about future devices moving into the kernel, if 
we have to do that.


I'm not entirely sure what your concerns about the monitor are but 
there's simply no way to hide the fact that a device is implemented in 
KVM at the monitor 

Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 04:36 PM, Anthony Liguori wrote:
They need to use the same device id then.  And if they share code, 
that indicates that they need to be the same device even more.



No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
code.  But that doesn't mean that we treat them as one device.


Cirrus and VGA really are separate devices.  They share code because on 
evolved from the other, and is backwards compatible with the other.  
i8254 and i8254-kvm did not evolve from each other, both are 
implementations of the i8254 spec, and both are 100% compatible with 
each other (modulu bugs).




And BTW, there are guest visible differences between the KVM 
IOAPIC/PIC/PIT than the QEMU versions.  The only reason PIT live 
migration works today is because usually delivers all interrupts 
quickly.  But it actually does maintain state in the work queue that 
isn't saved.  If PIT tried to implement gradual catchup, there would 
be no way not to expose that state to userspace.


Why not?  Whatever state the kernel keeps, we expose to userspace and 
allow sending it over the wire.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 08:56 AM, Avi Kivity wrote:

On 01/11/2011 04:36 PM, Anthony Liguori wrote:
They need to use the same device id then.  And if they share code, 
that indicates that they need to be the same device even more.



No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
code.  But that doesn't mean that we treat them as one device.


Cirrus and VGA really are separate devices.  They share code because 
on evolved from the other, and is backwards compatible with the 
other.  i8254 and i8254-kvm did not evolve from each other,


Actually, they did, but that's besides the point.

both are implementations of the i8254 spec, and both are 100% 
compatible with each other (modulu bugs).




And BTW, there are guest visible differences between the KVM 
IOAPIC/PIC/PIT than the QEMU versions.  The only reason PIT live 
migration works today is because usually delivers all interrupts 
quickly.  But it actually does maintain state in the work queue that 
isn't saved.  If PIT tried to implement gradual catchup, there would 
be no way not to expose that state to userspace.


Why not?  Whatever state the kernel keeps, we expose to userspace and 
allow sending it over the wire.


What exactly is the scenario you're concerned about?

Migration between userspace HPET and in-kernel HPET?

One thing I've been considering is essentially migration filters.  It 
would be a set of rules that essentially were hpet-kvm.* = hpet.* 
which would allow migration from hpet to hpet-kvm given a translation of 
state.  I think this sort of higher level ruleset would make it easier 
to support migration between versions of the device model.


Of course, that only gives you a forward path.  It doesn't give you a 
backwards path.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Alexander Graf

On 11.01.2011, at 16:12, Anthony Liguori wrote:

 On 01/11/2011 08:56 AM, Avi Kivity wrote:
 On 01/11/2011 04:36 PM, Anthony Liguori wrote:
 They need to use the same device id then.  And if they share code, that 
 indicates that they need to be the same device even more.
 
 
 No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of code.  But 
 that doesn't mean that we treat them as one device.
 
 Cirrus and VGA really are separate devices.  They share code because on 
 evolved from the other, and is backwards compatible with the other.  i8254 
 and i8254-kvm did not evolve from each other,
 
 Actually, they did, but that's besides the point.
 
 both are implementations of the i8254 spec, and both are 100% compatible 
 with each other (modulu bugs).
 
 
 And BTW, there are guest visible differences between the KVM IOAPIC/PIC/PIT 
 than the QEMU versions.  The only reason PIT live migration works today is 
 because usually delivers all interrupts quickly.  But it actually does 
 maintain state in the work queue that isn't saved.  If PIT tried to 
 implement gradual catchup, there would be no way not to expose that state 
 to userspace.
 
 Why not?  Whatever state the kernel keeps, we expose to userspace and allow 
 sending it over the wire.
 
 What exactly is the scenario you're concerned about?
 
 Migration between userspace HPET and in-kernel HPET?
 
 One thing I've been considering is essentially migration filters.  It would 
 be a set of rules that essentially were hpet-kvm.* = hpet.* which would 
 allow migration from hpet to hpet-kvm given a translation of state.  I think 
 this sort of higher level ruleset would make it easier to support migration 
 between versions of the device model.
 
 Of course, that only gives you a forward path.  It doesn't give you a 
 backwards path.

Why not? Just include the version in the rule set and define a backwards rule 
if it's easy to do. If not, migration isn't possible.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 05:12 PM, Anthony Liguori wrote:
No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
code.  But that doesn't mean that we treat them as one device.


Cirrus and VGA really are separate devices.  They share code because 
on evolved from the other, and is backwards compatible with the 
other.  i8254 and i8254-kvm did not evolve from each other,



Actually, they did, but that's besides the point.


The code did, the devices did not.

Why not?  Whatever state the kernel keeps, we expose to userspace and 
allow sending it over the wire.


What exactly is the scenario you're concerned about?

Migration between userspace HPET and in-kernel HPET?


Yes.  To a lesser extent, a client doing 'info hpet' or similar and 
failing for kernel hpet.




One thing I've been considering is essentially migration filters.  It 
would be a set of rules that essentially were hpet-kvm.* = hpet.* 
which would allow migration from hpet to hpet-kvm given a translation 
of state.  I think this sort of higher level ruleset would make it 
easier to support migration between versions of the device model.


Of course, that only gives you a forward path.  It doesn't give you a 
backwards path.




It would be easier to have them use the same device id in the first place.

If it looks like an i8254, quacks like an i8254, and live migrates like 
an i8254, it's probably an i8254.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori

On 01/11/2011 09:37 AM, Avi Kivity wrote:
Why not?  Whatever state the kernel keeps, we expose to userspace 
and allow sending it over the wire.


What exactly is the scenario you're concerned about?

Migration between userspace HPET and in-kernel HPET?


Yes.  To a lesser extent, a client doing 'info hpet' or similar and 
failing for kernel hpet.


That's pretty easy to address.



One thing I've been considering is essentially migration filters.  It 
would be a set of rules that essentially were hpet-kvm.* = hpet.* 
which would allow migration from hpet to hpet-kvm given a translation 
of state.  I think this sort of higher level ruleset would make it 
easier to support migration between versions of the device model.


Of course, that only gives you a forward path.  It doesn't give you a 
backwards path.




It would be easier to have them use the same device id in the first 
place.


If it looks like an i8254, quacks like an i8254, and live migrates 
like an i8254, it's probably an i8254.


And that's fine.  I'm not suggesting you call it i8253.  But it's two 
separate implementations.  We should make that visible, not try to hide 
it.  It's an important detail.


Imagine getting a sosreport that includes a dump of the device tree.  
You really want to see something in there that tells you it's an 
in-kernel PIT and not the userspace one.


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 05:55 PM, Anthony Liguori wrote:




One thing I've been considering is essentially migration filters.  
It would be a set of rules that essentially were hpet-kvm.* = 
hpet.* which would allow migration from hpet to hpet-kvm given a 
translation of state.  I think this sort of higher level ruleset 
would make it easier to support migration between versions of the 
device model.


Of course, that only gives you a forward path.  It doesn't give you 
a backwards path.




It would be easier to have them use the same device id in the first 
place.


If it looks like an i8254, quacks like an i8254, and live migrates 
like an i8254, it's probably an i8254.


And that's fine.  I'm not suggesting you call it i8253.  But it's two 
separate implementations.  We should make that visible, not try to 
hide it.  It's an important detail.




Visible, yes, but not in live migration, or in 'info i8254', or 
similar.  We can live migrate between qcow2 and qed (using block 
migration), we should be able to do the same for the two i8254 
implementations.


I'm not happy about separate implementations, but that's a minor 
details.  We can change it 2n+1 times without anybody noticing.  Not so 
about ABI stuff.


Imagine getting a sosreport that includes a dump of the device tree.  
You really want to see something in there that tells you it's an 
in-kernel PIT and not the userspace one.


Sure.  Not the device tree though.  The command line would give all the 
information?


Or 'info i8254' can say something about the implementation.  I don't 
want to have the user say 'info i8254-kvm'.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Anthony Liguori


Visible, yes, but not in live migration, or in 'info i8254', or 
similar.  We can live migrate between qcow2 and qed (using block 
migration), we should be able to do the same for the two i8254 
implementations.


I'm not happy about separate implementations, but that's a minor 
details.  We can change it 2n+1 times without anybody noticing.  Not 
so about ABI stuff.


Imagine getting a sosreport that includes a dump of the device tree.  
You really want to see something in there that tells you it's an 
in-kernel PIT and not the userspace one.


Sure.  Not the device tree though.  The command line would give all 
the information?


Then it's a one off option.  We really want as much info as possible 
stored in the device tree.




Or 'info i8254' can say something about the implementation.  I don't 
want to have the user say 'info i8254-kvm'.


info doesn't take a qdev device so yes, it can show whatever we want it 
to show.


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-11 Thread Avi Kivity

On 01/11/2011 06:26 PM, Anthony Liguori wrote:


Visible, yes, but not in live migration, or in 'info i8254', or 
similar.  We can live migrate between qcow2 and qed (using block 
migration), we should be able to do the same for the two i8254 
implementations.


I'm not happy about separate implementations, but that's a minor 
details.  We can change it 2n+1 times without anybody noticing.  Not 
so about ABI stuff.


Imagine getting a sosreport that includes a dump of the device 
tree.  You really want to see something in there that tells you it's 
an in-kernel PIT and not the userspace one.


Sure.  Not the device tree though.  The command line would give all 
the information?


Then it's a one off option.  We really want as much info as possible 
stored in the device tree.




Or 'info i8254' can say something about the implementation.  I don't 
want to have the user say 'info i8254-kvm'.


info doesn't take a qdev device so yes, it can show whatever we want 
it to show.




It may be a qdev read-only attribute (and thus not migrated?) if we have 
to have it in qdev for some reason.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-10 Thread Anthony Liguori

On 01/08/2011 02:47 AM, Jan Kiszka wrote:

OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)
   


I often reply too quickly without explaining myself.  Let me use 
ioeventfd as an example to highlight why KVMState is a good thing.


In real life, PIO and MMIO are never directly communicated to the device 
from the processor.  Instead, they go through a series of other 
devices.  In the case of something like an ISA device, a PIO first goes 
to the chipset into the PCI complex, it will then go through a 
PCI-to-ISA bridge via subtractive decoding, and then forward over the 
ISA device where it will be interpreted by some device.


The path to the chipset may be shared among different processors but it 
may also be unique.  The APIC is the best example as there are historic 
APICs that hung directly off of the CPUs such that the same MMIO access 
across different CPUs did not go to the same device.  This is why the 
APIC emulation in QEMU is so weird because we don't model this behavior 
correctly.


This means that a PIO operation needs to flow from a CPUState to a 
DeviceState.  It can then flow through to another DeviceState until it's 
finally handled.


The first problem with ioeventfd is that it's a per-VM operation.  It 
should be per VCPU.


But even if this were the case, the path that a PIO operation takes 
should not be impacted by ioeventfd.  IOW, a device shouldn't be 
allocating an eventfd() and handing it to a magical KVM call.  Instead, 
a device should register a callback for a particular port in the same 
way it always does.  *As an optimization*, we should have another 
interface that says that these values are only valid for this IO port.  
That would let us create eventfds and register things behind the scenes.


That means we can handle TCG, older KVM kernels, and newer KVM kernels 
without any special support in the device model.  It also means that the 
device models never have to worry about KVMState because there's an 
entirely different piece of code that's consulting the set of special 
ports and then deciding how to handle it.  The result is better, more 
portable code that doesn't have KVM-isms.


If passing state around seems to be ugly, it's probably because we're 
not abstracting things correctly.  Removing the state and making it 
implicit is the wrong solution.  Fixing the abstraction is the right 
solution (or living with the ugliness until someone else is motivated to 
fix it properly).


Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-10 Thread Jan Kiszka
Am 10.01.2011 20:59, Anthony Liguori wrote:
 On 01/08/2011 02:47 AM, Jan Kiszka wrote:
 Am 08.01.2011 00:27, Anthony Liguori wrote:
   
 On 01/07/2011 03:03 AM, Jan Kiszka wrote:
 
 Am 06.01.2011 20:24, Anthony Liguori wrote:

   
 On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:

 
 From: Jan Kiszkajan.kis...@siemens.com

 QEMU supports only one VM, so there is only one kvm_state per
 process,
 and we gain nothing passing a reference to it around. Eliminate any
 need
 to refer to it outside of kvm-all.c.

 Signed-off-by: Jan Kiszkajan.kis...@siemens.com
 CC: Alexander Grafag...@suse.de
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com



 I think this is a big mistake.

  
 Obviously, I don't share your concerns. :)


   
 Having to manage kvm_state keeps the abstraction lines well defined.

  
 How does it help?


   
 Otherwise, it's far too easy for portions of code to call into KVM
 functions that really shouldn't.

  
 I can't imagine we gain anything from requiring kvm_check_extension
 callers to hold a kvm_state capability. Yes, it's now much easier to
 call kvm_[vm_]ioctl, but that's the key point of this change:

 So far we primarily complicated the internal interface between generic
 and arch-dependent kvm parts by requiring kvm_state joggling. But
 external users already find interfaces without this restriction
 (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
 complicated to _cleanly_ pass kvm_state references to all users that
 need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
 irqchips.


 I think you're basically making my point for me.

 ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
 a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.
  
 OK, but I don't want to argue about the ioeventfd API. So let's put this
 case aside. :)

   
 kvm_state is available as part of CPU state so it's quite easy to get at
 if these interfaces just took a CPUState argument (and they should).
  
 My point is definitely NOT about cpu-bound devices. That case is clear
 and is not touched at all by this patch.

 My point is about devices that have clear system scope like kvmclock,
 ioapic, pit, pic,
 
 I don't see how ioapic, pit, or pic have a system scope.

They are not bound to any CPU like the APIC which you may have in mind.

 
 I don't know enough about kvmclock.

It's just the same.

 
   whatever-the-future-will-bring. And about KVM services
 that have global scope like capability checks and other feature
 explorations or VM configurations done by the KVM arch code. You still
 didn't explain what we gain in these concrete scenarios by handing the
 technically redundant abstraction kvm_state around, especially _inside_
 the KVM core.

 
 If you have to pass around a KVMState pointer, you establish an explicit
 relationship and communication between subsystems.  Any place where the
 global KVMState is used is a red flag that something is wrong.

It is and will be _only_ used inside kvm-all.c. Again: What is the
benefit of restricting access to kvm_check_extension this way?

 
 I don't see what the advantage to making all of the KVMState global and
 implicit.  It seems like a big step backwards to me.  Can you give a
 very concrete example of where you think it results in easier to
 understand code as I don't see how making relationships implicit ever
 makes code easier to understand?

The best example does not yet exist (fortunately): Just look at patch 28
and then try to pass some kvm_state reference to the kvmclock device. Is
this handle worth changing the sysbus API?

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-10 Thread Jan Kiszka
Am 10.01.2011 21:11, Anthony Liguori wrote:
 On 01/08/2011 02:47 AM, Jan Kiszka wrote:
 OK, but I don't want to argue about the ioeventfd API. So let's put this
 case aside. :)

 
 I often reply too quickly without explaining myself.  Let me use
 ioeventfd as an example to highlight why KVMState is a good thing.
 
 In real life, PIO and MMIO are never directly communicated to the device
 from the processor.  Instead, they go through a series of other
 devices.  In the case of something like an ISA device, a PIO first goes
 to the chipset into the PCI complex, it will then go through a
 PCI-to-ISA bridge via subtractive decoding, and then forward over the
 ISA device where it will be interpreted by some device.
 
 The path to the chipset may be shared among different processors but it
 may also be unique.  The APIC is the best example as there are historic
 APICs that hung directly off of the CPUs such that the same MMIO access
 across different CPUs did not go to the same device.  This is why the
 APIC emulation in QEMU is so weird because we don't model this behavior
 correctly.
 
 This means that a PIO operation needs to flow from a CPUState to a
 DeviceState.  It can then flow through to another DeviceState until it's
 finally handled.
 
 The first problem with ioeventfd is that it's a per-VM operation.  It
 should be per VCPU.
 
 But even if this were the case, the path that a PIO operation takes
 should not be impacted by ioeventfd.  IOW, a device shouldn't be
 allocating an eventfd() and handing it to a magical KVM call.  Instead,
 a device should register a callback for a particular port in the same
 way it always does.  *As an optimization*, we should have another
 interface that says that these values are only valid for this IO port. 
 That would let us create eventfds and register things behind the scenes.
 
 That means we can handle TCG, older KVM kernels, and newer KVM kernels
 without any special support in the device model.  It also means that the
 device models never have to worry about KVMState because there's an
 entirely different piece of code that's consulting the set of special
 ports and then deciding how to handle it.  The result is better, more
 portable code that doesn't have KVM-isms.
 
 If passing state around seems to be ugly, it's probably because we're
 not abstracting things correctly.  Removing the state and making it
 implicit is the wrong solution.  Fixing the abstraction is the right
 solution (or living with the ugliness until someone else is motivated to
 fix it properly).

Look at my other reply, it should answer this. ioeventfd is the wrong
example IMHO as one may argue about its relation to VCPUS.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-10 Thread Anthony Liguori

On 01/10/2011 02:12 PM, Jan Kiszka wrote:

Am 10.01.2011 20:59, Anthony Liguori wrote:
   

On 01/08/2011 02:47 AM, Jan Kiszka wrote:
 

Am 08.01.2011 00:27, Anthony Liguori wrote:

   

On 01/07/2011 03:03 AM, Jan Kiszka wrote:

 

Am 06.01.2011 20:24, Anthony Liguori wrote:


   

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:


 

From: Jan Kiszkajan.kis...@siemens.com

QEMU supports only one VM, so there is only one kvm_state per
process,
and we gain nothing passing a reference to it around. Eliminate any
need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
CC: Alexander Grafag...@suse.de
Signed-off-by: Marcelo Tosattimtosa...@redhat.com



   

I think this is a big mistake.


 

Obviously, I don't share your concerns. :)



   

Having to manage kvm_state keeps the abstraction lines well defined.


 

How does it help?



   

Otherwise, it's far too easy for portions of code to call into KVM
functions that really shouldn't.


 

I can't imagine we gain anything from requiring kvm_check_extension
callers to hold a kvm_state capability. Yes, it's now much easier to
call kvm_[vm_]ioctl, but that's the key point of this change:

So far we primarily complicated the internal interface between generic
and arch-dependent kvm parts by requiring kvm_state joggling. But
external users already find interfaces without this restriction
(kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
complicated to _cleanly_ pass kvm_state references to all users that
need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
irqchips.


   

I think you're basically making my point for me.

ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.

 

OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)


   

kvm_state is available as part of CPU state so it's quite easy to get at
if these interfaces just took a CPUState argument (and they should).

 

My point is definitely NOT about cpu-bound devices. That case is clear
and is not touched at all by this patch.

My point is about devices that have clear system scope like kvmclock,
ioapic, pit, pic,
   

I don't see how ioapic, pit, or pic have a system scope.
 

They are not bound to any CPU like the APIC which you may have in mind.
   


And none of the above interact with KVM.

They may be replaced by KVM but if you look at the PIT, this is done by 
having two distinct devices.  The KVM specific device can (and should) 
be instantiated with kvm_state.


The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel 
devices are separate devices and that should be reflected in the device 
tree.



I don't know enough about kvmclock.
 

It's just the same.

   
 

   whatever-the-future-will-bring. And about KVM services
that have global scope like capability checks and other feature
explorations or VM configurations done by the KVM arch code. You still
didn't explain what we gain in these concrete scenarios by handing the
technically redundant abstraction kvm_state around, especially _inside_
the KVM core.

   

If you have to pass around a KVMState pointer, you establish an explicit
relationship and communication between subsystems.  Any place where the
global KVMState is used is a red flag that something is wrong.
 

It is and will be _only_ used inside kvm-all.c. Again: What is the
benefit of restricting access to kvm_check_extension this way?
   


The more places that need to deal with KVM compatibility code, the worse 
we will be because it's more opportunities to get it wrong.



I don't see what the advantage to making all of the KVMState global and
implicit.  It seems like a big step backwards to me.  Can you give a
very concrete example of where you think it results in easier to
understand code as I don't see how making relationships implicit ever
makes code easier to understand?
 

The best example does not yet exist (fortunately): Just look at patch 28
and then try to pass some kvm_state reference to the kvmclock device. Is
this handle worth changing the sysbus API?
   


Let me look at that patch and reply there.

Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-10 Thread Jan Kiszka
Am 10.01.2011 21:23, Anthony Liguori wrote:
 On 01/10/2011 02:12 PM, Jan Kiszka wrote:
 Am 10.01.2011 20:59, Anthony Liguori wrote:
   
 On 01/08/2011 02:47 AM, Jan Kiszka wrote:
 
 Am 08.01.2011 00:27, Anthony Liguori wrote:

   
 On 01/07/2011 03:03 AM, Jan Kiszka wrote:

 
 Am 06.01.2011 20:24, Anthony Liguori wrote:


   
 On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:


 
 From: Jan Kiszkajan.kis...@siemens.com

 QEMU supports only one VM, so there is only one kvm_state per
 process,
 and we gain nothing passing a reference to it around. Eliminate any
 need
 to refer to it outside of kvm-all.c.

 Signed-off-by: Jan Kiszkajan.kis...@siemens.com
 CC: Alexander Grafag...@suse.de
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com




 I think this is a big mistake.


  
 Obviously, I don't share your concerns. :)



   
 Having to manage kvm_state keeps the abstraction lines well defined.


  
 How does it help?



   
 Otherwise, it's far too easy for portions of code to call into KVM
 functions that really shouldn't.


  
 I can't imagine we gain anything from requiring kvm_check_extension
 callers to hold a kvm_state capability. Yes, it's now much
 easier to
 call kvm_[vm_]ioctl, but that's the key point of this change:

 So far we primarily complicated the internal interface between
 generic
 and arch-dependent kvm parts by requiring kvm_state joggling. But
 external users already find interfaces without this restriction
 (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
 complicated to _cleanly_ pass kvm_state references to all users that
 need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
 irqchips.



 I think you're basically making my point for me.

 ioeventfd is a broken interface.  It shouldn't be a VM ioctl but
 rather
 a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.

  
 OK, but I don't want to argue about the ioeventfd API. So let's put
 this
 case aside. :)


   
 kvm_state is available as part of CPU state so it's quite easy to
 get at
 if these interfaces just took a CPUState argument (and they should).

  
 My point is definitely NOT about cpu-bound devices. That case is clear
 and is not touched at all by this patch.

 My point is about devices that have clear system scope like kvmclock,
 ioapic, pit, pic,

 I don't see how ioapic, pit, or pic have a system scope.
  
 They are not bound to any CPU like the APIC which you may have in mind.

 
 And none of the above interact with KVM.
 
 They may be replaced by KVM but if you look at the PIT, this is done by
 having two distinct devices.  The KVM specific device can (and should)
 be instantiated with kvm_state.
 
 The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel
 devices are separate devices and that should be reflected in the device
 tree.

If separate device or hack to existing one - both need to sync their
user space state with the kernel when QEMU asks them to. That's how they
have to interact with KVM all the time. Same for kvmclock if you want to
look at a really trivial example.

 
 I don't know enough about kvmclock.
  
 It's just the same.

   
 
whatever-the-future-will-bring. And about KVM services
 that have global scope like capability checks and other feature
 explorations or VM configurations done by the KVM arch code. You still
 didn't explain what we gain in these concrete scenarios by handing the
 technically redundant abstraction kvm_state around, especially _inside_
 the KVM core.


 If you have to pass around a KVMState pointer, you establish an explicit
 relationship and communication between subsystems.  Any place where the
 global KVMState is used is a red flag that something is wrong.
  
 It is and will be _only_ used inside kvm-all.c. Again: What is the
 benefit of restricting access to kvm_check_extension this way?

 
 The more places that need to deal with KVM compatibility code, the worse
 we will be because it's more opportunities to get it wrong.

That code belongs where the related logic is. IMHO, it would be a
needless abstraction to push in-kernel access services and workaround
definitions in the KVM core instead of the KVM device model code -
provided there is only one user.

But this discussion is a bit abstract right now as we do not yet have
anything more complex than kvmclock on the table for QEMU.

 
 I don't see what the advantage to making all of the KVMState global and
 implicit.  It seems like a big step backwards to me.  Can you give a
 very concrete example of where you think it results in easier to
 understand code as I don't see how making relationships implicit ever
 makes code easier to understand?
  
 The best example does not yet exist (fortunately): Just look at patch 28
 and then try to pass some kvm_state reference to the kvmclock 

Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-08 Thread Jan Kiszka
Am 08.01.2011 00:27, Anthony Liguori wrote:
 On 01/07/2011 03:03 AM, Jan Kiszka wrote:
 Am 06.01.2011 20:24, Anthony Liguori wrote:
   
 On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
 
 From: Jan Kiszkajan.kis...@siemens.com

 QEMU supports only one VM, so there is only one kvm_state per process,
 and we gain nothing passing a reference to it around. Eliminate any
 need
 to refer to it outside of kvm-all.c.

 Signed-off-by: Jan Kiszkajan.kis...@siemens.com
 CC: Alexander Grafag...@suse.de
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com


 I think this is a big mistake.
  
 Obviously, I don't share your concerns. :)

   
 Having to manage kvm_state keeps the abstraction lines well defined.
  
 How does it help?

   
 Otherwise, it's far too easy for portions of code to call into KVM
 functions that really shouldn't.
  
 I can't imagine we gain anything from requiring kvm_check_extension
 callers to hold a kvm_state capability. Yes, it's now much easier to
 call kvm_[vm_]ioctl, but that's the key point of this change:

 So far we primarily complicated the internal interface between generic
 and arch-dependent kvm parts by requiring kvm_state joggling. But
 external users already find interfaces without this restriction
 (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
 complicated to _cleanly_ pass kvm_state references to all users that
 need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
 irqchips.

 
 I think you're basically making my point for me.
 
 ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
 a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.

OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)

 
 kvm_state is available as part of CPU state so it's quite easy to get at
 if these interfaces just took a CPUState argument (and they should).

My point is definitely NOT about cpu-bound devices. That case is clear
and is not touched at all by this patch.

My point is about devices that have clear system scope like kvmclock,
ioapic, pit, pic, whatever-the-future-will-bring. And about KVM services
that have global scope like capability checks and other feature
explorations or VM configurations done by the KVM arch code. You still
didn't explain what we gain in these concrete scenarios by handing the
technically redundant abstraction kvm_state around, especially _inside_
the KVM core.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-07 Thread Jan Kiszka
Am 06.01.2011 20:24, Anthony Liguori wrote:
 On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
 From: Jan Kiszkajan.kis...@siemens.com

 QEMU supports only one VM, so there is only one kvm_state per process,
 and we gain nothing passing a reference to it around. Eliminate any need
 to refer to it outside of kvm-all.c.

 Signed-off-by: Jan Kiszkajan.kis...@siemens.com
 CC: Alexander Grafag...@suse.de
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com

 
 I think this is a big mistake.

Obviously, I don't share your concerns. :)

 
 Having to manage kvm_state keeps the abstraction lines well defined. 

How does it help?

 Otherwise, it's far too easy for portions of code to call into KVM
 functions that really shouldn't.

I can't imagine we gain anything from requiring kvm_check_extension
callers to hold a kvm_state capability. Yes, it's now much easier to
call kvm_[vm_]ioctl, but that's the key point of this change:

So far we primarily complicated the internal interface between generic
and arch-dependent kvm parts by requiring kvm_state joggling. But
external users already find interfaces without this restriction
(kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
complicated to _cleanly_ pass kvm_state references to all users that
need it - e.g. sysbus devices like kvmclock or upcoming in-kernel irqchips.

Let's just stop this artificial abstraction that has no practical use
and focus on detecting layering violations via code review. That's more
reliable IMHO.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-07 Thread Anthony Liguori

On 01/07/2011 03:03 AM, Jan Kiszka wrote:

Am 06.01.2011 20:24, Anthony Liguori wrote:
   

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
 

From: Jan Kiszkajan.kis...@siemens.com

QEMU supports only one VM, so there is only one kvm_state per process,
and we gain nothing passing a reference to it around. Eliminate any need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
CC: Alexander Grafag...@suse.de
Signed-off-by: Marcelo Tosattimtosa...@redhat.com

   

I think this is a big mistake.
 

Obviously, I don't share your concerns. :)

   

Having to manage kvm_state keeps the abstraction lines well defined.
 

How does it help?

   

Otherwise, it's far too easy for portions of code to call into KVM
functions that really shouldn't.
 

I can't imagine we gain anything from requiring kvm_check_extension
callers to hold a kvm_state capability. Yes, it's now much easier to
call kvm_[vm_]ioctl, but that's the key point of this change:

So far we primarily complicated the internal interface between generic
and arch-dependent kvm parts by requiring kvm_state joggling. But
external users already find interfaces without this restriction
(kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
complicated to _cleanly_ pass kvm_state references to all users that
need it - e.g. sysbus devices like kvmclock or upcoming in-kernel irqchips.
   


I think you're basically making my point for me.

ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather 
a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.


kvm_state is available as part of CPU state so it's quite easy to get at 
if these interfaces just took a CPUState argument (and they should).



Let's just stop this artificial abstraction that has no practical use
and focus on detecting layering violations via code review. That's more
reliable IMHO.
   


Documenting relationships between devices and the CPU is a very 
important task.  Being able to grep for cpu_single_env to see what 
devices models are interacting with the CPU is a very good thing.


When you've got these interactions hidden in a spaghetti of function 
calls, things become impossible to understand.


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-06 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

QEMU supports only one VM, so there is only one kvm_state per process,
and we gain nothing passing a reference to it around. Eliminate any need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
CC: Alexander Graf ag...@suse.de
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
---
 cpu-defs.h|2 -
 kvm-all.c |  232 +
 kvm-stub.c|2 +-
 kvm.h |   15 +--
 target-i386/cpuid.c   |9 +-
 target-i386/kvm.c |   77 
 target-i386/kvm_x86.h |3 +
 target-ppc/kvm.c  |   12 ++--
 target-s390x/kvm.c|8 +--
 9 files changed, 160 insertions(+), 200 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 8d4bf86..0e04239 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
 #endif
 
 struct kvm_run;
-struct KVMState;
 struct qemu_work_item;
 
 typedef struct CPUBreakpoint {
@@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
 struct QemuCond *halt_cond; \
 struct qemu_work_item *queued_work_first, *queued_work_last;\
 const char *cpu_model_str;  \
-struct KVMState *kvm_state; \
 struct kvm_run *kvm_run;\
 int kvm_fd; \
 int kvm_vcpu_dirty;
diff --git a/kvm-all.c b/kvm-all.c
index ef2ca3b..d8820c7 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -52,8 +52,7 @@ typedef struct KVMSlot
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
-struct KVMState
-{
+static struct KVMState {
 KVMSlot slots[32];
 int fd;
 int vmfd;
@@ -72,21 +71,19 @@ struct KVMState
 int irqchip_in_kernel;
 int pit_in_kernel;
 int xsave, xcrs;
-};
-
-static KVMState *kvm_state;
+} kvm_state;
 
-static KVMSlot *kvm_alloc_slot(KVMState *s)
+static KVMSlot *kvm_alloc_slot(void)
 {
 int i;
 
-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
 /* KVM private memory slots */
 if (i = 8  i  12) {
 continue;
 }
-if (s-slots[i].memory_size == 0) {
-return s-slots[i];
+if (kvm_state.slots[i].memory_size == 0) {
+return kvm_state.slots[i];
 }
 }
 
@@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
 abort();
 }
 
-static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
- target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
  target_phys_addr_t end_addr)
 {
 int i;
 
-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem = s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem = kvm_state.slots[i];
 
 if (start_addr == mem-start_addr 
 end_addr == mem-start_addr + mem-memory_size) {
@@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
 /*
  * Find overlapping slot with lowest start address
  */
-static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
-target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
 target_phys_addr_t end_addr)
 {
 KVMSlot *found = NULL;
 int i;
 
-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem = s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem = kvm_state.slots[i];
 
 if (mem-memory_size == 0 ||
 (found  found-start_addr  mem-start_addr)) {
@@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
 return found;
 }
 
-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
   target_phys_addr_t *phys_addr)
 {
 int i;
 
-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem = s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem = kvm_state.slots[i];
 
 if (ram_addr = mem-phys_offset 
 ram_addr  mem-phys_offset + mem-memory_size) {
@@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, 
ram_addr_t ram_addr,
 return 0;
 }
 
-static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
+static int kvm_set_user_memory_region(KVMSlot *slot)
 {
 struct kvm_userspace_memory_region mem;
 
@@ -166,10 +161,10 @@ static int kvm_set_user_memory_region(KVMState *s, 
KVMSlot *slot)
 mem.memory_size = slot-memory_size;
 mem.userspace_addr = (unsigned 

Re: [PATCH 26/35] kvm: Eliminate KVMState arguments

2011-01-06 Thread Anthony Liguori

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:

From: Jan Kiszkajan.kis...@siemens.com

QEMU supports only one VM, so there is only one kvm_state per process,
and we gain nothing passing a reference to it around. Eliminate any need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszkajan.kis...@siemens.com
CC: Alexander Grafag...@suse.de
Signed-off-by: Marcelo Tosattimtosa...@redhat.com
   


I think this is a big mistake.

Having to manage kvm_state keeps the abstraction lines well defined.  
Otherwise, it's far too easy for portions of code to call into KVM 
functions that really shouldn't.


Regards,

Anthony Liguori


---
  cpu-defs.h|2 -
  kvm-all.c |  232 +
  kvm-stub.c|2 +-
  kvm.h |   15 +--
  target-i386/cpuid.c   |9 +-
  target-i386/kvm.c |   77 
  target-i386/kvm_x86.h |3 +
  target-ppc/kvm.c  |   12 ++--
  target-s390x/kvm.c|8 +--
  9 files changed, 160 insertions(+), 200 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 8d4bf86..0e04239 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
  #endif

  struct kvm_run;
-struct KVMState;
  struct qemu_work_item;

  typedef struct CPUBreakpoint {
@@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
  struct QemuCond *halt_cond; \
  struct qemu_work_item *queued_work_first, *queued_work_last;\
  const char *cpu_model_str;  \
-struct KVMState *kvm_state; \
  struct kvm_run *kvm_run;\
  int kvm_fd; \
  int kvm_vcpu_dirty;
diff --git a/kvm-all.c b/kvm-all.c
index ef2ca3b..d8820c7 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -52,8 +52,7 @@ typedef struct KVMSlot

  typedef struct kvm_dirty_log KVMDirtyLog;

-struct KVMState
-{
+static struct KVMState {
  KVMSlot slots[32];
  int fd;
  int vmfd;
@@ -72,21 +71,19 @@ struct KVMState
  int irqchip_in_kernel;
  int pit_in_kernel;
  int xsave, xcrs;
-};
-
-static KVMState *kvm_state;
+} kvm_state;

-static KVMSlot *kvm_alloc_slot(KVMState *s)
+static KVMSlot *kvm_alloc_slot(void)
  {
  int i;

-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
  /* KVM private memory slots */
  if (i= 8  i  12) {
  continue;
  }
-if (s-slots[i].memory_size == 0) {
-returns-slots[i];
+if (kvm_state.slots[i].memory_size == 0) {
+returnkvm_state.slots[i];
  }
  }

@@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
  abort();
  }

-static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
- target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
   target_phys_addr_t end_addr)
  {
  int i;

-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem =s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem =kvm_state.slots[i];

  if (start_addr == mem-start_addr
  end_addr == mem-start_addr + mem-memory_size) {
@@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
  /*
   * Find overlapping slot with lowest start address
   */
-static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
-target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
  target_phys_addr_t end_addr)
  {
  KVMSlot *found = NULL;
  int i;

-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem =s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem =kvm_state.slots[i];

  if (mem-memory_size == 0 ||
  (found  found-start_addr  mem-start_addr)) {
@@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
  return found;
  }

-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
target_phys_addr_t *phys_addr)
  {
  int i;

-for (i = 0; i  ARRAY_SIZE(s-slots); i++) {
-KVMSlot *mem =s-slots[i];
+for (i = 0; i  ARRAY_SIZE(kvm_state.slots); i++) {
+KVMSlot *mem =kvm_state.slots[i];

  if (ram_addr= mem-phys_offset
  ram_addr  mem-phys_offset + mem-memory_size) {
@@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, 
ram_addr_t ram_addr,
  return 0;
  }

-static int kvm_set_user_memory_region(KVMState *s,