Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-07 Thread Marcelo Tosatti
On Tue, May 04, 2010 at 02:35:28PM -0400, Glauber Costa wrote:
 Currently, in the linux kernel, we reset kvmclock if we are rebooting
 into a crash kernel through kexec. The rationale, is that a new kernel
 won't follow the same memory addresses, and the memory where kvmclock is
 located in the first kernel, will be something else in the second one.
 
 We don't do it in normal reboots, because the second kernel ends up
 registering kvmclock again, which has the effect of turning off the
 first instance.
 
 This is, however, totally wrong. This assumes we're booting into
 a kernel that also has kvmclock enabled. If by some reason we reboot
 into something that doesn't do kvmclock including but not limited to:
  * rebooting into an older kernel without kvmclock support,
  * rebooting with no-kvmclock,
  * rebootint into another O.S,
 
 we'll simply have the hypervisor writing into a random memory position
 into the guest. Neat, uh?
 
 Moreover, I believe the fix belongs in qemu, since it is the entity
 more prepared to detect all kinds of reboots (by means of a cpu_reset),
 not to mention the presence of misbehaving guests, that can forget
 to turn kvmclock off.
 
 This patch fixes the issue for me.
 
 Signed-off-by: Glauber Costa glom...@redhat.com
 ---
  qemu-kvm-x86.c |   19 +++
  1 files changed, 19 insertions(+), 0 deletions(-)
 
 diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
 index 439c31a..4b94e04 100644
 --- a/qemu-kvm-x86.c
 +++ b/qemu-kvm-x86.c
 @@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque)
  }
  #endif /* KVM_CAP_USER_NMI */
  
 +static int kvm_turn_off_clock(CPUState *env)
 +{
 +struct {
 +struct kvm_msrs info;
 +struct kvm_msr_entry entries[100];
 +} msr_data;
 +
 +struct kvm_msr_entry *msrs = msr_data.entries;
 +int n = 0;
 +
 +kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, 0);
 +kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, 0);
 +msr_data.info.nmsrs = n;
 +
 +return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data);
 +}
 +
 +
  void kvm_arch_cpu_reset(CPUState *env)
  {
 +kvm_turn_off_clock(env);

Better to zero env-system_time_msr and wall_clock_msr here, and modify
kvm_arch_load_regs to write those MSRs on KVM_PUT_RESET_STATE (to be
conformant with the new writeback scheme).

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Avi Kivity

On 05/04/2010 09:35 PM, Glauber Costa wrote:

Currently, in the linux kernel, we reset kvmclock if we are rebooting
into a crash kernel through kexec. The rationale, is that a new kernel
won't follow the same memory addresses, and the memory where kvmclock is
located in the first kernel, will be something else in the second one.

We don't do it in normal reboots, because the second kernel ends up
registering kvmclock again, which has the effect of turning off the
first instance.

This is, however, totally wrong. This assumes we're booting into
a kernel that also has kvmclock enabled. If by some reason we reboot
into something that doesn't do kvmclock including but not limited to:
  * rebooting into an older kernel without kvmclock support,
  * rebooting with no-kvmclock,
  * rebootint into another O.S,

we'll simply have the hypervisor writing into a random memory position
into the guest. Neat, uh?

Moreover, I believe the fix belongs in qemu, since it is the entity
more prepared to detect all kinds of reboots (by means of a cpu_reset),
not to mention the presence of misbehaving guests, that can forget
to turn kvmclock off.

This patch fixes the issue for me.

Signed-off-by: Glauber Costaglom...@redhat.com
---
  qemu-kvm-x86.c |   19 +++
  1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 439c31a..4b94e04 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque)
  }
  #endif /* KVM_CAP_USER_NMI */

+static int kvm_turn_off_clock(CPUState *env)
+{
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[100];
+} msr_data;
+
+struct kvm_msr_entry *msrs = msr_data.entries;
+int n = 0;
+
+kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, 0);
+kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, 0);
   


This fails if the kernel doesn't support those MSRs.  Moreover, you need 
to use the new MSRs as well if we are ever to succeed in deprecating the 
old ones.



+msr_data.info.nmsrs = n;
+
+return kvm_vcpu_ioctl(env, KVM_SET_MSRS,msr_data);
+}
+
+
   


How about a different approach?  Query the supported MSRs 
(KVM_GET_MSR_LIST or thereabout) and reset them (with special cases for 
the TSC, and the old clock MSRs when the new ones are present)?


Long term we need a kernel reset function, but this will do for now.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Glauber Costa
On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
 
 +msr_data.info.nmsrs = n;
 +
 +return kvm_vcpu_ioctl(env, KVM_SET_MSRS,msr_data);
 +}
 +
 +
 
 How about a different approach?  Query the supported MSRs
 (KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
 for the TSC, and the old clock MSRs when the new ones are present)?
I didn't went that route because I was unsure that every one of them
would be resetable by writing 0 on it.

And if we are going to special case the most part of it, then there
is no point in doing it.

If you think it is doable to special case just the tsc, then I am fine.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Avi Kivity

On 05/05/2010 06:24 PM, Glauber Costa wrote:

On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:

   

+msr_data.info.nmsrs = n;
+
+return kvm_vcpu_ioctl(env, KVM_SET_MSRS,msr_data);
+}
+
+
   

How about a different approach?  Query the supported MSRs
(KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
for the TSC, and the old clock MSRs when the new ones are present)?
 

I didn't went that route because I was unsure that every one of them
would be resetable by writing 0 on it.
   


There are probably others.  We should reset them correctly anyway.

It's probably done by generic qemu code so it works.


And if we are going to special case the most part of it, then there
is no point in doing it.

If you think it is doable to special case just the tsc, then I am fine.
   


I think if we have the following sequence

  clear all msrs
  qemu reset
  kvm specific msr reset

Then we'd be fine.

Oh, and tsc needs to be reset to 0 as well - it isn't a special case.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Glauber Costa
On Wed, May 05, 2010 at 06:34:22PM +0300, Avi Kivity wrote:
 On 05/05/2010 06:24 PM, Glauber Costa wrote:
 On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
 
 +msr_data.info.nmsrs = n;
 +
 +return kvm_vcpu_ioctl(env, KVM_SET_MSRS,msr_data);
 +}
 +
 +
 How about a different approach?  Query the supported MSRs
 (KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
 for the TSC, and the old clock MSRs when the new ones are present)?
 I didn't went that route because I was unsure that every one of them
 would be resetable by writing 0 on it.
 
 There are probably others.  We should reset them correctly anyway.
 
 It's probably done by generic qemu code so it works.
 
 And if we are going to special case the most part of it, then there
 is no point in doing it.
 
 If you think it is doable to special case just the tsc, then I am fine.
 
 I think if we have the following sequence
 
   clear all msrs
   qemu reset
   kvm specific msr reset
 
 Then we'd be fine.
 
 Oh, and tsc needs to be reset to 0 as well - it isn't a special case.
This means a guest running on a perfectly synchronized tsc host
will not see a sync tsc. Simply because we can't possible
reset all tscs at the same time.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-04 Thread Glauber Costa
Currently, in the linux kernel, we reset kvmclock if we are rebooting
into a crash kernel through kexec. The rationale, is that a new kernel
won't follow the same memory addresses, and the memory where kvmclock is
located in the first kernel, will be something else in the second one.

We don't do it in normal reboots, because the second kernel ends up
registering kvmclock again, which has the effect of turning off the
first instance.

This is, however, totally wrong. This assumes we're booting into
a kernel that also has kvmclock enabled. If by some reason we reboot
into something that doesn't do kvmclock including but not limited to:
 * rebooting into an older kernel without kvmclock support,
 * rebooting with no-kvmclock,
 * rebootint into another O.S,

we'll simply have the hypervisor writing into a random memory position
into the guest. Neat, uh?

Moreover, I believe the fix belongs in qemu, since it is the entity
more prepared to detect all kinds of reboots (by means of a cpu_reset),
not to mention the presence of misbehaving guests, that can forget
to turn kvmclock off.

This patch fixes the issue for me.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu-kvm-x86.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 439c31a..4b94e04 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque)
 }
 #endif /* KVM_CAP_USER_NMI */
 
+static int kvm_turn_off_clock(CPUState *env)
+{
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[100];
+} msr_data;
+
+struct kvm_msr_entry *msrs = msr_data.entries;
+int n = 0;
+
+kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, 0);
+kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, 0);
+msr_data.info.nmsrs = n;
+
+return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data);
+}
+
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
+kvm_turn_off_clock(env);
 kvm_arch_reset_vcpu(env);
 kvm_reset_mpstate(env);
 }
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html