Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:

 While aging is not too hard to do, I don't think it would add much in 
 practice; we rarely observe mmu shadow pages being recycled due to 
 memory pressure.  So this is mostly helpful for preventing a VM from 
 pinning memory when under severe memory pressure, where we don't expect 
 good performance anyway.
 

 Issue is that the shrinker callback will not be called only under
 severe memory pressure, but for normal system pressure too.

   

How much shrinkage goes on under normal pressure?

Rebuilding a single shadow page costs a maximum of 512 faults (so about 
1 msec).  If the shrinker evicts one entry per second, this is a 
performance hit of 0.1%.

Perhaps if we set the cost high enough, the normal eviction rate will be 
low enough.
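
A back-of-the-envelope version of this estimate, as a minimal C sketch. The only inputs are the figures quoted above (at most 512 faults and about 1 msec per rebuilt shadow page). The function names, and the assumption that every evicted page is later rebuilt at worst-case cost, are illustrative and not measured.

```c
#include <assert.h>
#include <stdint.h>

#define REBUILD_FAULTS   512u    /* worst-case faults per shadow page (from the mail) */
#define REBUILD_COST_US  1000u   /* about 1 msec per rebuilt page (from the mail) */

/* Approximate cost of a single fault implied by the two figures above. */
static inline uint64_t fault_cost_ns(void)
{
	return (uint64_t)REBUILD_COST_US * 1000u / REBUILD_FAULTS;
}

/*
 * Microseconds per second spent rebuilding evicted shadow pages.
 * One second is 1,000,000 us, so the result is also the overhead in
 * parts per million (1000 ppm == 0.1%).
 */
static inline uint64_t rebuild_overhead_ppm(uint64_t evictions_per_sec)
{
	/* Assumption: every evicted page is rebuilt at worst-case cost. */
	assert(evictions_per_sec <= 1000000u / REBUILD_COST_US);
	return evictions_per_sec * REBUILD_COST_US;
}
```

With one eviction per second this gives 1000 ppm, which is the 0.1% figure; ten evictions per second would give roughly 1%.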

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 0/6] pv mmu fixes

2008-03-17 Thread Avi Kivity
Marcelo Tosatti wrote:
 The following patchset fixes pvmmu/cr3-cache for 32-bit guests.


   
Thanks.  I folded the fixes into the patches they fixed, and merged 
everything except cr3 cache (I want to look at it again and see if I can 
reduce the impact a little).  pvmmu branch now contains the unmerged 
patches.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] x86: don't allow KVM_CLOCK without HAVE_KVM

2008-03-17 Thread Avi Kivity
Randy Dunlap wrote:
 On Sun, 16 Mar 2008 13:13:08 +0200 Avi Kivity wrote:

   
 Randy Dunlap wrote:
 
 From: Randy Dunlap [EMAIL PROTECTED]

 Make KVM_CLOCK depend on HAVE_KVM.  Otherwise a Voyager build can
 fail with:

   CC  arch/x86/kernel/asm-offsets.s
 In file included from include2/asm/irqflags.h:59,
  from 
 /local/linsrc/next-20080314/include/linux/irqflags.h:46,
  from include2/asm/system.h:11,
  from include2/asm/processor.h:21,
  from include2/asm/atomic_32.h:5,
  from include2/asm/atomic.h:2,
  from /local/linsrc/next-20080314/include/linux/crypto.h:20,
  from 
 /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets_32.c:7,
  from 
 /local/linsrc/next-20080314/arch/x86/kernel/asm-offsets.c:2:
 include2/asm/paravirt.h: In function 'startup_ipi_hook':
 include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
 named 'startup_ipi_hook'
 include2/asm/paravirt.h:856: error: 'struct pv_apic_ops' has no member 
 named 'startup_ipi_hook'
 include2/asm/paravirt.h:856: error: memory input 4 is not directly 
 addressable
 make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
 make[1]: *** [prepare0] Error 2
 make: *** [sub-make] Error 2

   
   
 Looks like it's a general paravirt vs voyager issue, nothing kvmclock 
 specific about it.  Wouldn't it be better to have voyager and paravirt 
 mutually exclude each other, rather than every paravirt user?
 

 They do generally mutually exclude each other.  I think that the problem
 is just that dirty old select PARAVIRT in config KVM_CLOCK.
 PARAVIRT depends on !(X86_VISWS || X86_VOYAGER), but select doesn't
 care^W honor that.  As Documentation/kbuild/kconfig-language.txt says:

   In general use select only for
   non-visible symbols (no prompts anywhere) and for symbols with
   no dependencies. That will limit the usefulness but on the
   other hand avoid the illegal configurations all over. kconfig
   should one day warn about such things.

 so changing the select to depends on would fix it, but that's the
 only fix that I know of.
   

A depends is horrible from the user point of view as it hides the 
feature completely if paravirt is not enabled.  So your original 
workaround is probably best.

Or maybe
   depends on PARAVIRT_CAPABLE
   select PARAVIRT

Where PARAVIRT_CAPABLE is a synonym for !(X86_REMOVE_ME || 
X86_TOTAL_SILLYNESS), so we don't have to repeat it everywhere.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] x86: kvmclock needs to include apic.h

2008-03-16 Thread Avi Kivity
Randy Dunlap wrote:
 From: Randy Dunlap [EMAIL PROTECTED]

 kvmclock needs to #include apic.h to prevent a build error:

 next-20080314/arch/x86/kernel/kvmclock.c:142: error: implicit declaration of 
 function 'setup_secondary_APIC_clock'
 elan1.out:make[2]: *** [arch/x86/kernel/kvmclock.o] Error 1

   

Applied, thanks.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH][QEMU] Use a separate device for in-kernel PIT

2008-03-16 Thread Avi Kivity
Anthony Liguori wrote:
 Part of the feedback we received from Fabrice about the KVM patches for QEMU
 is that we should create a separate device for the in-kernel APIC to avoid
 having lots of if (kvm_enabled()) within the APIC code that were difficult to
 understand why there were needed.

 This patch separates the in-kernel PIT into a separate device.  It also
 introduces some configure logic to only compile in support for the in-kernel
 PIT if it's available.

 The result of this is that we now only need a single if (kvm_enabled()) to
 determine which device to use.  Besides making it more upstream friendly, I
 think this makes the code much easier to understand.

   

It introduces a new issue: the save/restore format is effectively forked 
and will have to be maintained in parallel if there are any changes.

Perhaps keep in the same file, but as two separate devices that can 
share the save/restore code?

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH][QEMU] Use a separate device for in-kernel PIT

2008-03-16 Thread Avi Kivity
Yang, Sheng wrote:
 And we have two choices in userspace: one ioctl to reset all kvm devices, or 
 one ioctl for each device. Since we are separating the in-kernel devices into 
 separate devices, the latter seems more proper. But would it bring other 
 troubles, like inconsistent state for smp? 

   

I agree that a separate ioctl would introduce smp problems, so I think 
one ioctl that resets all devices and all vcpus is the way to go.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] shrinker support for the mmu cache

2008-03-16 Thread Avi Kivity
Marcelo Tosatti wrote:
 On Wed, Mar 12, 2008 at 08:13:41PM +0200, Izik Eidus wrote:
   
 this patch simply register the mmu cache with the shrinker.
 

 Hi Izik,

 Nice.

 I think you want some sort of aging mechanism here. Walk through all
 translations of a shadow page clearing the referenced bit of all
 mappings it holds (and moving pages with any accessed translation to the
 head of the list).
   

While aging is not too hard to do, I don't think it would add much in 
practice; we rarely observe mmu shadow pages being recycled due to 
memory pressure.  So this is mostly helpful for preventing a VM from 
pinning memory when under severe memory pressure, where we don't expect 
good performance anyway.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [patch 00/24] QEMU ACPI PCI hotplug support

2008-03-16 Thread Avi Kivity
Marcelo Tosatti wrote:
 The following patchset allows PCI hot add/remove through ACPI (handled
 by the acpiphp driver on Linux guests).

 Comments are welcome.

   

Applied all, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] keymap nl-be bug/incomplete

2008-03-16 Thread Avi Kivity
Ben Budts wrote:
 Hi,

 I've been having problems using the -vnc and -k nl-be... Since I've started 
 using KVM v.28 ...

 Lots of keys won't work and you get the following errors in your Host OS :

 Warning: no scancode found for keysym 249
 Warning: no scancode found for keysym 163
 Warning: no scancode found for keysym 163
 Warning: no scancode found for keysym 163
 Warning: no scancode found for keysym 163
 Warning: no scancode found for keysym 224
 Warning: no scancode found for keysym 224
 Warning: no scancode found for keysym 224
 Warning: no scancode found for keysym 224
 Warning: no scancode found for keysym 231


 I'm running version 61 now and the keymap still hasn't been fixed, so
 I took the liberty to change the original nl-be keymap that looks like this :

 nato:/opt/kvm/share/qemu/keymaps# cat nl-be
 # Dutch (Belgium)
 map 0x813
 include common
 -
 Did some hard work and added the following, which solved all my problems :-)
 Would be great if you could implement it in your new KVM release
   

This problem is probably shared with qemu, so please post this to 
[EMAIL PROTECTED]  I'll merge it into kvm within a few days.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] Qemu-kvm is leaking my memory ???

2008-03-16 Thread Avi Kivity
Zdenek Kabelac wrote:
 Hello

 Recently I'm using qemu-kvm on fedora-rawhide box with my own kernels
 (with many debug options) I've noticed that over the time my memory
 seems to disappear somewhere.

 Here is my memory trace after boot and some time of work - thus memory
 should be populated.
   

No idea how these should add up.  What does 'free' say?

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [patch 0/3] QEMU balloon support

2008-03-16 Thread Avi Kivity
Marcelo Tosatti wrote:
 This patchset resends Anthony's QEMU balloon support plus:

 - Truncates the target size to ram size
 - Enables madvise() conditioned on KVM_ZAP_GFN ioctl

   

Once mmu notifiers are in, KVM_ZAP_GFN isn't needed.  So we have three 
possible situations:

- zap needed, but not available: don't madvise()
- zap needed and available: zap and madvise()
- zap unneeded: madvise()

Did you find out what's causing the errors in the first place (if zap is 
not used)?  It worries me greatly.
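
The three cases listed above map onto a small decision function. This is only a sketch: the enum and function names are made up for illustration, and how userspace would find out whether zapping is needed or available is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum balloon_action {
	BALLOON_SKIP_MADVISE,    /* zap needed, but not available */
	BALLOON_ZAP_AND_MADVISE, /* zap needed and available */
	BALLOON_MADVISE_ONLY,    /* zap unneeded */
};

static inline enum balloon_action balloon_pick_action(bool zap_needed,
						      bool zap_available)
{
	if (!zap_needed)
		return BALLOON_MADVISE_ONLY;
	return zap_available ? BALLOON_ZAP_AND_MADVISE : BALLOON_SKIP_MADVISE;
}

/* Usage: one check per case from the list. */
static inline void balloon_pick_action_example(void)
{
	assert(balloon_pick_action(true, false) == BALLOON_SKIP_MADVISE);
	assert(balloon_pick_action(true, true) == BALLOON_ZAP_AND_MADVISE);
	assert(balloon_pick_action(false, false) == BALLOON_MADVISE_ONLY);
}
```

Note that when zapping is unneeded, its availability does not matter, so the function checks that condition first.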

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 1/2] add functions to read and set the ldt

2008-03-16 Thread Avi Kivity
Izik Eidus wrote:
 From 28f36d30f8eef9c12afe52e183bf4c8405d113d2 Mon Sep 17 00:00:00 2001
 From: Izik Eidus [EMAIL PROTECTED]
 Date: Thu, 13 Mar 2008 02:03:37 +0200
 Subject: [PATCH] KVM: vmx, svm add functions to read and set the ldt

 diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
 index 12932bb..d9806e2 100644
 --- a/include/asm-x86/kvm_host.h
 +++ b/include/asm-x86/kvm_host.h
 @@ -386,6 +386,8 @@ struct kvm_x86_ops {
   void (*set_idt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
   void (*get_gdt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
   void (*set_gdt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
 + void (*get_ldt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
 + void (*set_ldt)(struct kvm_vcpu *vcpu, struct descriptor_table *dt);
   

These are ->get_segment() and ->set_segment() with seg == VCPU_SREG_LDTR.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 2/2] Hardware task switching support

2008-03-16 Thread Avi Kivity
Izik Eidus wrote:
 From 6a7207a0f3ee8af6ebafcec9d40a75b87f00a129 Mon Sep 17 00:00:00 2001
 From: Izik Eidus [EMAIL PROTECTED]
 Date: Thu, 13 Mar 2008 02:34:21 +0200
 Subject: [PATCH] KVM: hardware task switching support

 Signed-off-by: Izik Eidus [EMAIL PROTECTED]
 ---
  arch/x86/kvm/svm.c |   11 +-
  arch/x86/kvm/tss_segment.h |   59 +++
  arch/x86/kvm/vmx.c |   15 ++
  arch/x86/kvm/x86.c |  385 
 
  include/asm-x86/kvm_host.h |2 +
  5 files changed, 469 insertions(+), 3 deletions(-)
  create mode 100644 arch/x86/kvm/tss_segment.h

 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 4e1dd61..be78278 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1121,9 +1121,14 @@ static int invalid_op_interception(struct vcpu_svm 
 *svm,
  static int task_switch_interception(struct vcpu_svm *svm,
   struct kvm_run *kvm_run)
  {
 - pr_unimpl(&svm->vcpu, "%s: task switch is unsupported\n", __func__);
 - kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
 - return 0;
 + u16 tss_selector;
 +
 + tss_selector = (u16)svm->vmcb->control.exit_info_1;
 + if(svm->vmcb->control.exit_info_2 & ((unsigned long)1 << 36))
 + return kvm_task_switch(&svm->vcpu, tss_selector, 1);
 + if(svm->vmcb->control.exit_info_2 & ((unsigned long)1 << 38))
 + return kvm_task_switch(&svm->vcpu, tss_selector, 2);
 + return kvm_task_switch(&svm->vcpu, tss_selector, 0);
   

Space after if.  Change the magic numbers (36, 38, 0, 1, 2) into 
constants in svm.h and kvm_host.h.  (unsigned long)1 << 36 will break on 
i386 (needs 1ULL << 36).
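
A minimal sketch of what the named constants and the 64-bit-safe shift could look like. All identifiers below are placeholders: the thread asks for constants but does not name them or say what bits 36 and 38 mean, so the names only record the bit positions and the reason codes 0, 1, 2 from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder names; the thread does not specify them. */
#define TS_EXITINFO2_BIT36_SHIFT 36
#define TS_EXITINFO2_BIT38_SHIFT 38

enum ts_reason {
	TS_REASON_OTHER = 0, /* neither bit set */
	TS_REASON_BIT36 = 1, /* bit 36 of exit_info_2 set */
	TS_REASON_BIT38 = 2, /* bit 38 of exit_info_2 set */
};

static inline enum ts_reason task_switch_reason(uint64_t exit_info_2)
{
	/*
	 * 1ULL is at least 64 bits wide on every target, so shifting it by
	 * 36 is well defined.  (unsigned long)1 << 36 is not when unsigned
	 * long is 32 bits, as on i386.
	 */
	if (exit_info_2 & (1ULL << TS_EXITINFO2_BIT36_SHIFT))
		return TS_REASON_BIT36;
	if (exit_info_2 & (1ULL << TS_EXITINFO2_BIT38_SHIFT))
		return TS_REASON_BIT38;
	return TS_REASON_OTHER;
}

/* Usage: same precedence as the patch, bit 36 is tested first. */
static inline void task_switch_reason_example(void)
{
	assert(task_switch_reason(1ULL << 36) == TS_REASON_BIT36);
	assert(task_switch_reason(1ULL << 38) == TS_REASON_BIT38);
	assert(task_switch_reason(0) == TS_REASON_OTHER);
}
```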

  }
  
  static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 diff --git a/arch/x86/kvm/tss_segment.h b/arch/x86/kvm/tss_segment.h
 new file mode 100644
 index 000..622aa10
 --- /dev/null
 +++ b/arch/x86/kvm/tss_segment.h
   

TSS already has "segment" in it.  Just call it tss.h.

 +
 +/* allowed just for 8 bytes segments */
 +static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
 +   struct desc_struct *seg_desc)
 +{
 + struct descriptor_table gdt_ldt;
 + u16 index = selector >> 3;
 +
 + if (selector & 1 << 2)
 + kvm_x86_ops->get_ldt(vcpu, &gdt_ldt);
 + else
 + kvm_x86_ops->get_gdt(vcpu, &gdt_ldt);
 + if (gdt_ldt.limit < index * 8 + 7)
 + return 1;
   

This part can be in a helper function shared with load_ and 
save_guest_segment_descriptor.  Maybe rename to read_ and write_ to 
avoid confusion.
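
One possible shape for such a shared helper, as a userspace sketch. The struct is a simplified stand-in for the kernel's descriptor_table, and the helper name and macros are hypothetical. It picks the table from the selector's table-indicator bit, applies the same limit check as the patch, and returns the descriptor's address so that both the read and the write path could use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct descriptor_table. */
struct descriptor_table {
	uint16_t limit;
	uint64_t base;
};

#define SELECTOR_TI_LDT      (1u << 2) /* table indicator: set means LDT */
#define SELECTOR_INDEX_SHIFT 3
#define DESC_SIZE            8u        /* only 8-byte descriptors handled */

/* Returns 0 and stores the descriptor address, or 1 if past the limit. */
static inline int guest_descriptor_addr(uint16_t selector,
					const struct descriptor_table *gdt,
					const struct descriptor_table *ldt,
					uint64_t *addr)
{
	const struct descriptor_table *dt =
		(selector & SELECTOR_TI_LDT) ? ldt : gdt;
	uint32_t index = selector >> SELECTOR_INDEX_SHIFT;

	assert(addr != NULL);
	if (dt->limit < index * DESC_SIZE + 7)
		return 1;
	*addr = dt->base + index * DESC_SIZE;
	return 0;
}
```

In the real code the two tables would come from the get_gdt()/get_ldt() callbacks rather than being passed in; they are parameters here only to keep the sketch self-contained.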

 +
 +static int load_tss_segment32(struct kvm_vcpu *vcpu,
 +   struct desc_struct *seg_desc,
 +   struct tss_segment_32 *tss)
 +{
 + u32 base_addr;
 +
 + base_addr = seg_desc->base0;
 + base_addr |= (seg_desc->base1 << 16);
 + base_addr |= (seg_desc->base2 << 24);
   

A helper, again.
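
A sketch of such a helper, assuming the usual x86 descriptor split of the base into a 16-bit low part and two 8-bit parts. The function name is hypothetical, and it takes the three fields directly instead of a struct desc_struct so that it stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Reassemble a 32-bit segment base from its three descriptor fields. */
static inline uint32_t desc_base(uint16_t base0, uint8_t base1, uint8_t base2)
{
	/* Cast before shifting so the shift happens in unsigned 32-bit. */
	return (uint32_t)base0 |
	       ((uint32_t)base1 << 16) |
	       ((uint32_t)base2 << 24);
}

/* Usage: bits 0-15, 16-23 and 24-31 come from base0, base1 and base2. */
static inline void desc_base_example(void)
{
	assert(desc_base(0x5678, 0x34, 0x12) == 0x12345678u);
}
```

The explicit casts avoid shifting a promoted signed int by 24, which could overflow for base2 values of 0x80 and above.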

 +static void save_state_to_tss32(struct kvm_vcpu *vcpu,
 + struct tss_segment_32 *tss)
 +{
 + struct kvm_segment kvm_seg;
 +
 + tss->cr3 = vcpu->arch.cr3;
 + tss->eip = vcpu->arch.rip;
 + tss->eflags = kvm_x86_ops->get_rflags(vcpu);
 + tss->eax = vcpu->arch.regs[VCPU_REGS_RAX];
 + tss->ecx = vcpu->arch.regs[VCPU_REGS_RCX];
 + tss->edx = vcpu->arch.regs[VCPU_REGS_RDX];
 + tss->ebx = vcpu->arch.regs[VCPU_REGS_RBX];
 + tss->esp = vcpu->arch.regs[VCPU_REGS_RSP];
 + tss->ebp = vcpu->arch.regs[VCPU_REGS_RBP];
 + tss->esi = vcpu->arch.regs[VCPU_REGS_RSI];
 + tss->edi = vcpu->arch.regs[VCPU_REGS_RDI];
 + 
 + get_segment(vcpu, &kvm_seg, VCPU_SREG_ES);
 + tss->es = kvm_seg.selector;
   

tss->es = get_segment_selector(vcpu, VCPU_SREG_ES);

 + load_guest_segment_descriptor(vcpu, tss->ldt_selector,
 +   seg_desc);
 + seg_desct_to_kvm_desct(seg_desc, tss->ldt_selector,
 +   &kvm_seg);
 + set_segment(vcpu, &kvm_seg, VCPU_SREG_LDTR);
 +
 + load_guest_segment_descriptor(vcpu, tss->es, seg_desc);
 + seg_desct_to_kvm_desct(seg_desc, tss->es, &kvm_seg);
 + kvm_seg.type |= 1;
 + if (!kvm_seg.s)
 + kvm_seg.unusable = 1;
 + set_segment(vcpu, &kvm_seg, VCPU_SREG_ES);
   

Wrap these into a helper.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] KVM: Add reset support for in kernel PIT

2008-03-16 Thread Avi Kivity
Yang, Sheng wrote:
 From 2d08f4266a8f47d9c52db9d4f629ab5d2f8fd044 Mon Sep 17 00:00:00 2001
 From: Sheng Yang [EMAIL PROTECTED]
 Date: Thu, 13 Mar 2008 10:22:26 +0800
 Subject: [PATCH] KVM: Add reset support for in kernel PIT

 Separate the reset part and prepare for reset support.

   

 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index 7776f50..06a241a 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -476,12 +476,28 @@ static int speaker_in_range(struct kvm_io_device *this, 
 gpa_t addr)
   return (addr == KVM_SPEAKER_BASE_ADDRESS);
  }

 -struct kvm_pit *kvm_create_pit(struct kvm *kvm)
 +void kvm_pit_reset(struct kvm_pit *pit)
  {
   int i;
 + struct kvm_kpit_channel_state *c;
 +
 + mutex_lock(&pit->pit_state.lock);
 + for (i = 0; i < 3; i++) {
 + c = &pit->pit_state.channels[i];
 + c->mode = 0xff;
 + c->gate = (i != 2);
 + pit_load_count(pit->kvm, i, 0);
 + }
 + mutex_unlock(&pit->pit_state.lock);
 +
 + atomic_set(&pit->pit_state.pit_timer.pending, 0);
 + pit->pit_state.inject_pending = 1;
 +}
   

Don't you need an hrtimer_cancel() here, in case this is a true reset 
and not part of initialization?

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [ANNOUNCE] kvm-guest-drivers-linux-1

2008-03-16 Thread Avi Kivity
Paul Collins wrote:
 Iain Paton [EMAIL PROTECTED] writes:

   
 Trying to build against 2.6.24 gives the following:

 CC [M]  /root/kvm/kvm-guest-drivers-linux-1/virtio_net.o
 /root/kvm/kvm-guest-drivers-linux-1/virtio_net.c: In function 'receive_skb':
 /root/kvm/kvm-guest-drivers-linux-1/virtio_net.c:101: error: implicit 
 declaration of function 'skb_partial_csum_set'
 make[2]: *** [/root/kvm/kvm-guest-drivers-linux-1/virtio_net.o] Error 1
 make[1]: *** [_module_/root/kvm/kvm-guest-drivers-linux-1] Error 2
 

 This change fixes the build here.
   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] KVM: Add reset support for in kernel PIT

2008-03-16 Thread Avi Kivity
Yang, Sheng wrote:
 On Sunday 16 March 2008 22:36:57 Avi Kivity wrote:
   
 Yang, Sheng wrote:
 
 From 2d08f4266a8f47d9c52db9d4f629ab5d2f8fd044 Mon Sep 17 00:00:00 2001
 From: Sheng Yang [EMAIL PROTECTED]
 Date: Thu, 13 Mar 2008 10:22:26 +0800
 Subject: [PATCH] KVM: Add reset support for in kernel PIT

 Separate the reset part and prepare for reset support.



 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index 7776f50..06a241a 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -476,12 +476,28 @@ static int speaker_in_range(struct kvm_io_device
 *this, gpa_t addr)
 return (addr == KVM_SPEAKER_BASE_ADDRESS);
  }

 -struct kvm_pit *kvm_create_pit(struct kvm *kvm)
 +void kvm_pit_reset(struct kvm_pit *pit)
  {
 int i;
 +   struct kvm_kpit_channel_state *c;
 +
 +   mutex_lock(&pit->pit_state.lock);
 +   for (i = 0; i < 3; i++) {
 +   c = &pit->pit_state.channels[i];
 +   c->mode = 0xff;
 +   c->gate = (i != 2);
 +   pit_load_count(pit->kvm, i, 0);
 +   }
 +   mutex_unlock(&pit->pit_state.lock);
 +
 +   atomic_set(&pit->pit_state.pit_timer.pending, 0);
 +   pit->pit_state.inject_pending = 1;
 +}
   
 Don't you need an hrtimer_cancel() here, in case this is a true reset
 and not part of initialization?
 

 This was done implicitly, when pit_load_count() is called with 
 c->mode = 0xff. But maybe it's better to do it explicitly? 

   

It's fine, didn't look closely enough.  Will apply, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 0/4]Porting Xentrace to kvm

2008-03-16 Thread Avi Kivity
Liu, Eric E wrote:
 Hi,
 The following patches port xentrace to kvm which is useful for
 performance tuning and debugging.

 It is designed to allow debugging traces of kvm to be generated on
 Up/Smp machines. Each trace entry is outputted in a trace ring buffer
 for per cpu which is mapped to userspace, and the userspace tools can
 analyze the data according to some formats definitions.

 Since we already have had debugfs_entries and some other kernel debug
 mechanism to use, does this kvmtrace make sense? Any comment is
 welcomed. 
   

This looks very useful.  While kvm_stat provides good data, this is much 
more in depth.

The kernel already contains a method of transferring percpu data to 
userspace; see Documentation/filesystems/relay.txt.  It supports mmap() 
and read().  Please see if it is a good fit.

Please post patches with individual subjects, instead of having the same 
subject for every patch.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 1/4]Porting Xentrace to kvm

2008-03-16 Thread Avi Kivity
Liu, Eric E wrote:
 From 0d7f1ee470fe907e00ac6246bfa11e5322bc64fb Mon Sep 17 00:00:00 2001
 From: Feng (Eric) Liu [EMAIL PROTECTED]
 Date: Sat, 15 Mar 2008 06:07:33 -0400
 Subject: [PATCH] KVM: Add some trace entries in current code, when the
 KVM_TRACE
 compilation option is enabled, it outputs the data into the trace
 buffer. Define some interfaces for userspace tools to use the
 buffer and analyze the trace data.
   

  
 +#ifdef CONFIG_KVM_TRACE
 +#define KVMTRACE_ND(evt, vcpu, cycles, count, d1, d2, d3, d4) \
 + do { \
 + if (unlikely(kvm_trace_enable_flag)) { \
 + if (KVM_TRC_##evt == KVM_TRC_VMEXIT || \
 + KVM_TRC_##evt == KVM_TRC_PAGE_FAULT) { \
 + struct { \
 + u32 pid:16, vid:16;
   

pids can be 32-bit, I think...
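
To illustrate the concern: pid_t is an int on Linux, and the pid limit can be configured above 65535, so a 16-bit field can silently alias two different processes. The sketch below reproduces the quoted bit-field layout in userspace; the wide variant and the helper names are hypothetical, shown only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Layout used by the quoted macro: pid and vcpu id share one 32-bit word. */
struct trace_id_packed {
	unsigned int pid:16, vid:16;
};

/* Hypothetical alternative that keeps the full 32-bit pid. */
struct trace_id_wide {
	uint32_t pid;
	uint32_t vid;
};

/* What the trace record ends up holding with the 16-bit field. */
static inline uint32_t packed_pid(uint32_t tgid)
{
	struct trace_id_packed id = { .pid = (uint16_t)tgid, .vid = 0 };
	return id.pid;
}

/* What it would hold with a full-width field. */
static inline uint32_t wide_pid(uint32_t tgid)
{
	struct trace_id_wide id = { .pid = tgid, .vid = 0 };
	return id.pid;
}

/* Usage: 70000 does not fit in 16 bits and wraps to 70000 - 65536. */
static inline void packed_pid_example(void)
{
	assert(packed_pid(70000) == 4464);
	assert(wide_pid(70000) == 70000);
}
```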

 \
 + u32 data1; \
 + unsigned long data2; \
 + } _d; \
 + _d.pid  = (u16)current->tgid; \
 + _d.vid  = (vcpu)->vcpu_id; \
 + _d.data1 = d1; \
 + _d.data2 = d2; \
 + kvm_trace_var(KVM_TRC_ ## evt, cycles, \
 + sizeof(_d), (unsigned char *)&_d); \
 + } else { \
   

This special-casing of exits and page faults is probably unnecessary.  
Zeroing a couple of variables in a developer-only environment isn't 
worth it.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH RFC 0/4]Porting Xentrace to kvm

2008-03-16 Thread Avi Kivity
Jan Kiszka wrote:
 Avi Kivity wrote:
   
 Liu, Eric E wrote:
 
 Hi,
 The following patches port xentrace to kvm which is useful for
 performance tuning and debugging.

 It is designed to allow debugging traces of kvm to be generated on
 Up/Smp machines. Each trace entry is outputted in a trace ring buffer
 for per cpu which is mapped to userspace, and the userspace tools can
 analyze the data according to some formats definitions.

 Since we already have had debugfs_entries and some other kernel debug
 mechanism to use, does this kvmtrace make sense? Any comment is
 welcomed. 
   
   
 This looks very useful.  While kvm_stat provides good data, this is much 
 more in depth.

 The kernel already contains a method of transferring percpu data to 
 userspace; see Documentation/filesystems/relay.txt.  It supports mmap() 
 and read().  Please see if it is a good fit.
 

 I would also suggest to use the new kernel standard for instrumentation:
 trace_mark().

 This would also allow to reuse the trace points with other tracer and
 maybe even obsolete a separate transportation channel, e.g. when LTTng
 is once :-/ merged.
   

Can one have markers automatically recorded?  Or do you need to connect 
the marker with a logging function?

If the latter, it's too difficult to use.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] Fix sci irq set when acpi timer about to wrap

2008-03-16 Thread Avi Kivity
Dor Laor wrote:
 From 498f162fc9d9fb897c756273c481101a44a220de Mon Sep 17 00:00:00 2001
 From: Dor Laor [EMAIL PROTECTED]
 Date: Thu, 13 Mar 2008 00:11:41 +0200
 Subject: [PATCH] Fix sci irq set when acpi timer about to wrap.

 The acpi timer should generate sci irq when enabled and
 when bit 23 of the timer counter toggles.
 It fixes time reading by the performance counter api of windows guest.

   

How does this relate to ce35c9534137b71327466fa9abc243cbe2d7e8dc?

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 1/6] KVM: In kernel pit model

2008-03-07 Thread Avi Kivity
Yang, Sheng wrote:
 Found more complex for KVM. Xen pulled pm timer down to kernel part, and used 
 the guest TSC as source. So only adjust TSC is OK for it. But we are still 
 using pm timer in QEmu, which using host time as source. So even we pull back 
 TSC, the problem still exists, for 2.6.9 prefer to pm timer by default

Interesting.  I guess we should pull the pm timer into the kernel as 
well.  Timing is too tricky for userspace.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH] Mark kobjects as unitialized

2008-03-07 Thread Avi Kivity
Greg KH wrote:
 and is on my TODO list, slowly getting
 closer to the top...

   

Strange.  On my TODO list, things slowly get pushed to the bottom.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 0/2] provide disable clock functionality.

2008-03-07 Thread Avi Kivity
Glauber Costa wrote:
 Avi,

 Hope this is better
   

Applied, thanks.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH] disable clock before rebooting.

2008-03-07 Thread Avi Kivity
Glauber Costa wrote:
 This patch writes 0 (actually, what really matters is that the
 LSB is cleared) to the system time msr before rebooting/shutting down
 the machine.

 Without it, we can have a random memory location being written
 when the guest comes back
   if (!kvm_para_available())
 @@ -154,6 +181,11 @@ void __init kvmclock_init(void)
   pv_time_ops.set_wallclock = kvm_set_wallclock;
   pv_time_ops.sched_clock = kvm_clock_read;
   pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
 + machine_ops.emergency_restart = kvm_emergency_restart;
 + machine_ops.shutdown  = kvm_shutdown;
 + machine_ops.restart  = kvm_restart;
 + machine_ops.halt  = kvm_halt;
 + machine_ops.power_off  = kvm_power_off;
   clocksource_register(&kvm_clock);
   }
  }
   

Oh, I think that these are all unnecessary.  You need to stop the clock 
only if the memory it uses will be reused.  Halt, shutdown and poweroff 
clearly don't.  Resets need to go through the host anyway, since they 
can be invoked without the guest knowing about it.

The only case I can think of where we need to stop the clock is kexec.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/6] KVM: In kernel pit model

2008-03-07 Thread Avi Kivity
Yang, Sheng wrote:
 On Friday 07 March 2008 16:53:40 Avi Kivity wrote:
   
 Yang, Sheng wrote:
 
 This turns out to be more complex for KVM. Xen pulled the pm timer down into
 the kernel and uses the guest TSC as its source, so adjusting only the TSC is
 enough there. But we still use the pm timer in QEmu, which uses host time as
 its source. So even if we pull back the TSC, the problem still exists, because
 2.6.9 prefers the pm timer by default
   
 Interesting.  I guess we should pull the pm timer into the kernel as
 well.  Timing is too tricky for userspace.
 

 ... Should we suggest using clock=pit on pae 2.6.9 at first?

   

While it is hardly a lovely solution (things should work out of the box) 
it is reasonable as a temporary measure.


Can you repost your patchset?  If you're quick I can apply it today, 
otherwise it will have to wait until next week.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] headersinstall of kvm.h does not work

2008-03-07 Thread Avi Kivity
Christian Borntraeger wrote:
 Hello Avi,

 in commit fb56dbb31c4738a3918db81fd24da732ce3b4ae6 you changed 
 include/linux/Kbuild:
 snip
 KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM
 Currently, make headers_check barfs due to asm/kvm.h, which linux/kvm.h
 includes, not existing.  Rather than add a zillion asm/kvm.hs, export 
 kvm.h only if the arch actually supports it.
 [...]
  unifdef-y += keyboard.h
 -unifdef-y += kvm.h
 +unifdef-$(CONFIG_HAVE_KVM) += kvm.h
  unifdef-y += llc.h
  unifdef-y += loop.h
 snip--

 This patch does not work. Kbuild (scripts/Makefile.headersinst) does not 
 check the config file, so kvm.h is never installed.

 Sam is there an easy way to allow constructs like unifdef-$(CONFIG_FOO)?
   

I think this cleverness has caused too much trouble already, and adding 
asm-*/kvm.h would have been better.

As I'm about to disappear for a week, consider a patch to remove the 
config dependency and add asm-*/kvm.h pre-acked for mainline.  Maybe the 
presence of those empty asm-*/kvm.h files will encourage further kvm 
ports to *.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH] disable clock before rebooting.

2008-03-07 Thread Avi Kivity

Glauber Costa wrote:

 Why not go all the way and do _restart the same way?

 Because it takes a parameter, and doing it in the same macro would make
 my beautiful macros ugly.
 Using another one, to pass the argument, didn't seem justifiable to 
 me, since there was just one of its kind.

Yes, of course.  My mistake.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH] disable clock before rebooting.

2008-03-07 Thread Avi Kivity
Glauber Costa wrote:
 as for kexec, it uses precisely the shutdown function, doesn't it?

 Or is it crash_shutdown?
 Humm, /me looks, and I think it's the latter, right?


Only on crash-triggered kexecs.  It can also happen via sys_reboot().  
Which, it appears, goes through machine_shutdown().

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] headersinstall of kvm.h does not work

2008-03-07 Thread Avi Kivity
Christian Borntraeger wrote:
 Am Freitag, 7. März 2008 schrieb Avi Kivity:
   
 As I'm about to disappear for a week, consider a patch to remove the 
 config dependency and add asm-*/kvm.h pre-acked for mainline.  Maybe the 
 presence of those empty asm-*/kvm.h files will encourage further kvm 
 ports to *.
 

 Something like the following for all architectures?

   

Yes.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 0/6] Latest in kernel PIT patch

2008-03-07 Thread Avi Kivity
Yang, Sheng wrote:
 Hi

 Here is the latest in kernel PIT patch. Not much change from last edition. 

 One known issue is on 2.6.9 pae guest(e.g. RHEL4), you need clock=pit 
 kernel 
 parameter to get the correct time. That's because the kernel is too active 
 to fix the lost interrupt when PIT interrupts pending... We may find more 
 elegant way to deal with it later.
   

Thanks, all applied.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/2] [PATCH] allow machine_crash_shutdown to be replaced

2008-03-07 Thread Avi Kivity
Glauber Costa wrote:
 This patch allows machine_crash_shutdown to
 be replaced, just like any of the other functions
 in machine_ops

   
er, against what tree is this?  doesn't apply to kvm.git.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [PATCH 1/6] KVM: In kernel pit model

2008-03-06 Thread Avi Kivity
Yang, Sheng wrote:
 Here is the updated patch. I kept 0xff because I think it's easy enough to 
 understand. :)

   

Any news on the regression with older Linux guests?  That's the only 
thing keeping me from applying the patchset.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




[kvm-devel] [ANNOUNCE] kvm-guest-drivers-linux-1

2008-03-06 Thread Avi Kivity
This is the first release of block and network drivers for Linux guests 
running on a kvm host.  The drivers are intended for guest kernels 
2.6.18-2.6.24.  Newer kernels include the drivers; older kernels may not 
work.  kvm-61 or later is needed in the host.

Network throughput is around 1 Gbit/sec; latency is currently a less 
than stellar 0.3 msec.  Both these figures will be improved in the future.

The drivers are available from the download page of the kvm website, below.

To use, download the drivers into the guest (using one of the emulated 
network cards), unpack the package, and type

  make
  sudo make install

You will need the kernel development package installed.

To use the drivers, reboot with '-net nic,model=virtio' instead of the 
usual setting, and 'modprobe virtio_net' to load the module.  Similarly 
use '-drive ...,if=virtio' for the block device, and the corresponding 
virtio_blk.ko.

Initial release:
- virtio pci driver (Anthony Liguori, Rusty Russell, Dor Laor)
- virtio network driver (Anthony Liguori, Rusty Russell, Dor Laor)
- virtio block driver (Anthony Liguori, Rusty Russell, Dor Laor)
- backports to Linux 2.6.18-2.6.24 (Anthony Liguori, Rusty Russell, Dor 
Laor)

http://kvm.qumranet.com/

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 8/8] x86: KVM guest: VMX cr3 cache support

2008-03-06 Thread Avi Kivity
Zhao Forrest wrote:
 * We only need to hook operations that are MMU writes. We hook these so
 that
 * we can use lazy MMU mode to batch these operations. We could probably
 * improve the performance of the host code if we used some of the
 information
 @@ -219,6 +359,9 @@ static void paravirt_ops_setup(void)
 pv_mmu_ops.lazy_mode.enter = kvm_enter_lazy_mmu;
 pv_mmu_ops.lazy_mode.leave = kvm_leave_lazy_mmu;
 }
 +
 +if (kvm_para_has_feature(KVM_FEATURE_CR3_CACHE))
 
 Here guest OS calls cpuid() to know if KVM_FEATURE_CR3_CACHE is
 supported by KVM, so I think that the kernel counterpart of this
 patch(i.e. [kvm-devel] [PATCH 7/8] KVM: MMU: VMX cr3 cache support)
 should include the code to intercept cpuid trap and set
 KVM_FEATURE_CR3_CACHE. But I didn't find such code in [PATCH 7/8] KVM:
 MMU: VMX cr3 cache support. Do I miss anything relevant?

   

Userspace sets the cpuid information; this allows, for example, command 
line switches to hide paravirtualization support.
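
A rough userspace sketch of that split, assuming a single 32-bit feature leaf: the "userspace" side decides which bits the guest gets to see, and the "guest" side only tests bits in whatever it was handed. The bit numbers and the toy_* names below are made up for illustration and are not the real KVM definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bit numbers, for illustration only. */
#define TOY_FEATURE_CLOCKSOURCE 0
#define TOY_FEATURE_CR3_CACHE   4

/* "Userspace" side: start from what the host supports and mask out
 * whatever the user asked to hide (e.g. via a command line switch). */
static uint32_t toy_build_feature_leaf(uint32_t host_supported, uint32_t hidden)
{
	return host_supported & ~hidden;
}

/* "Guest" side: test one bit in the advertised leaf. */
static int toy_para_has_feature(uint32_t leaf, unsigned int feature)
{
	assert(feature < 32);
	return (leaf & (UINT32_C(1) << feature)) != 0;
}
```

With this shape the kernel side never needs to set the bit itself; hiding a feature is just a matter of userspace passing a different mask.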

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] automatic reboot winxp hanging on vnc...

2008-03-06 Thread Avi Kivity
Thomas Besser wrote:
 Avi Kivity wrote:
   
 Thomas Besser wrote:
 
 Is that a feature or a bug? Any hints to solve this problem?
   
 Strange.

 What kvm version are you using?
 

 Oops, sorry for forgetting this: I am using kvm-52 in a production
 environment. Host is Debian etch, debs are self backported from unstable.
 Kernel version 2.6.22.9.

 In two weeks I will have a testing system, so that I can try newer
 versions too.

   

Okay, please try again with the newest kvm, the problem may have been 
fixed already.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [Lguest] networking in lguest: strange error: lguest: Guest moved used index from 6 to 256

2008-03-06 Thread Avi Kivity
Dor Laor wrote:
 On Wed, 2008-03-05 at 17:10 -0600, Anthony Liguori wrote:
   
 Dor Laor wrote:
 
 Seems to work reliably with kvm, should do the same trick for lguest.
 You can download it from
 git://kvm.qumranet.com/home/dor/src/kvm-guest-drivers-linux

 Anthony, you can update your mercurial repository with our changes.
   
   
 Hrm, I thought we were standardizing on 
 http://www.kernel.org/pub/scm/virt/kvm/kvm-guest-drivers-linux.git

 

 Better this way, attached 3 patches needed for compilation.

   

Applied (as you saw), thanks -- but please, one patch per email next time.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 3/4] [PATCH] kvmclock: allow it to be turned off

2008-03-06 Thread Avi Kivity
Glauber Costa wrote:
 Apart from the fact that it will break every single guest out there, 
 that's ok. As I said: these things are so early, that maybe we can pay 
 this price. Your call.
   

Which guests?  kvmclock is only in kvm.git, and I don't think any distro 
is based on that.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 7/8] KVM: MMU: VMX cr3 cache support

2008-03-06 Thread Avi Kivity
Marcelo Tosatti wrote:

 Here CR3_TARGET_VALUEx is written.
 My question is:
 1 why is vmcs_writel(CR3_TARGET_VALUE0 + idx*2, cr3); called by
 vmx_set_cr3(), but not called by mmu_free_roots()?
 

 By clearing the guest_cr3 entry of the shared area we prevent the guest from
 using it.

 So it's unnecessary to also clear the corresponding CR3_TARGET_VALUE0
 register.

   
 2 since cache is also mapped to guest OS. Is calling
 vmcs_writel(CR3_TARGET_VALUE0 + idx*2, cr3); necessary?
 

 As said above, no, because the guest will check
 cache->entry[idx].guest_cr3 before attempting to use a cached host_cr3
 value.
   


Only if you trust the guest.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




[kvm-devel] Offline for a week

2008-03-06 Thread Avi Kivity
I will be off-line (and sometimes off-piste) from March 8 through March 
15.  I may have sporadic Internet access.

Andrew, should kvm.git not play nicely with the rest of the children, 
have it stand in the corner.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH/RFC 1/2] anon-inodes: Remove fd_install() from anon_inode_getfd()

2008-03-05 Thread Avi Kivity
Davide Libenzi wrote:
 I think that may be a bit cleaner than Al's approach, but it still
 leaves the same trap that create_vcpu_fd() falls into.  The current
 code is:

 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 {
  int fd, r;
  struct inode *inode;
  struct file *file;

  r = anon_inode_getfd(&fd, &inode, &file,
                       "kvm-vcpu", &kvm_vcpu_fops, vcpu);
  if (r)
          return r;
  atomic_inc(&vcpu->kvm->filp->f_count);
  return fd;
 }

 and with your proposal, the natural way to write that becomes:

 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 {
  int fd, r;

  r = anon_inode_getfd(&fd, NULL,
                       "kvm-vcpu", &kvm_vcpu_fops, vcpu);
  if (r)
          return r;
  atomic_inc(&vcpu->kvm->filp->f_count);
  return fd;
 }
 

 I don't know KVM code, but can't the private_data setup be completed 
 before calling anon_inode_getfd()?
   

Creating the fd is the last thing done when creating a vcpu.

 Or ...

 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 {
   int fd, r;

   get_file(vcpu->kvm->filp);
   r = anon_inode_getfd(&fd, NULL,
                        "kvm-vcpu", &kvm_vcpu_fops, vcpu);
   if (r) {
           fput(vcpu->kvm->filp);
           return r;
   }
   return fd;
 }
   

This seems reasonable.
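
A userspace sketch of the ordering in that last variant: take the extra reference before the descriptor can become visible, and drop it again if creation fails. Everything named toy_* is a hypothetical stand-in rather than a kernel API, and the refcount is a plain int instead of the kernel's atomic counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct file with a reference count. */
struct toy_file {
	int refcount;
};

/* Hypothetical stand-in for anon_inode_getfd(): returns 0 and stores an fd
 * on success, or a negative error code on failure. */
static int toy_install_fd(int *fd, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	*fd = 3;	/* arbitrary descriptor number for the sketch */
	return 0;
}

/* Take the parent reference first (like get_file()), so that by the time
 * the fd is visible to anyone else the reference is already held. */
static int toy_create_child_fd(struct toy_file *parent, int simulate_failure)
{
	int fd = -1;
	int r;

	assert(parent != NULL);
	parent->refcount++;
	r = toy_install_fd(&fd, simulate_failure);
	if (r) {
		parent->refcount--;	/* like fput(): undo on failure */
		return r;
	}
	return fd;
}
```

The point of the ordering is that there is no window in which the new fd exists but the parent reference has not yet been taken.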

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)

2008-03-05 Thread Avi Kivity
Robin Holt wrote:
 On Wed, Mar 05, 2008 at 07:09:55AM +0200, Avi Kivity wrote:
   
 Isn't that out of the question for .25?
 

 I keep hearing this mantra.  What is so compelling about the .25
 release?  When seems to be more important than what.  While I understand
 product release cycles, etc. and can certainly agree with them. I would
 like to know with what I am being asked to agree.

   

kvm gained the ability to swap in 2.6.25.  Without mmu notifiers, 
though, the guest can still easily pin all of its memory.

 That said, I agree we should probably finish getting the comments on
 Andrea's most recent patch, if any, cleared up and put that one in.
   

Great.  Thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 1/6] KVM: In kernel pit model

2008-03-05 Thread Avi Kivity
Yang, Sheng wrote:
 Thanks for comments!

 On Wednesday 05 March 2008 17:15:29 Ingo Molnar wrote:
   
 * Yang, Sheng [EMAIL PROTECTED] wrote:
 
 +#if 1
 +#define pit_debug(fmt, arg...) printk(KERN_WARNING fmt, ##arg)
 +#else
 +#define pit_debug(fmt, arg...)
 +#endif
   
 this should use pr_debug() instead i guess.
 

 Um... I followed the example in ./virt/kvm/ioapic.c here. Though I think it's 
 good to substitute all self-defined debug printks with pr_debug, why does KVM 
 have so few pr_xxx calls (the only ones are in x86.c)? Maybe because KVM acts 
 more like a separate driver, and using printk is easier for separate debugging? 
 I really don't know...

   

It's mostly due to lack of knowledge about pr_debug(); it wasn't 
intentional.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] virtio-balloon: do not attempt to release more than available pages

2008-03-05 Thread Avi Kivity
Marcelo Tosatti wrote:
 Handle the case where the balloon target is larger than total ram size.

 BUG: unable to handle kernel paging request at 00100100
 IP: [881970f9] :virtio_balloon:leak__balloon+0x2e/0xbe

 Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

 Index: virtio/virtio_balloon.c
 ===
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -122,10 +122,21 @@ static void release_pages_by_pfn(const u
   }
  }
  
 +static void update_target_size(struct virtio_balloon *vb)
 +{
 + __le32 num_pages = cpu_to_le32(vb->num_pages);
 +
 + vb->vdev->config->set(vb->vdev,
 +   offsetof(struct virtio_balloon_config, num_pages),
 +   &num_pages, sizeof(num_pages));
 +}
   

The target is host-owned; moreover the problem may be temporary, but 
you've changed the target permanently.

Suggest sending the host a message (like the page list) indicating it 
couldn't allocate any more.

Also, we may have driven the guest close to oom with this.  We need to 
notify the host when the guest gets into a low-memory cannot swap condition.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] still seeing network freezes with rtl8139 nic

2008-03-05 Thread Avi Kivity
david ahern wrote:
 Try adding the noapic option to your guest kernel. I re-ran that test on kvm-62
 and my VM was able to run under load for more than 3-1/2 days (the network never
 locked up; I stopped the test to try other variations).

 One side effect of the noapic option is that irq balancing is disabled -- all
 interrupts are delivered via CPU 0. I ran a few tests earlier this week without
 the noapic option (hence with the apic) but with irq balancing disabled and
 still had the lockups. It seems to be something specific to the apic.

   

I got good results with apic and e1000.  Can you try it?

May be a guest driver bug.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 0/6] In kernel PIT patch

2008-03-05 Thread Avi Kivity
Anthony Liguori wrote:
 Playing a movie is a bit subjective.  I presume you're talking about the 
 standard HAL as presumably the ACPI HAL is using the pm timer?
   

ACPI HAL uses the apic timer, IIRC; perhaps the pm timer as well.

 So the two cases I'm hearing where timer accuracy should improve is 
 standard HAL on Windows and clock=pit on Linux?  I'd still like to see 
 what the actual difference in timer accuracy is.

It depends on the load.  As the load increases, the host process starts 
to miss timer signals.  With both pic and pit in userspace, you can 
detect those missed interrupts and inject them later once you get your 
timeslice.  With the pic in kernel, there is no way to do this.

The same thing happens with the apic timer, only there, it is easy to 
compensate because both parts are in the kernel.
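
As a rough illustration of the "detect missed interrupts and inject them later" idea, here is a userspace sketch that counts how many periodic ticks have come due since the timer was last serviced. The struct and function names are hypothetical, and this is not the actual qemu or kvm PIT code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical periodic timer state. */
struct toy_timer {
	uint64_t period_ns;	/* interval between ticks */
	uint64_t next_due_ns;	/* absolute time of the next tick */
	uint64_t pending;	/* ticks owed to the guest, not yet injected */
};

/* Called whenever the timer code finally gets to run. Returns how many ticks
 * came due since the last call and queues them for later injection. */
static uint64_t toy_timer_catch_up(struct toy_timer *t, uint64_t now_ns)
{
	uint64_t missed = 0;

	assert(t->period_ns > 0);
	if (now_ns >= t->next_due_ns) {
		missed = (now_ns - t->next_due_ns) / t->period_ns + 1;
		t->next_due_ns += missed * t->period_ns;
		t->pending += missed;
	}
	return missed;
}
```

The accounting itself is simple; the difficulty described above is that the component that knows ticks were missed and the component that delivers interrupts have to be able to talk to each other.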

   I have no doubt that 
 moving the pit into the kernel is more efficient.  Moving everything 
 into the kernel is more efficient because light weight exits are cheaper 
 than heavy weight exits.
   

Efficiency is only a secondary goal here.  The userspace PIT does not 
consume large amounts of CPU.

 The thing I'm trying to get at is a quantitative statement about why 
 moving the pit into the kernel is the right thing.  I'll try to give the 
 patches a try myself in the next couple of days.  I don't think it's 
 obvious that it's the right thing to do without some sort of benchmark 
 supporting it.
   

Playing a movie is better than any benchmark; it reflects actual user 
experience in a real and important use case.  Benchmarks are substitutes 
for real use cases, not the goal of the optimization.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] virtio-balloon: do not attempt to release more than available pages

2008-03-05 Thread Avi Kivity
Marcelo Tosatti wrote:
 On Wed, Mar 05, 2008 at 06:59:10PM +0200, Avi Kivity wrote:
   
 Marcelo Tosatti wrote:
 
 Handle the case where the balloon target is larger than total ram size.

 BUG: unable to handle kernel paging request at 00100100
 IP: [881970f9] :virtio_balloon:leak__balloon+0x2e/0xbe

 Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

 Index: virtio/virtio_balloon.c
 ===
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -122,10 +122,21 @@ static void release_pages_by_pfn(const u
 }
 }

 +static void update_target_size(struct virtio_balloon *vb)
 +{
 +   __le32 num_pages = cpu_to_le32(vb->num_pages);
 +
 +   vb->vdev->config->set(vb->vdev,
 + offsetof(struct virtio_balloon_config, num_pages),
 + &num_pages, sizeof(num_pages));
 +}
  
   
 The target is host-owned; moreover the problem may be temporary, but 
 you've changed the target permanently.

 Suggest sending the host a message (like the page list) indicating it 
 couldn't allocate any more.

 Also, we may have driven the guest close to oom with this.  We need to 
 notify the host when the guest gets into a low-memory cannot swap condition.
 

 I guess the description was not clear, you understood the opposite.

 The problem is when the target for total guest pages (not balloon target
 size) is set to be larger than the amount of total pages the guest has
 booted with. What happens then is that the driver tries to release pages
 from the balloon, without checking if there are any:

 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
 struct page *page;

   /* We can only do one array worth at a time. */
   /* We can only do one array worth at a time. */
   num = min(num, ARRAY_SIZE(vb->pfns));

   for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
           page = list_first_entry(&vb->pages, struct page, lru);
           list_del(&page->lru);
           vb->pfns[vb->num_pfns] = page_to_pfn(page);
           vb->num_pages--;
   }

 vb->pages is empty here.

 So the patch checks for the availability of ballooned pages before
 attempting to release any, and sets num_pages to match that. 

 The host should not allow that condition to happen, but it's still
 fragile code in the guest driver.

   

Ah, I see.  We could simply ignore it, or adjust the target as you did.  
Not sure what is better.
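
For reference, a minimal userspace sketch of the "check before releasing" guard discussed above. It models the balloon as a bare page count instead of a page list; the batch size and all toy_* names are assumptions for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PFN_ARRAY_SIZE 256	/* assumed size of one release batch */

/* Hypothetical balloon state: only the count matters for this sketch. */
struct toy_balloon {
	size_t num_pages;	/* pages currently held in the balloon */
};

/* Release up to num pages, but never more than one batch and never more
 * than the balloon actually holds. Returns the number released. */
static size_t toy_leak_balloon(struct toy_balloon *vb, size_t num)
{
	size_t released = 0;

	assert(vb != NULL);
	if (num > TOY_PFN_ARRAY_SIZE)
		num = TOY_PFN_ARRAY_SIZE;
	if (num > vb->num_pages)
		num = vb->num_pages;	/* the guard: an empty balloon releases nothing */

	while (released < num) {
		vb->num_pages--;
		released++;
	}
	return released;
}
```

Whether the guest should also report the clamped value back, as the patch does, is the open question in this thread; the sketch only covers the clamping.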

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [ANNOUNCE] kvm-62 release

2008-03-05 Thread Avi Kivity
Avi Kivity wrote:
 Add cpus on the fly to your virtual machines with the new cpu hotplug 
 feature.


er, kvm-63, that is.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




[kvm-devel] [ANNOUNCE] kvm-62 release

2008-03-05 Thread Avi Kivity
Add cpus on the fly to your virtual machines with the new cpu hotplug 
feature.

Changes from kvm-62:
- portability: make room for the ia64 register stack (Xiantao Zhang)
- fix leak when setting the pv clock to an invalid address (Marcelo Tosatti)
- detect vcpu triple faults (Joerg Roedel)
- fix race when instantiating a shadow pte
- fix host crash on guest kexec
- code cleanups (Harvey Harrison)
- better tsc handling on Intel hosts with stable tscs
- cpu hotplug (Glauber Costa)
- merge qemu-cvs
   - new curses display option
- change -hugetlb-path to -mem-path (Anthony Liguori)
- increase pci support from 6 slots to 32 slots
- document ./configure --disable-cpu-emulation (Jerone Young)
- fix powerpc cpu initialization (Jerone Young)
- simplify host_cpuid() assembly code


Notes:
  If you use the modules bundled with kvm-63, you can use any version
of Linux from 2.6.17 upwards.
  If you use the modules bundled with Linux 2.6.20, you need to use
kvm-12.
  If you use the modules bundled with Linux 2.6.21, you need to use
kvm-17.
  Modules from Linux 2.6.22 and up will work with any kvm version from
kvm-22.  Some features may only be available in newer releases.
  For best performance, use Linux 2.6.23-rc2 or later as the host.

http://kvm.qumranet.com




Re: [kvm-devel] [ANNOUNCE] kvm-62 release

2008-03-05 Thread Avi Kivity
Alexey Eremenko wrote:
 Very Nice. Must be KVM-63.

   
  - merge qemu-cvs
- new curses display option
 

 How to activate that one?

   


qemu -curses

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] Notifier for Externally Mapped Memory (EMM) V1

2008-03-05 Thread Avi Kivity
Christoph Lameter wrote:
  
  /*
 + * Notifier for devices establishing their own references to Linux
 + * kernel pages in addition to the regular mapping via page
 + * table and rmap. The notifier allows the device to drop the mapping
 + * when the VM removes references to pages.
 + */
 +enum emm_operation {
 + emm_release,            /* Process exiting, */
 + emm_invalidate_start,   /* Before the VM unmaps pages */
 + emm_invalidate_end,     /* After the VM unmapped pages */
 + emm_referenced          /* Check if a range was referenced */
 +};
   

Check and clear


btw, a similar test and clear dirty would be useful as well, no?

 +
 +struct emm_notifier {
 + int (*callback)(struct emm_notifier *e, struct mm_struct *mm,
 + enum emm_operation op,
 + unsigned long start, unsigned long end);
 + struct emm_notifier *next;
 +};
 +
   

It is cleaner for the user to specify individual callbacks instead of 
having a switch.
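
A sketch of what the individual-callback shape could look like, written as plain userspace C. The struct layout and every toy_* name are hypothetical and are not taken from the posted patch; the referenced hook is written as check-and-clear, per the comment further up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-event callback table; any member may be NULL. */
struct toy_emm_ops {
	void (*release)(void *ctx);
	void (*invalidate_start)(void *ctx, unsigned long start,
				 unsigned long end);
	void (*invalidate_end)(void *ctx, unsigned long start,
			       unsigned long end);
	/* Returns nonzero if the range was referenced, clearing that state. */
	int (*test_and_clear_referenced)(void *ctx, unsigned long start,
					 unsigned long end);
};

/* Core-side helper: a missing callback is treated as "not referenced". */
static int toy_emm_referenced(const struct toy_emm_ops *ops, void *ctx,
			      unsigned long start, unsigned long end)
{
	assert(ops != NULL);
	if (!ops->test_and_clear_referenced)
		return 0;
	return ops->test_and_clear_referenced(ctx, start, end);
}

/* Example user: tracks a single "referenced" flag for the whole range. */
static int toy_user_referenced(void *ctx, unsigned long start,
			       unsigned long end)
{
	int *flag = ctx;
	int was_set = *flag;

	(void)start;
	(void)end;
	*flag = 0;	/* check and clear in one step */
	return was_set;
}
```

A user that only cares about one event fills in one pointer and leaves the rest NULL, so there is no switch statement and no need to handle operations it does not understand.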


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 1/4] [PATCH] kvmclock: release time_page if msr is rewritten

2008-03-05 Thread Avi Kivity
Glauber Costa wrote:
 If the calling cpu rewrites the system clock msr for any reason,
 release the page we allocated the last time
   

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 3/4] [PATCH] kvmclock: allow it to be turned off

2008-03-05 Thread Avi Kivity
Glauber Costa wrote:
 Use the lower 3 bits of the system time msr to turn off the clock.
 This means that all clock registration has to be aligned in a 4-byte boundary

   

3 bits -> 8 bytes.

How about using just bit 0 as an enable bit (not a disable bit)?  
That means the default value of zero means the clock is disabled, and 
that we have a couple of more bits to enable future features.
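
To make the arithmetic concrete: reserving the low 3 bits leaves 2^3 = 8 distinct values below each aligned address, so the structure has to sit on an 8-byte boundary. A sketch of decoding such a value, with bit 0 as the enable bit and bits 1-2 left for future flags, might look like this; the macro and struct names are assumptions for illustration, not the actual kvm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLOCK_ENABLE    UINT64_C(0x1)	/* bit 0: clock enabled */
#define TOY_CLOCK_FLAG_MASK UINT64_C(0x7)	/* low 3 bits are flags */

/* Hypothetical decoded view of the value written to the MSR. */
struct toy_clock_msr {
	bool enabled;		/* bit 0 */
	unsigned int reserved;	/* bits 1-2, kept for future features */
	uint64_t gpa;		/* guest address, 8-byte aligned */
};

static struct toy_clock_msr toy_decode_clock_msr(uint64_t data)
{
	struct toy_clock_msr m;

	m.enabled = (data & TOY_CLOCK_ENABLE) != 0;
	m.reserved = (unsigned int)(data & TOY_CLOCK_FLAG_MASK & ~TOY_CLOCK_ENABLE);
	m.gpa = data & ~TOY_CLOCK_FLAG_MASK;
	assert(m.gpa % 8 == 0);	/* aligned by construction */
	return m;
}
```

Writing 0 decodes as "disabled", which matches the default value of the register.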

 Signed-off-by: Glauber Costa [EMAIL PROTECTED]
 ---
  arch/x86/kvm/x86.c |5 +
  1 files changed, 5 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 6abd784..7ce14ce 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -591,6 +591,11 @@ int kvm_set_msr_common(struct kvm_vcpu *
   if (vcpu->arch.time_page)
   kvm_release_page_dirty(vcpu->arch.time_page);
  
 + /* 4-byte unaligned accesses are invalid */
 + if (data & 0x7) {
 + vcpu->arch.time_page = NULL;
 + break;
 + }
   vcpu->arch.time = data & PAGE_MASK;
   vcpu->arch.time_offset = data & ~PAGE_MASK;
  
   


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 4/4] [PATCH] cleanup leftovers

2008-03-05 Thread Avi Kivity
Glauber Costa wrote:
 clean this leftover in kvmclock.c

   

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 0/3] Expose thread id through info cpus

2008-03-05 Thread Avi Kivity
Glauber Costa wrote:
 Hey,

 This patch series exposes the actual thread id of each cpu via the qemu
 monitor. It is done through info cpus, which I thought would be the
 most natural command to do it. (If you disagree, please voice it)

 Goal is to allow tools like libvirt to easily grab it and feed taskset
 for things like cpu pinning, etc

 AFAIK, qemu runs all cpus in the same process, so for plain qemu, all cpus
 will show the same id. But KVM can benefit from it, by overriding this data
 in its ap initialization

 Of the whole series, only the last patch is kvm-specific.

 Many thanks to Anthony, who pointed out to me that this approach was possible.


   

Applied all, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] virtio-balloon: do not attempt to release more than available pages

2008-03-05 Thread Avi Kivity
Anthony Liguori wrote:

 Set the target in QEMU to be larger than guest ram size. The config
 space variable will be set negatively, so guest attempts to release
 pages from the balloon.

   
 

 Is an __le32 signed?  If so, we should just use an unsigned type.

   

That would balloon the guest into oblivion if the same condition happened.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 6/8] x86: KVM guest: hypercall batching

2008-03-05 Thread Avi Kivity
Zhao Forrest wrote:
 Hi Avi,

 After reading the patch, I think the hypercall batching mechanism is as 
 follows:
 1 defer the MMU-related operations and buffer them in
 kvm_para_state->mmu_queue[]
 2 during the flush period, kvm_mmu_op() is called to flush operations
 in kvm_para_state->mmu_queue[]
 3 kvm_mmu_op() generates a hypercall for each operation in
 kvm_para_state->mmu_queue[], thus triggering a context switch from guest
 mode to kernel mode for each operation.

 My question is: Is it possible to only generate a single
 hypercall (thus a single context switch) for all buffered MMU
 operations in kvm_para_state->mmu_queue[]? This way we could further
 reduce overhead, am I right?
 BTW. I don't have a deep understanding of KVM. So this is just a
 question out of my curiosity.

   

mmu_queue_flush() is called once per batch, so we only have one 
hypercall per batch (at least if the data doesn't exceed 512 bytes).
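
The batching behaviour described above can be sketched roughly as follows. This is a toy model, not the guest code: the flush function only counts how often the host would be entered, and `struct mmu_queue` and `mmu_queue_op()` are hypothetical names. Only `mmu_queue_flush()` and the 512-byte limit come from the discussion.

```c
#include <stddef.h>
#include <string.h>

#define MMU_QUEUE_SIZE 512   /* bytes; the limit mentioned above */

struct mmu_queue {
    unsigned char buf[MMU_QUEUE_SIZE];
    size_t len;
    unsigned long hypercalls;   /* how many times the host was entered */
};

/* Stand-in for the real hypercall: one call hands over the whole buffer. */
static void mmu_queue_flush(struct mmu_queue *q)
{
    if (q->len == 0)
        return;
    q->hypercalls++;
    q->len = 0;
}

/* Buffer one operation; flush early only if it would not fit. */
static int mmu_queue_op(struct mmu_queue *q, const void *op, size_t size)
{
    if (size > MMU_QUEUE_SIZE)
        return -1;
    if (q->len + size > MMU_QUEUE_SIZE)
        mmu_queue_flush(q);
    memcpy(q->buf + q->len, op, size);
    q->len += size;
    return 0;
}
```

Ten 16-byte operations followed by a flush cost one hypercall in this model; a batch larger than the buffer costs one extra hypercall per 512 bytes.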

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:
 http://ols.108.redhat.com/2007/Reprints/kivity-Reprint.pdf

 
 Hi Avi,

 I have a question about KVM architecture after reading your paper.
 It reads:
 ..
 At the kernel level, the kernel causes the hardware
 to enter guest mode. If the processor exits guest
 mode due to an event such as an external interrupt
 or a shadow page table fault, the kernel performs
 the necessary handling and resumes guest execution.
 If the exit reason is due to an I/O instruction
 or a signal queued to the process, then the kernel
 exits to userspace.
 ..
 After reading your paper my understanding of KVM architecture is that
 for a particular VM the user mode(QEMU), kernel mode and guest mode share
 the same process context from host linux kernel's point of view, right?
   

Correct.  Virtual machine == process, virtual cpu == thread.

 If this is the case, see the below example:
 1 physical NIC interrupt is received on physical CPU 0 and host kernel
 determines that this is a network packet targeted to the emulated NIC
 for a VM
 2 at the same time this VM is running in guest mode on physical CPU 1
 My question is: at this time can host kernel *actively* interrupt VM
 and make it run in user mode to handle the incoming network data
 packet in QEMU? Or host kernel has to wait for
 VM(because of external interrupt or shadow page table fault or I/O
 instruction) to quit guest mode and wait for VM to voluntarily detect
 that incoming network packet is pending and switch to user space?
   

The incoming packet is processed by the host ethernet stack; it is 
forwarded to the bridge, which forwards it to the tap.  When the tap 
queues the packet, it sends a signal to qemu (since the tap file 
descriptor has a signal associated).  When the kernel delivers the 
signal, it notices the qemu thread is running on cpu 1, so it sends an 
inter-processor interrupt to cpu 1.  The interrupt causes the processor 
to leave guest mode and exit to the hypervisor, which notices that a 
signal is pending, so it exits to qemu which dequeues the packet and 
notifies the guest (if necessary) by injecting an interrupt.

Note that most of this path (including the IPI) is regular Linux code, 
not kvm related, and would happen for any other application in the same way.

 A further question is, how a VM detect the incoming pending network
 packet? In kernel space or in user space?
   

Are you talking about the host or guest?  If the host, the packet is 
received by the kernel, and further processing is done in userspace.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] use smp_cpus as lapic id

2008-03-04 Thread Avi Kivity
Glauber Costa wrote:
 apic is not acpi, although they are acronyms. Due to a confusion of 
 mine, those things were mixed, leading to a bug reported at
 https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
  


 This patch fixes it, by assigning smp_cpus instead of MAX_CPUS to 
 lapic_id in the MP APIC tables.


Applied, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:

 The incoming packet is processed by the host ethernet stack; it is
 forwarded to the bridge, which forwards it to the tap. When the tap
 queues the packet, it sends a signal to qemu (since the tap file
 descriptor has a signal associated). When the kernel delivers the
 signal, it notices the qemu thread is running on cpu 1, so it sends an
 inter-processor interrupt to cpu 1. The interrupt causes the processor
 
 Is the behavior of "When the kernel delivers the signal, it notices the
 qemu thread is running on cpu 1, so it sends an IPI to cpu 1" a generic
 signal-delivery behavior in Linux kernel? Or does KVM need to add some hook
 to achieve this?

   

This is generic behavior.  The same thing happens when running plain 
qemu, except that instead of breaking out from guest mode, the IPI 
causes the processor to break out from user mode and trap into the kernel.

This is one of the most powerful characteristics of kvm: it works with 
ordinary kernel mechanisms, so that all the properties and features of 
the kernel apply to kvm automatically.  In this particular case, it 
means that all the scheduler features, including real-time scheduling, 
apply to kvm guests.  With mmu notifiers, the trend will grow even stronger.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] 11s softlockup and hang with kvm on 2.6.24.3

2008-03-04 Thread Avi Kivity
Paul Collins wrote:
 Running kvm 62 on 2.6.24.3 (the extraversion is reported due to a local
 patch to vfat) I just got a couple of soft lockups and a hang.  I was
 installing FreeBSD 7 at the time.


 Mar  4 21:20:13 burly kernel: Call Trace:
 Mar  4 21:20:13 burly kernel:  [8024b60e] 
 hrtimer_try_to_cancel+0x67/0x70
 Mar  4 21:20:13 burly kernel:  [8024b629] hrtimer_cancel+0x12/0x16
 Mar  4 21:20:13 burly kernel:  [8024b623] hrtimer_cancel+0xc/0x16
 Mar  4 21:20:13 burly kernel:  [880fc9cd] 
 :kvm:kvm_migrate_apic_timer+0x19/0x2e
   

Can you reproduce and send the output of sysrq-t?  Looks like the timer 
callback is stuck.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 4/8] KVM: MMU: hypercall based pte updates and TLB flushes

2008-03-04 Thread Avi Kivity
Marcelo Tosatti wrote:
 Hi Avi,

 Looks nice.

 On Sun, Mar 02, 2008 at 06:31:17PM +0200, Avi Kivity wrote:
   
 +int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 +  gpa_t addr, unsigned long *ret)
 +{
 +int r;
 +struct kvm_pv_mmu_op_buffer buffer;
 

 Perhaps this structure is a little large to be on stack.

   

512 bytes should be fine as this isn't part of a particularly deep 
path.  If it gives us trouble we can reduce it as the size is 
independent from the guest buffer size.

 +down_read(&current->mm->mmap_sem);
 +down_read(&vcpu->kvm->slots_lock);
 

 The order should be slots_lock then mmap_sem. Need some comment in the
 code.
   

Changed, thanks.

As the patchset (less cr3 caching) passed the regression tests I'll 
apply it.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 4/8] KVM: MMU: hypercall based pte updates and TLB flushes

2008-03-04 Thread Avi Kivity
Avi Kivity wrote:

 As the patchset (less cr3 caching) passed the regression tests I'll 
 apply it.


Forgot about the unresolved CONFIG_HIGHPTE problem, so not applying for now.

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] Stop the clock!

2008-03-04 Thread Avi Kivity
With paravirt clocksource, reboot and kexec are broken: the clock keeps 
updating after the reboot, and the new kernel will have a random memory 
location trampled occasionally.

So we need to stop the clock on kexec (in the guest) and reboot (in the 
host).  On the host side, this can be done either in the kernel, or in 
userspace via new ioctls.

Joerg, I think you mentioned you were working on a vm-wide reset 
ioctl()?  If so, that would be the place to stop the clock on reboot.

Glauber, can you extend the interface to support stopping the clock?  It 
needs to be done even outside kexec, for example if the user decides 
to stop using your clock.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] KVM Test result, kernel daf4de3.., userspace 724f8a9.. One new issue

2008-03-04 Thread Avi Kivity
Yunfeng Zhao wrote:
  Hi, all,
  
 This is today's KVM test result against kvm.git 
 daf4de30ec718b16798aba07e9f25fd9e6ba9e53 and kvm-userspace.git 
 724f8a940ec0e78e607c051e6e82ca2f5055b1e1.
 In today's testing , save/restore crashed host once on pae/ia32e hosts.
 One  new issue has been found:
 1. blue screen when booting 64bits windows guests 
 https://sourceforge.net/tracker/index.php?func=detail&aid=1906751&group_id=180599&atid=893831

   

This was caused by


commit 3a001629eea909b2aa97f001a9db4700f15d986b
Author: Amit Shah [EMAIL PROTECTED]
Date:   Thu Feb 28 16:06:15 2008 +0530

KVM: is_long_mode() should check for EFER.LMA

is_long_mode currently checks the LongModeEnable bit in
EFER instead of the LongModeActive bit. This is wrong, but
we survived this till now since it wasn't triggered. This
breaks guests that go from long mode to compatibility mode.

This is noticed on a solaris guest and fixes bug #1842160

Signed-off-by: Amit Shah [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e64e9f5..d83225e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -26,7 +26,7 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 static inline int is_long_mode(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
-   return vcpu->arch.shadow_efer & EFER_LME;
+   return vcpu->arch.shadow_efer & EFER_LMA;
 #else
    return 0;
 #endif


I'm reverting that patch.
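
For readers unfamiliar with the two EFER bits the patch swaps, here is a small sketch of how they relate architecturally: LME is written by software, while LMA is set by the processor once LME and CR0.PG are both set. The helper names are illustrative, and the sketch makes no claim about why the Windows guests blue-screened.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)    /* Long Mode Enable: set by software */
#define EFER_LMA (UINT64_C(1) << 10)   /* Long Mode Active: set by the CPU */
#define CR0_PG   (UINT64_C(1) << 31)   /* paging enabled */

/* Architectural rule: LMA is set once LME and CR0.PG are both set. */
static uint64_t efer_after_cr0_write(uint64_t efer, uint64_t cr0)
{
    if ((efer & EFER_LME) && (cr0 & CR0_PG))
        return efer | EFER_LMA;
    return efer & ~EFER_LMA;
}

static bool long_mode_enabled(uint64_t efer) { return (efer & EFER_LME) != 0; }
static bool long_mode_active(uint64_t efer)  { return (efer & EFER_LMA) != 0; }
```

Between setting LME and turning on paging, a guest is "enabled" but not yet "active"; that window is where the two checks give different answers.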

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] The SMP RHEL 5.1 PAE guest can't boot up issue

2008-03-04 Thread Avi Kivity
Avi Kivity wrote:
 Dong, Eddie wrote:
 I don't know if the patch was still needed now, since it was posted
 long ago(I don't know which issue it solved). I'd like to post a
 revert patch if necessary.
   
 I believe the patch is still necessary, since we still need to
 guarantee that a vcpu's tsc is monotonic.  I think there are three
 issues to be addressed:

 1. The majority of intel machines don't need the offset adjustment
 since they already have a constant rate tsc that is synchronized on
 all cpus. I think this is indicated by X86_FEATURE_CONSTANT_TSC
 (though I'm not 100% certain if it means that the rate is the same
 for all cpus, Thomas can you clarify?)
 

 So why not make the TSC_OFFSET adjustment conditional?
   

 Yes, that's what I meant.  We just need to be sure that this is what 
 X86_FEATURE_CONSTANT_TSC means.

I changed tsc offset adjustment to only allow forward adjustment.  Since 
hosts with synced tsc never require positive adjustment, they should now 
have better quality tsc.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Javier Guerra wrote:
 On 3/4/08, Avi Kivity [EMAIL PROTECTED] wrote:
   
  apply to kvm guests.  With mmu notifiers, the trend will grow even stronger.
 

 could you (or anybody) elaborate on that? the mmu-related threads show
 lots of progress, but it's way (way) out of my league.

 AFAICT, it's about the infrastructure to later write drivers (virtio?)
 to DMA-heavy hardware (IB, RDMA, etc).  am i wrong?  or is it
 something more complete (like a ready to use driver)?

   

mmu notifiers provide a way for the core Linux memory management code to 
propagate changes in how Linux views a process' memory map to external 
memory management units that are also interested in that memory map.  
These changes include things like swapping, page migration, changes to 
memory protection, defragmentation, and copy-on-write.  In this context, 
kvm appears as a dma capable memory controller, like RDMA NICs or GPUs.

For kvm, this is important as it allows all those features to be used 
transparently with guests.

- swapping allows you to overcommit memory
- page migration allows optimization of memory placement within the host 
in response to changing workloads
- defragmentation will allow (if/when it is merged into Linux) more 
widespread use of large pages, which improve performance
- copy-on-write allows sharing identical pages of memory among guests, 
increasing guest density

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [GIT PULL] KVM fixes for 2.6.25-rc3

2008-03-04 Thread Avi Kivity
Linus, please pull the kvm fixes in the repo and branch

  git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git for-linus

comprising an ABI fix, a few host crash fixes, AMD specific fixes,
a Kbuild fix for the randconfig addicts, fallout from the scaling
work, and other miscellany.

Avi Kivity (5):
  KVM: Make the supported cpuid list a host property rather than a vm property
  KVM: Avoid infinite-frequency local apic timer
  KVM: Route irq 0 to vcpu 0 exclusively
  KVM: MMU: Fix race when instantiating a shadow pte
  KVM: VMX: Avoid rearranging switched guest msrs while they are loaded

Izik Eidus (1):
  KVM: remove the usage of the mmap_sem for the protection of the memory slots.

Joerg Roedel (4):
  KVM: SVM: Fix lazy FPU switching
  KVM: SVM: set NM intercept when enabling CR0.TS in the guest
  KVM: emulate access to MSR_IA32_MCG_CTL
  KVM: SVM: fix Windows XP 64 bit installation crash

Marcelo Tosatti (2):
  KVM: move alloc_apic_access_page() outside of non-preemptable region
  KVM: make MMU_DEBUG compile again

Paul Knowles (1):
  KVM: Fix kvm_arch_vcpu_ioctl_set_sregs so that set_cr0 works properly

Randy Dunlap (1):
  x86: disable KVM for Voyager and friends

 arch/x86/Kconfig           |    2 +-
 arch/x86/kvm/lapic.c       |    4 ++
 arch/x86/kvm/mmu.c         |   38 ++-
 arch/x86/kvm/paging_tmpl.h |   20 +---
 arch/x86/kvm/svm.c         |   26 ++
 arch/x86/kvm/vmx.c         |   14 --
 arch/x86/kvm/x86.c         |  114 +---
 include/linux/kvm.h        |    4 +-
 include/linux/kvm_host.h   |    1 +
 virt/kvm/ioapic.c          |    8 +++
 virt/kvm/kvm_main.c        |    5 +-
 11 files changed, 156 insertions(+), 80 deletions(-)

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] loop in copy_user_generic_string

2008-03-04 Thread Avi Kivity
Zdenek Kabelac wrote:
 Hello


 I'm having weird problem and being a bit puzzled about where to look
 for this bug.

 I'm using T61 - C2D  2GB

 So I'll describe symptoms:

 When I run inside my 0.5G smp  qemu-kvm guest with Debian these two
 loops in parallel:

 'while : ; do dmsetup status  ; done'

 and

 'while : ; do cat /dev/zero > /dev/mapper/any_free_to_use_lvm_partition ; done'

 after a while dmsetup start to loop in this place:

 [  356.257323]  [8117c017] ? copy_user_generic_string+0x17/0x40


 I'm using preemptible kernel and the code will stay in the
 copy_user_generic_string call forever eating 100%cpu - without
 preemption the kernel gets dead.

 With preemption, when I run at this moment a second dmsetup status in
 parallel, the busy-looped dmsetup gets finished and the while loop starts
 to continue again until the next dmsetup busy-loop.

 I've noticed that if I change inside  drivers/md/dm-ioctl.c
 copy_params  the parameter tmp.data_size in the copy_from_user call to
 just page size (4kB) - or when I replace vmalloc to kmalloc - the busy
 loop will not happen.

 So it seems to be related to page jump somehow

 Anyway, might anyone have any idea what is going on here?
   

Most likely movs emulation is broken for long counts.  Please post a 
disassembly of copy_user_generic_string to make sure we're looking at 
the same code.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] loop in copy_user_generic_string

2008-03-04 Thread Avi Kivity
Zdenek Kabelac wrote:
 Is it emulated ? I've thought it's running natively with vmx?

   

In some cases (memory mapped I/O, writes to page tables) some 
instructions are emulated.  Usually they run natively.

Please post the output of 'kvm_stat -1' to ensure the problem is with 
the emulator.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)

2008-03-04 Thread Avi Kivity
Peter Zijlstra wrote:
 On Tue, 2008-03-04 at 14:35 -0800, Christoph Lameter wrote:

   
 RCU means that the callbacks occur in an atomic context.
 

 Not really, if it requires moving the VM locks to sleepable locks under
 a .config option, I think its also fair to require PREEMPT_RCU.

 OTOH, if you want to unconditionally move the VM locks to sleepable
 locks you have a point.
   

Isn't that out of the question for .25?

I really wish we can get the atomic variant in now, and add on 
sleepability in .26, updating users if necessary.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:
 - swapping allows you to overcommit memory
 

 Normally swapping mechanism choose the Least Recently Used(LRU) pages
 of a process to be swapped out. When KVM uses MMU notifier in linux
 kernel to implement swapping for VM, could KVM choose LRU pages of a
 VM to swap out? If so, could you give a brief description about how
 this is implemented?
   

The Linux memory manager approximates LRU by scanning pages for the 
accessed bit, which is set in the pte by the processor when a page is 
accessed through that pte.  mmu notifiers provide a callback for the 
check, so that kvm can check the accessed bit on the shadow ptes.
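
A rough sketch of the accessed-bit check described above. The callback structure is hypothetical and only loosely modeled on the notifier idea; it is not the mmu notifier API. The accessed bit is bit 5 (0x20) of an x86 pte.

```c
#include <stdbool.h>
#include <stddef.h>

#define PTE_ACCESSED 0x20u   /* x86 accessed bit (bit 5) */

/* Hypothetical secondary-MMU hook: report whether any external mapping
 * of the page was accessed, clearing the bit for the next scan. */
struct mmu_notifier_sketch {
    bool (*clear_young)(void *ctx);
    void *ctx;
};

struct shadow_ptes {
    unsigned int *spte;
    size_t count;
};

static bool shadow_clear_young(void *ctx)
{
    struct shadow_ptes *s = ctx;
    bool young = false;
    for (size_t i = 0; i < s->count; i++) {
        if (s->spte[i] & PTE_ACCESSED) {
            young = true;
            s->spte[i] &= ~PTE_ACCESSED;
        }
    }
    return young;
}

/* A page counts as recently used if the host pte or any secondary
 * mapping had its accessed bit set since the last scan. */
static bool page_referenced(unsigned int *host_pte,
                            const struct mmu_notifier_sketch *mn)
{
    bool young = (*host_pte & PTE_ACCESSED) != 0;
    *host_pte &= ~PTE_ACCESSED;
    if (mn && mn->clear_young && mn->clear_young(mn->ctx))
        young = true;
    return young;
}
```

In this model a page touched only by the guest (through a shadow pte) still looks recently used to the host scanner on the first pass, and looks idle on the second pass if nobody touches it again.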

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 0/8] RFC: vcpu pinning at qemu start

2008-03-04 Thread Avi Kivity
Glauber Costa wrote:
 Hi guys,

 Here's a first series of patches aiming at vcpu pinning support in qemu.
 Ideally, as vcpus are just normal threads, the usual userspace tools can be used
 to set cpu affinity masks.

 However, it is very difficult to _start_ a vm with vcpus pinned, since
 we don't know the thread ids from qemu in advance, nor do we know when the
 vcpus are created.

 The patches introduce a -cpu-map option, that, if specified, starts the 
 virtual cpus
 with the specified affinities.

 Comments? Welcome. Random rants? Not welcome, but... how can I stop you? So 
 go ahead!

   

A monitor interface would be more useful than a command line option, as 
it allows you to migrate the vcpus at runtime, and also control 
hotplugged cpus.  For unmanaged use, taskset is probably sufficient to 
control affinity from the command line.

Normally I encourage splitting patches, but this is a bit extreme.  1 
and 3 are pointless without each other, 4 and 5, 7 and 8.  Hope that 
doesn't interfere with any pay-per-patch contract.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:
 Normally swapping mechanism choose the Least Recently Used(LRU) pages
 of a process to be swapped out. When KVM uses MMU notifier in linux
 kernel to implement swapping for VM, could KVM choose LRU pages of a
 VM to swap out? If so, could you give a brief description about how
 this is implemented?

   
 The Linux memory manager approximates LRU by scanning pages for the
 accessed bit, which is set in the pte by the processor when a page is
 accessed through that pte. mmu notifiers provide a callback for the
 check, so that kvm can check the accessed bit on the shadow ptes.
 

 If I understand correctly, when NPT is used by KVM in the future, this mmu
 notifier can't help much for swapping out pages used by VM, right?
   

No, NPT does not change things materially.  Shadow page tables are still 
used, though instead of mapping guest virtual addresses to host physical 
addresses, they now translate guest physical addresses to host physical 
addresses.  Swapping and all the other goodies still work.

 That is, when NPT is used, a balloon para-virt driver running on guest
 OS might be more efficient for swapping, am I right?
   

Ballooning is more efficient than swapping both with and without NPT.  
The problem with ballooning is that it requires guest cooperation.  The 
guest may not be able to balloon, or it may take a long time to balloon, 
while the host may need the memory immediately.  A rebooting guest also 
implicitly deflates its balloon, creating a large and unpredictable 
memory demand on the host.

A good solution needs to use ballooning with swapping as a fallback for 
guaranteeing that the system does not run out of memory.

A nice feature in 2.6.25 is the ability to select which guests will 
swap, via the memory controller feature (mlock() also works, but is 
relatively crude).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 23/23] QEMU/KVM: device hot-remove

2008-03-04 Thread Avi Kivity
Marcelo Tosatti wrote:
 Add monitor command to hot-remove devices.

 Remove device data on _EJ0 notification.

 Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

 Index: kvm-userspace.hotplug/qemu/monitor.c
 ===
 --- kvm-userspace.hotplug.orig/qemu/monitor.c
 +++ kvm-userspace.hotplug/qemu/monitor.c
 @@ -1355,6 +1355,7 @@ static term_cmd_t term_cmds[] = {
        "value", "set maximum speed (in bytes) for migrations" },
      { "cpu_set", "is", do_cpu_set_nr, "cpu [online|offline]", "change cpu state" },
      { "pci_add", "ss", device_hot_add, "nic|drive [vlan=n][,macaddr=addr][,model=type] [[file=file][,if=type][,bus=n][,unit=m][,media=d][index=i]]", "hotadd PCI device" },
 +    { "pci_remove", "i", device_hot_remove, "slot number", "hot remove PCI device" },
      { NULL, NULL, },
   

Should be pci_del for consistency with usb_del.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:
 Normally swapping mechanism choose the Least Recently Used(LRU) pages
 of a process to be swapped out. When KVM uses MMU notifier in linux
 kernel to implement swapping for VM, could KVM choose LRU pages of a
 VM to swap out? If so, could you give a brief description about how
 this is implemented?

   
 The Linux memory manager approximates LRU by scanning pages for the
 accessed bit, which is set in the pte by the processor when a page is
 accessed through that pte. mmu notifiers provide a callback for the
 check, so that kvm can check the accessed bit on the shadow ptes.
 

 Linux kernel maintains a reverse mapping from a page frame to all page tables
 pointing to this page frame. Does KVM need to maintain a similar reverse 
 mapping
 from a page frame to all shadow page tables pointing to this page frame?
   

Yes, look for 'rmap' in mmu.c.  The purpose was initially to be able to 
write-protect shadowed guest page tables without horrible worst-case 
performance, and was later extended to swapping.

With mmu notifiers, when the kernel swaps a page, it first scans its own 
rmap, then calls kvm which scans the kvm rmap.  So one way to look at 
mmu notifiers is as rmap extenders (that's not the whole story -- kvm 
ptes are in a different format than Linux ptes, so the code has to be 
different).
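
The reverse-map idea can be sketched as follows: each page frame keeps a list of the shadow ptes that point at it, so write-protecting (or unmapping) the frame does not require scanning every shadow page table. This is a toy with a fixed-size array and hypothetical names; the real rmap in mmu.c is organized differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_WRITABLE UINT64_C(0x2)   /* x86 pte R/W bit */
#define RMAP_MAX 4                  /* tiny fixed capacity for the sketch */

/* Per-page-frame reverse map: which shadow ptes point at this frame. */
struct rmap_sketch {
    uint64_t *spte[RMAP_MAX];
    size_t count;
};

static int rmap_add(struct rmap_sketch *r, uint64_t *spte)
{
    if (r->count == RMAP_MAX)
        return -1;
    r->spte[r->count++] = spte;
    return 0;
}

/* Write-protect every shadow mapping of the frame; returns how many
 * ptes were changed. */
static size_t rmap_write_protect(struct rmap_sketch *r)
{
    size_t changed = 0;
    for (size_t i = 0; i < r->count; i++) {
        if (*r->spte[i] & PT_WRITABLE) {
            *r->spte[i] &= ~PT_WRITABLE;
            changed++;
        }
    }
    return changed;
}
```

The cost of the operation is proportional to the number of mappings of that one frame, which is what avoids the horrible worst case mentioned above.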

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 1/6] KVM: In kernel pit model

2008-03-04 Thread Avi Kivity
Yang, Sheng wrote:
 +
 +static int pit_get_out(struct kvm *kvm, int channel)
 +{
 + struct kvm_kpit_channel_state *c =
 + &kvm->arch.vpit->pit_state.channels[channel];
 + s64 d, t;
 + int out;
 +
 + ASSERT(mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
 +
 + t = ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
 + d = muldiv64(t, PIT_FREQ, 1e9);
   

Use NSEC_PER_SEC instead of 1e9, to avoid people jumping on you saying you
can't use floating point in the kernel (yes, the compiler converts it at
compile-time, but they'll still say it).
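
A small userspace sketch of the integer-only conversion, for illustration.
The muldiv64() body below is an assumed implementation (the patch under
review does not show it), and the PIT_FREQ value is likewise an assumption;
only the idea of dividing by an integer nanoseconds-per-second constant
comes from the comment above.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U  /* integer constant, same value as 1e9 */
#define PIT_FREQ     1193182U     /* assumed PIT input clock in Hz */

/*
 * Compute a * b / c without overflowing the 64-bit intermediate product.
 * Assumes the final quotient fits in 64 bits.
 */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t rl, rh;

    assert(c != 0);
    rl = (a & 0xffffffffULL) * b;      /* low 32 bits of a, times b */
    rh = (a >> 32) * b + (rl >> 32);   /* high 32 bits of a, times b, plus carry */
    return ((rh / c) << 32) |
           ((((rh % c) << 32) | (rl & 0xffffffffULL)) / c);
}

/* Convert elapsed nanoseconds to PIT ticks using integer arithmetic only. */
static uint64_t ns_to_pit_ticks(uint64_t ns)
{
    return muldiv64(ns, PIT_FREQ, NSEC_PER_SEC);
}
```

With these assumed constants, one second of elapsed time maps to exactly
PIT_FREQ ticks, and no floating-point literal appears in the source.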

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] loop in copy_user_generic_string

2008-03-04 Thread Avi Kivity
Andi Kleen wrote:
 Avi Kivity [EMAIL PROTECTED] writes:
   
 Most likely movs emulation is broken for long counts.  Please post a 
 disassembly of copy_user_generic_string to make sure we're looking at 
 the same code.
 

 Be careful -- this code is patched at runtime and what you 
 see in the vmlinux is not necessarily the same that is executed

   

If the disassembled instruction isn't marked as an alternative in the 
source, then it can't be patched, right?


 Incidentally that might cause problems.

Specific to kvm?  how?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 23/23] QEMU/KVM: device hot-remove

2008-03-04 Thread Avi Kivity
Daniel P. Berrange wrote:
 (qemu) info block
 ide0-hd0: type=hd removable=0 file=/root/images/marcelo5.img ro=0 drv=raw
 ide1-cd0: type=cdrom removable=1 locked=0 [not inserted]
 floppy0: type=floppy removable=1 locked=0 [not inserted]
 sd0: type=floppy removable=1 locked=0 [not inserted]
 scsi0-hd0: type=hd removable=0 file=/tmp/bigfile ro=0 drv=raw
 scsi0-hd1: type=hd removable=0 file=/tmp/bigfile.2 ro=0 drv=raw

 (qemu) info network
 VLAN 0 devices:
   tap: ifname=tap0 setup_script=qemu-ifup-tap0
   rtl8139 pci macaddr=52:54:00:12:34:56
 

 This is utterly horrible for a human to parse & use if they're using the
 QEMU monitor, let alone something that libvirt could parse. In fact this
 doesn't let you map between the network device & pci device if there is
 more than one device added because 'info pci' doesn't show the MAC address
 info, and 'info network' does not show any PCI device number info - the
 same for disks.

   

We need a machine friendly protocol for libvirt and other management 
tools.  Versioned commands (with some backward compatibility), command 
discovery, and command/response tagging so you can associate an async 
reply to the command that triggered it, and quoting so that strings with 
spaces and other special chars are properly supported.  But how the 
information is presented is orthogonal to what information is presented.
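
As a concrete illustration of tagging and quoting, here is a sketch of how a
management tool might format one request. The wire format, the command name
and both function names are hypothetical; nothing here describes an existing
QEMU monitor protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write s into out as a double-quoted token, escaping '"' and '\\' so that
 * values containing spaces or special characters survive tokenization.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * buffer is too small.
 */
static int quote_arg(const char *s, char *out, size_t outlen)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    if (outlen < 3)
        return -1;
    out[n++] = '"';
    for (; *s; s++) {
        /* worst case: escape + char now, closing quote + NUL later */
        if (n + 4 > outlen)
            return -1;
        if (*s == '"' || *s == '\\')
            out[n++] = '\\';
        out[n++] = *s;
    }
    out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}

/*
 * Format "<tag> <command> <key>=<quoted value>\n". The tag is echoed back
 * by the (hypothetical) server so an async reply can be matched to the
 * command that triggered it.
 */
static int format_command(unsigned tag, const char *cmd, const char *key,
                          const char *value, char *out, size_t outlen)
{
    char quoted[256];
    int n;

    if (quote_arg(value, quoted, sizeof(quoted)) < 0)
        return -1;
    n = snprintf(out, outlen, "%u %s %s=%s\n", tag, cmd, key, quoted);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

The presentation (tags, quoting) stays independent of which fields a
command reports, which is the orthogonality argued for above.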

btw, the qemu command line parses something fairly similar, I don't see 
why libvirt should have problems with it.  It wouldn't be fun to code, 
but is doable.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 0/8] RFC: vcpu pinning at qemu start

2008-03-04 Thread Avi Kivity
Anthony Liguori wrote:
 Glauber Costa wrote:
   
 Anthony Liguori wrote:

 No, it can't. Because at the time qemu starts, no vcpu - thread id 
 relationship exists at all. And we don't know when it will.
 

 Sure we do.  The vcpu - thread id relationship is valid after 
 kvm_init_ap() is called which is after machine init but before the 
 select loop is entered for the first time.  Therefore, if you start qemu 
 with -S, then connect on the monitor, and do an info cpus, you could be 
 guaranteed to be told the mapping.

 The threads are *idle* at this point so there's no harm if they were 
 started on the "wrong" CPU.  You can now taskset to your heart's content 
 and then when you're happy with placement, you can issue a 'cont' so 
 that the VM actually starts running.  I say "wrong" because you can 
 still taskset the initial creation guaranteeing that the threads are 
 created on the right group of physical CPUs, you just can't specify the 
 exact mapping until you start interacting with the monitor.

   

Good points.  Initially I thought we ought to abstract the 
implementation and not expose the vcpu thread id, but I'm beginning to 
think that due to the wide variety of options (affinity, page migration, 
priority, cpu control groups) and the relative obscurity of the feature 
(which as you point out, isn't needed in the common case), we can export 
the thread id and let the management tools deal with it directly.
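
For illustration, this is roughly what a management tool could do once it
knows a vcpu's thread id. It is a sketch using the standard Linux
sched_setaffinity() call; the helper names are made up for the example, and
how the tool obtains the thread id (e.g. from the monitor) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Pin the thread with kernel thread id `tid` (0 = calling thread) to `cpu`. */
static int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Return the lowest CPU the calling thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

The same thread id can be handed to other mechanisms (priority, control
groups), which is why exporting it is more flexible than a pinning-only
interface.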

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 14/23] QEMU/KVM: device hot-add

2008-03-04 Thread Avi Kivity
Marcelo Tosatti wrote:
 Add monitor command to hot-add PCI devices (nic and drive).

   

A drive is not a pci device.  One would hot-plug a scsi controller, and 
then hot-plug a device to that controller.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 23/23] QEMU/KVM: device hot-remove

2008-03-04 Thread Avi Kivity
Anthony Liguori wrote:
 Daniel P. Berrange wrote:
 Removing based on pci device number is very unpleasant, since it's not
 something the user of the monitor cares about. Nor do they even know what
 the PCI device number assigned by 'pci_add' is.

 As with addition, I'd like separate commands for NIC vs Drive, and for the
 removal key to be based upon the same data used for addition, e.g. so one
 can remove the NIC based on its MAC address, or remove the drive based on
 the (if,bus,unit,filename) data items.

nic_remove [vlan=n][,macaddr=addr][,model=type]
drive_remove 
 [[file=file][,if=type][,bus=n][,unit=m][,media=d][index=i]]

 Though, perhaps still allow removal based on the PCI device ID as an
 alternative for those who happen to have that data available.
   

 pci_remove is consistent with usb_del and things like stopcapture.  
 The thing to add would be an info pci that let a user associate the 
 slot number with higher level information about the device.


pci_add should return the slot information, which can later be used as 
an identifier for pci_remove. It would also be nice to be able to 
specify the slot in pci_add, though I hardly have a compelling use case 
for that.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 01/23] QEMU/KVM: add PCI IRQ routing information up to slot 32

2008-03-04 Thread Avi Kivity
Marcelo Tosatti wrote:
 Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

 Index: kvm-userspace.hotplug/bios/acpi-dsdt.dsl
 ===
 --- kvm-userspace.hotplug.orig/bios/acpi-dsdt.dsl
 +++ kvm-userspace.hotplug/bios/acpi-dsdt.dsl
 @@ -249,6 +249,162 @@ DefinitionBlock (
  Package() {0x0005, 1, LNKB, 0},
  Package() {0x0005, 2, LNKC, 0},
  Package() {0x0005, 3, LNKD, 0},
 +
 +// PCI Slot 6
 +Package() {0x0006, 0, LNKB, 0},
 +Package() {0x0006, 1, LNKC, 0},
 +Package() {0x0006, 2, LNKD, 0},
 +Package() {0x0006, 3, LNKA, 0},
   

This is already in kvm-userspace.git.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 0/8] RFC: vcpu pinning at qemu start

2008-03-04 Thread Avi Kivity
Anthony Liguori wrote:
 Glauber Costa wrote:
 My main interest is in management tools being able to specify pinning
 set ups at VM creation time.

 As I said, it can be done through tools like taskset, but then you'd 
 have to know:
  * when are the threads created
  * which thread ids corresponds to each cpu

 And of course, for an amount of time, the threads will be running in 
 a wrong cpu, which may affect workloads running there. (which is a 
 case cpu pinning usually tries to address)

 A management tool can start QEMU with -S to prevent any CPUs from 
 running, query the VCPU=thread id relationship (modifying info cpus 
 would be a good thing to do for this), taskset, and then run 'cont' in 
 the monitor if they desperately need this functionality.  However, I 
 don't think the vast majority of people need this particular 
 functionality.



Affinity control is probably useful mostly for numa configurations, 
where you want to restrict virtual cpus to run on the cores closest to 
memory.  However it may well be that the scheduler is already good 
enough to do this on its own.


 My feeling is that adding an interface to do this in QEMU encourages 
 people to not use the existing Linux tools for this or worse yet, to 
 think they can do a better job than Linux.  The whole reason this 
 exists in Xen is that Xen's schedulers were incapable of doing CPU 
 migration historically (which is no longer true since the credit 
 scheduler).  It was necessary to specify pinning upon creation or you 
 were stuck with round-robin placement.  So libvirt has APIs for this 
 because they were part of the Xen API because it was needed to get 
 reasonable performance at some point in time on Xen.  I don't think 
 this behavior is useful for KVM though.  Just because Xen does it 
 doesn't imply that we should do it.


In the brutal world of hypervisors, if your competitor has a feature, 
you must have it too.  I often get asked about cpu pinning in kvm.

[I'd like to see how Xen implements swapping, though]

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 00/23] [RFC] QEMU/KVM ACPI PCI hotplug

2008-03-04 Thread Avi Kivity
Marcelo Tosatti wrote:
 The following patchset adds ACPI PCI hotplug support for QEMU.

 It extends the number of slots with IRQ routing information from 6 to 32.

 The only PCI driver to which the unregister method has been added is LSI
 SCSI; I would like more comments before implementing it for the remaining
 drivers.

   

Very nice patchset, looks minimally intrusive for such complex 
functionality.

Please post the next iteration on qemu-devel to see if they have any 
objections.  Since this is a large patchset, I don't want to keep it 
churning for too long, so if you prefer, you can rip out drive hotplug 
and add it back later (see my comments to patch 14).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM architecture docs

2008-03-04 Thread Avi Kivity
Zhao Forrest wrote:
 when NPT is used by KVM in the future, this mmu
   

btw, NPT support is already integrated.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH][EXTBOOT] Fix read drive parameters to solve Grub Error 18

2008-03-04 Thread Avi Kivity
Anthony Liguori wrote:
 In certain circumstances, the calculated CHS can result in a total number of
 sectors that is less than the actual number of sectors.  I'm not entirely
 sure why this upsets grub, but it seems to be the source of the Grub Error 18
 that sometimes occurs when using extboot.

 The solution is to implement the read drive parameters function and return the
 actual numbers of sectors.  This requires changing the QEMU = extboot
 interface as this was not previously passed to extboot.

   

Applied, thanks.  Please separate qemu and extboot patches in the future.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] ncurses support

2008-03-03 Thread Avi Kivity
Aurelien Jarno wrote:
 Hi,

 ncurses support has been added recently to the QEMU CVS. Would it be
 possible to update KVM from the latest QEMU CVS to add ncurses support
 to KVM?
   

I've merged qemu-cvs, will push once it passes regression tests.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] mmu notifiers #v8

2008-03-03 Thread Avi Kivity
Jack Steiner wrote:
 The range invalidates have a performance advantage for the GRU. TLB
 invalidates on the GRU are relatively slow (usec) and interfere somewhat
 with the performance of other active GRU instructions. Invalidating a large
 chunk of addresses with a single GRU TLBINVAL operation is much faster than
 issuing a stream of single page TLBINVALs.

 I expect this performance advantage will also apply to other users of mmuops.
   

In theory this would apply to kvm as well (coalesce tlb flush IPIs, 
lookup shadow page table once), but is it really a fast path?  What 
triggers range operations for your use cases?
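
To make the coalescing idea concrete, here is a toy sketch that merges
per-page invalidations into contiguous ranges, so one flush can cover many
pages. The types and the function name are invented for the example; this
is not kvm or mmu-notifier code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page_range {
    uint64_t start;   /* first page frame number in the range */
    uint64_t npages;  /* number of contiguous pages */
};

/*
 * Merge a sorted array of page frame numbers into contiguous ranges.
 * Returns the number of ranges written to out (at most max_out). If out
 * fills up, the remaining pages are dropped and the caller should fall
 * back to a full flush.
 */
static size_t coalesce_pages(const uint64_t *pfns, size_t n,
                             struct page_range *out, size_t max_out)
{
    size_t nr = 0;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || pfns[i] >= pfns[i - 1]);  /* input must be sorted */
        if (nr > 0 && pfns[i] < out[nr - 1].start + out[nr - 1].npages)
            continue;                              /* duplicate page */
        if (nr > 0 && pfns[i] == out[nr - 1].start + out[nr - 1].npages) {
            out[nr - 1].npages++;                  /* extends the last range */
            continue;
        }
        if (nr == max_out)
            break;                                 /* out of space */
        out[nr].start = pfns[i];
        out[nr].npages = 1;
        nr++;
    }
    return nr;
}
```

Six single-page invalidations such as 10, 11, 12, 20, 21, 30 collapse into
three ranges, so three flushes replace six.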

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [PATCH 1/8] KVM: add basic paravirt support

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Add basic KVM paravirt support. Avoid vm-exits on IO delays.

v1->v2:
- replace KVM_CAP_CLOCKSOURCE with KVM_CAP_PARA_FEATURES
- cover FEATURE_CLOCKSOURCE

v2->v3:
- switch to one ioctl per paravirt feature

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |1 +
 include/asm-x86/kvm_para.h |3 ++-
 include/linux/kvm.h|1 +
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f09840..cafed91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -805,6 +805,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_SET_TSS_ADDR:
case KVM_CAP_EXT_CPUID:
case KVM_CAP_CLOCKSOURCE:
+   case KVM_CAP_NOP_IO_DELAY:
r = 1;
break;
case KVM_CAP_VAPIC:
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index 5ab7d3d..ed5df3a 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,7 +10,8 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES 0x4001
-#define KVM_FEATURE_CLOCKSOURCE 0
+#define KVM_FEATURE_CLOCKSOURCE         0
+#define KVM_FEATURE_NOP_IO_DELAY        1
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index e92e703..ea7907d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -236,6 +236,7 @@ struct kvm_vapic_addr {
 #define KVM_CAP_CLOCKSOURCE 8
 #define KVM_CAP_NR_VCPUS 9   /* returns max vcpus per vm */
 #define KVM_CAP_NR_MEMSLOTS 10   /* returns max memory slots per vm */
+#define KVM_CAP_NOP_IO_DELAY 11
 
 /*
  * ioctls for VM fds
-- 
1.5.4.2




[kvm-devel] [PATCH 3/8] KVM: Provide unlocked version of emulator_write_phys()

2008-03-02 Thread Avi Kivity
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |   21 ++---
 include/asm-x86/kvm_host.h |3 +++
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cafed91..a1a6f0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1771,22 +1771,29 @@ mmio:
return X86EMUL_UNHANDLEABLE;
 }
 
-static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const void *val, int bytes)
+int __emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
+ const void *val, int bytes)
 {
int ret;
 
-   down_read(&vcpu->kvm->slots_lock);
    ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
-   if (ret < 0) {
-       up_read(&vcpu->kvm->slots_lock);
+   if (ret < 0)
        return 0;
-   }
    kvm_mmu_pte_write(vcpu, gpa, val, bytes);
-   up_read(&vcpu->kvm->slots_lock);
    return 1;
return 1;
 }
 
+static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
+   const void *val, int bytes)
+{
+   int ret;
+
+   down_read(&vcpu->kvm->slots_lock);
+   ret =__emulator_write_phys(vcpu, gpa, val, bytes);
+   up_read(&vcpu->kvm->slots_lock);
+   return ret;
+}
+
 static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 024b57c..0639010 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -430,6 +430,9 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
kvm_nr_mmu_pages);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 
+int __emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
+ const void *val, int bytes);
+
 enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
-- 
1.5.4.2
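
The locked-wrapper pattern this patch introduces can be sketched in
userspace as follows. Everything here is a toy stand-in: the "semaphore"
only counts readers, the VM is a 64-byte array, and the names are invented
(the kernel marks the unlocked variant with a `__` prefix, which is avoided
here because such identifiers are reserved in userspace C).

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a reader/writer semaphore: it only counts readers. */
struct toy_rwsem { int readers; };

static void toy_down_read(struct toy_rwsem *s) { s->readers++; }
static void toy_up_read(struct toy_rwsem *s)
{
    assert(s->readers > 0);
    s->readers--;
}

struct toy_vm {
    struct toy_rwsem slots_lock;
    unsigned char mem[64];      /* pretend guest physical memory */
};

/* Unlocked variant: the caller must already hold slots_lock for reading. */
static int toy_write_phys_nolock(struct toy_vm *vm, size_t gpa,
                                 const void *val, size_t bytes)
{
    assert(vm->slots_lock.readers > 0);
    if (gpa > sizeof(vm->mem) || bytes > sizeof(vm->mem) - gpa)
        return 0;               /* write falls outside guest memory */
    memcpy(vm->mem + gpa, val, bytes);
    return 1;
}

/* Locked wrapper: takes the lock, delegates, and releases on every path. */
static int toy_write_phys(struct toy_vm *vm, size_t gpa,
                          const void *val, size_t bytes)
{
    int ret;

    toy_down_read(&vm->slots_lock);
    ret = toy_write_phys_nolock(vm, gpa, val, bytes);
    toy_up_read(&vm->slots_lock);
    return ret;
}
```

Keeping a single unlock site in the wrapper removes the early-return unlock
that the original function needed, and lets callers that already hold the
lock use the unlocked variant directly.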




[kvm-devel] [PATCH 2/8] x86: KVM guest: add basic paravirt support

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Add basic KVM paravirt support. Avoid vm-exits on IO delays.

v1->v2:
- replace KVM_CAP_CLOCKSOURCE with KVM_CAP_PARA_FEATURES
- cover FEATURE_CLOCKSOURCE

v2->v3:
- switch to one ioctl per paravirt feature

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/Kconfig   |8 ++
 arch/x86/kernel/Makefile   |1 +
 arch/x86/kernel/kvm.c  |   52 
 arch/x86/kernel/setup_32.c |1 +
 arch/x86/kernel/setup_64.c |2 +
 include/linux/kvm_para.h   |6 +
 6 files changed, 70 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvm.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b9b92e5..b8642be 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -382,6 +382,14 @@ config KVM_CLOCK
  provides the guest with timing infrastructure such as time of day, and
  system time
 
+config KVM_GUEST
+   bool "KVM Guest support"
+   select PARAVIRT
+   depends on !(X86_VISWS || X86_VOYAGER)
+   help
+This option enables various optimizations for running under the KVM
+hypervisor.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a3379a3..1cc9d42 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -77,6 +77,7 @@ obj-$(CONFIG_DEBUG_RODATA_TEST)   += test_rodata.o
 obj-$(CONFIG_DEBUG_NX_TEST)    += test_nx.o
 
 obj-$(CONFIG_VMI)              += vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_GUEST)        += kvm.o
 obj-$(CONFIG_KVM_CLOCK)        += kvmclock.o
 obj-$(CONFIG_PARAVIRT)         += paravirt.o paravirt_patch_$(BITS).o
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
new file mode 100644
index 000..a8e36da
--- /dev/null
+++ b/arch/x86/kernel/kvm.c
@@ -0,0 +1,52 @@
+/*
+ * KVM paravirt_ops implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2007, Red Hat, Inc., Ingo Molnar [EMAIL PROTECTED]
+ * Copyright IBM Corporation, 2007
+ *   Authors: Anthony Liguori [EMAIL PROTECTED]
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kvm_para.h>
+#include <linux/cpu.h>
+#include <linux/mm.h>
+
+/*
+ * No need for any IO delay on KVM
+ */
+static void kvm_io_delay(void)
+{
+}
+
+static void paravirt_ops_setup(void)
+{
+   pv_info.name = "KVM";
+   pv_info.paravirt_enabled = 1;
+
+   if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
+   pv_cpu_ops.io_delay = kvm_io_delay;
+
+}
+
+void __init kvm_guest_init(void)
+{
+   if (!kvm_para_available())
+   return;
+
+   paravirt_ops_setup();
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index 1362cb2..20fdda1 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -783,6 +783,7 @@ void __init setup_arch(char **cmdline_p)
 */
vmi_init();
 #endif
+   kvm_guest_init();
 
/*
 * NOTE: before this point _nobody_ is allowed to allocate
diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index 32cdad0..293a533 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -452,6 +452,8 @@ void __init setup_arch(char **cmdline_p)
init_apic_mappings();
ioapic_init_mappings();
 
+   kvm_guest_init();
+
/*
 * We trust e820 completely. No explicit ROM probing in memory.
 */
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 5497aac..9c462c9 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -20,6 +20,12 @@
 #include <asm/kvm_para.h>
 
 #ifdef __KERNEL__
+#ifdef CONFIG_KVM_GUEST
+void __init kvm_guest_init(void);
+#else
+#define kvm_guest_init() do { } while (0)
+#endif
+
 static inline int kvm_para_has_feature(unsigned int feature)
 {
    if (kvm_arch_para_features() & (1UL << feature))
-- 
1.5.4.2
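
The feature-bit dispatch in this patch can be modelled in userspace as
below. This is a toy sketch, not the kernel code: the ops struct, the
counter used to observe calls, and all `toy_` names are assumptions for the
example; only the idea of swapping io_delay for an empty function when the
host advertises the feature comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FEATURE_NOP_IO_DELAY 1  /* bit number, mirroring the patch */

static int delays_executed;         /* counts "real" delays, for observation */

/* Stands in for the native delay (e.g. a slow port write). */
static void native_io_delay(void) { delays_executed++; }

/* Under a hypervisor the delay is pointless, so do nothing. */
static void nop_io_delay(void) { }

struct toy_cpu_ops {
    void (*io_delay)(void);
};

/* Test one bit of the feature word reported by the (pretend) host. */
static int toy_has_feature(uint32_t features, unsigned feature)
{
    assert(feature < 32);
    return (features & (UINT32_C(1) << feature)) != 0;
}

/* Replace the delay hook only if the host advertises the feature. */
static void toy_paravirt_setup(struct toy_cpu_ops *ops, uint32_t features)
{
    if (toy_has_feature(features, TOY_FEATURE_NOP_IO_DELAY))
        ops->io_delay = nop_io_delay;
}
```

Because callers always go through the function pointer, a guest on a host
without the feature keeps the native behaviour unchanged.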



[kvm-devel] [PATCH 8/8] x86: KVM guest: VMX cr3 cache support

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Add support for the cr3 cache feature on Intel VMX CPU's. This avoids
vmexits on context switch if the cr3 value is cached in one of the
entries (currently 4 are present).

This is especially important for Xenner, where each guest syscall
involves a cr3 switch.

v1->v2:
- handle the race which happens when the guest has the cache cleared
in the middle of kvm_write_cr3 by injecting a GP and trapping it to
fallback to hypercall variant (suggested by Avi).

v2->v3:
- one ioctl per paravirt feature

v3->v4:
- switch to mmu_op

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kernel/kvm.c |  145 -
 1 files changed, 144 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8405984..30e3568 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -26,14 +26,17 @@
 #include <linux/cpu.h>
 #include <linux/mm.h>
 #include <linux/hardirq.h>
+#include <asm/tlbflush.h>
+#include <asm/asm.h>
 
 #define MMU_QUEUE_SIZE 1024
 
 struct kvm_para_state {
+   struct kvm_cr3_cache cr3_cache;
u8 mmu_queue[MMU_QUEUE_SIZE];
int mmu_queue_len;
enum paravirt_lazy_mode mode;
-};
+} __attribute__ ((aligned(PAGE_SIZE)));
 
 static DEFINE_PER_CPU(struct kvm_para_state, para_state);
 
@@ -85,6 +88,121 @@ static void kvm_deferred_mmu_op(void *buffer, int len)
    state->mmu_queue_len += len;
 }
 
+static void kvm_new_cr3(unsigned long cr3)
+{
+   struct kvm_mmu_op_set_cr3 scr3 = {
+   .header.op = KVM_MMU_OP_SET_CR3,
+   .cr3 = cr3,
+   };
+
+   kvm_mmu_op(&scr3, sizeof scr3);
+}
+
+static unsigned long __force_order;
+
+/*
+ * Special, register-to-cr3 instruction based hypercall API
+ * variant to the KVM host. This utilizes the cr3 filter capability
+ * of the hardware - if this works out then no VM exit happens,
+ * if a VM exit happens then KVM will get the virtual address too.
+ */
+static void kvm_write_cr3(unsigned long guest_cr3)
+{
+   struct kvm_para_state *para_state = &get_cpu_var(para_state);
+   struct kvm_cr3_cache *cache = &para_state->cr3_cache;
+   int idx;
+
+   /*
+* Check the cache (maintained by the host) for a matching
+* guest_cr3 = host_cr3 mapping. Use it if found:
+*/
+   for (idx = 0; idx < cache->max_idx; idx++) {
+       if (cache->entry[idx].guest_cr3 == guest_cr3) {
+   unsigned long trap;
+
+   /*
+* Cache-hit: we load the cached host-CR3 value.
+* Fallback to hypercall variant if it raced with
+* the host clearing the cache after guest_cr3
+* comparison.
+*/
+           __asm__ __volatile__ (
+               "   mov %2, %0\n"
+               "0: mov %3, %%cr3\n"
+               "1:\n"
+               ".section .fixup,\"ax\"\n"
+               "2: mov %1, %0\n"
+               "   jmp 1b\n"
+               ".previous\n"
+               _ASM_EXTABLE(0b, 2b)
+               : "=&r" (trap)
+               : "n" (1UL), "n" (0UL),
+                 "b" (cache->entry[idx].host_cr3),
+                 "m" (__force_order));
+   if (!trap)
+   goto out;
+   break;
+   }
+   }
+
+   /*
+* Cache-miss. Tell the host the new cr3 via hypercall (to avoid
+* aliasing problems with a cached host_cr3 == guest_cr3).
+*/
+   kvm_new_cr3(guest_cr3);
+out:
+   put_cpu_var(para_state);
+}
+
+/*
+ * Avoid the VM exit upon cr3 load by using the cached
+ * ->active_mm->pgd value:
+ */
+static void kvm_flush_tlb_user(void)
+{
+   kvm_write_cr3(__pa(current->active_mm->pgd));
+}
+
+/*
+ * Disable global pages, do a flush, then enable global pages:
+ */
+static void kvm_flush_tlb_kernel(void)
+{
+   unsigned long orig_cr4 = read_cr4();
+
+   write_cr4(orig_cr4 & ~X86_CR4_PGE);
+   kvm_flush_tlb_user();
+   write_cr4(orig_cr4);
+}
+
+static void register_cr3_cache(void *cache)
+{
+   struct kvm_para_state *state;
+
+   state = &per_cpu(para_state, raw_smp_processor_id());
+   wrmsrl(KVM_MSR_SET_CR3_CACHE, __pa(&state->cr3_cache));
+}
+
+static unsigned __init kvm_patch(u8 type, u16 clobbers, void *ibuf,
+unsigned long addr, unsigned len)
+{
+   switch (type) {
+   case PARAVIRT_PATCH(pv_mmu_ops.write_cr3):
+   return paravirt_patch_default(type, clobbers, ibuf, addr, len);
+   default:
+   return native_patch(type, clobbers, ibuf, addr, len);
+   }
+}
+
+static void __init setup_guest_cr3_cache

Re: [kvm-devel] [PATCH][QEMU] Change -hugetlb-path to -mem-path

2008-03-02 Thread Avi Kivity
Anthony Liguori wrote:
 This patch changes -hugetlb-path to -mem-path and also updates the code so
 that it works for hugetlbfs or tmpfs.

   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] I/O bandwidth control on KVM

2008-03-02 Thread Avi Kivity
Anthony Liguori wrote:
 Hi Ryo,

 Ryo Tsuruta wrote:
   
 Hello all,

 I've implemented a block device which throttles block I/O bandwidth,
 which I called dm-ioband, and have been trying to throttle I/O bandwidth
 in a KVM environment. Unfortunately it doesn't work well: the number of
 issued I/Os does not follow the bandwidth setting.
 On the other hand, I got good results when accessing the local disk
 directly on the local machine.

 I'm not so familiar with KVM. Could anyone give me any advice?
 

 If you are using virtio drivers in the guest (which I presume you are 
 given the reference to /dev/vda), try using the following -drive syntax:

 -drive file=/dev/mapper/ioband1,if=virtio,boot=on,cache=off

 This will force the use of O_DIRECT.  By default, QEMU does not open 
 with O_DIRECT so you'll see page cache effects.

   

Good point.  But IIRC cache=off is not limited to virtio?


-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [PATCH 5/8] x86: KVM guest: hypercall based pte updates and TLB flushes

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

v1->v2:
- guest passes physical destination addr, which is cheaper than doing v->p
translation in the host.
- infer size of pte from guest mode

v2->v3:
- switch to one ioctl per paravirt feature
- move hypercall handling to mmu.c

v3->v4:
- guest/host split
- fix 32-bit truncation issues
- adjust to mmu_op

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kernel/kvm.c |  120 +
 arch/x86/kvm/x86.c|2 +-
 include/linux/kvm.h   |2 +-
 3 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a8e36da..e28d818 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,108 @@ static void kvm_io_delay(void)
 {
 }
 
+static void kvm_mmu_op(void *buffer, unsigned len)
+{
+   int r;
+   unsigned long a1, a2;
+
+   do {
+       a1 = __pa(buffer);
+       a2 = 0;   /* on i386 __pa() always returns <4G */
+       r = kvm_hypercall3(KVM_HC_MMU_OP, len, a1, a2);
+       buffer += r;
+       len -= r;
+   } while (len);
+}
+
+static void kvm_mmu_write(void *dest, u64 val)
+{
+   struct kvm_mmu_op_write_pte wpte = {
+       .header.op = KVM_MMU_OP_WRITE_PTE,
+       .pte_phys = (unsigned long)__pa(dest),
+       .pte_val = val,
+   };
+
+   kvm_mmu_op(&wpte, sizeof wpte);
+}
+
+/*
+ * We only need to hook operations that are MMU writes.  We hook these so that
+ * we can use lazy MMU mode to batch these operations.  We could probably
+ * improve the performance of the host code if we used some of the information
+ * here to simplify processing of batched writes.
+ */
+static void kvm_set_pte(pte_t *ptep, pte_t pte)
+{
+   kvm_mmu_write(ptep, pte_val(pte));
+}
+
+static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+   kvm_mmu_write(ptep, pte_val(pte));
+}
+
+static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
+{
+   kvm_mmu_write(pmdp, pmd_val(pmd));
+}
+
+#if PAGETABLE_LEVELS >= 3
+#ifdef CONFIG_X86_PAE
+static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
+{
+   kvm_mmu_write(ptep, pte_val(pte));
+}
+
+static void kvm_set_pte_present(struct mm_struct *mm, unsigned long addr,
+   pte_t *ptep, pte_t pte)
+{
+   kvm_mmu_write(ptep, pte_val(pte));
+}
+
+static void kvm_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+   kvm_mmu_write(ptep, 0);
+}
+
+static void kvm_pmd_clear(pmd_t *pmdp)
+{
+   kvm_mmu_write(pmdp, 0);
+}
+#endif
+
+static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+   kvm_mmu_write(pgdp, pgd_val(pgd));
+}
+
+static void kvm_set_pud(pud_t *pudp, pud_t pud)
+{
+   kvm_mmu_write(pudp, pud_val(pud));
+}
+#endif /* PAGETABLE_LEVELS >= 3 */
+
+static void kvm_flush_tlb(void)
+{
+   struct kvm_mmu_op_flush_tlb ftlb = {
+       .header.op = KVM_MMU_OP_FLUSH_TLB,
+   };
+
+   kvm_mmu_op(&ftlb, sizeof ftlb);
+}
+
+static void kvm_release_pt(u32 pfn)
+{
+   struct kvm_mmu_op_release_pt rpt = {
+       .header.op = KVM_MMU_OP_RELEASE_PT,
+       .pt_phys = (u64)pfn << PAGE_SHIFT,
+   };
+
+   kvm_mmu_op(&rpt, sizeof rpt);
+}
+
 static void paravirt_ops_setup(void)
 {
    pv_info.name = "KVM";
@@ -41,6 +143,24 @@ static void paravirt_ops_setup(void)
    if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
        pv_cpu_ops.io_delay = kvm_io_delay;
 
+   if (kvm_para_has_feature(KVM_FEATURE_MMU_OP)) {
+   pv_mmu_ops.set_pte = kvm_set_pte;
+   pv_mmu_ops.set_pte_at = kvm_set_pte_at;
+   pv_mmu_ops.set_pmd = kvm_set_pmd;
+#if PAGETABLE_LEVELS >= 3
+#ifdef CONFIG_X86_PAE
+   pv_mmu_ops.set_pte_atomic = kvm_set_pte_atomic;
+   pv_mmu_ops.set_pte_present = kvm_set_pte_present;
+   pv_mmu_ops.pte_clear = kvm_pte_clear;
+   pv_mmu_ops.pmd_clear = kvm_pmd_clear;
+#endif
+   pv_mmu_ops.set_pud = kvm_set_pud;
+   pv_mmu_ops.set_pgd = kvm_set_pgd;
+#endif
+   pv_mmu_ops.flush_tlb_user = kvm_flush_tlb;
+   pv_mmu_ops.release_pt = kvm_release_pt;
+   pv_mmu_ops.release_pd = kvm_release_pt;
+   }
 }
 
 void __init kvm_guest_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29f4f5d..92a51d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -817,7 +817,7 @@ int kvm_dev_ioctl_check_extension(long ext)
    case KVM_CAP_NR_MEMSLOTS:
        r = KVM_MEMORY_SLOTS;
        break;
-   case KVM_CAP_MMU_WRITE:
+   case

[kvm-devel] [PATCH 4/8] KVM: MMU: hypercall based pte updates and TLB flushes

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

v1->v2:
- guest passes physical destination addr, which is cheaper than doing v->p
translation in the host.
- infer size of pte from guest mode

v2->v3:
- switch to one ioctl per paravirt feature
- move hypercall handling to mmu.c

v3->v4:
- one mmu_op hypercall instead of one per op
- allow 64-bit gpa on hypercall
- don't pass host errors (-ENOMEM) to guest

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c         |  135 +++-
 arch/x86/kvm/x86.c         |   18 ++-
 include/asm-x86/kvm_host.h |    4 +
 include/asm-x86/kvm_para.h |   29 +
 include/linux/kvm.h        |    1 +
 include/linux/kvm_para.h   |    5 +-
 6 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4583329..14de7dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -40,7 +40,7 @@
  * 2. while doing 1. it walks guest-physical to host-physical
  * If the hardware supports that we don't need to do shadow paging.
  */
-static bool tdp_enabled = false;
+bool tdp_enabled = false;
 
 #undef MMU_DEBUG
 
@@ -167,6 +167,13 @@ static int dbg = 1;
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+struct kvm_pv_mmu_op_buffer {
+   void *ptr;
+   unsigned len;
+   unsigned processed;
+   char buf[512] __aligned(sizeof(long));
+};
+
 struct kvm_rmap_desc {
    u64 *shadow_ptes[RMAP_EXT];
    struct kvm_rmap_desc *more;
@@ -1995,6 +2002,132 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
    return nr_mmu_pages;
 }
 
+static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
+   unsigned len)
+{
+   if (len > buffer->len)
+       return NULL;
+   return buffer->ptr;
+}
+
+static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
+   unsigned len)
+{
+   void *ret;
+
+   ret = pv_mmu_peek_buffer(buffer, len);
+   if (!ret)
+       return ret;
+   buffer->ptr += len;
+   buffer->len -= len;
+   buffer->processed += len;
+   return ret;
+}
+
+static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
+gpa_t addr, gpa_t value)
+{
+   int bytes = 8;
+   int r;
+
+   if (!is_long_mode(vcpu) && !is_pae(vcpu))
+       bytes = 4;
+
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+       return r;
+
+   if (!__emulator_write_phys(vcpu, addr, &value, bytes))
+       return -EFAULT;
+
+   return 1;
+}
+
+static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
+{
+   kvm_x86_ops->tlb_flush(vcpu);
+   return 1;
+}
+
+static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
+{
+   spin_lock(&vcpu->kvm->mmu_lock);
+   mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
+   spin_unlock(&vcpu->kvm->mmu_lock);
+   return 1;
+}
+
+static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu,
+struct kvm_pv_mmu_op_buffer *buffer)
+{
+   struct kvm_mmu_op_header *header;
+
+   header = pv_mmu_peek_buffer(buffer, sizeof *header);
+   if (!header)
+       return 0;
+   switch (header->op) {
+   case KVM_MMU_OP_WRITE_PTE: {
+       struct kvm_mmu_op_write_pte *wpte;
+
+       wpte = pv_mmu_read_buffer(buffer, sizeof *wpte);
+       if (!wpte)
+           return 0;
+       return kvm_pv_mmu_write(vcpu, wpte->pte_phys,
+                               wpte->pte_val);
+   }
+   case KVM_MMU_OP_FLUSH_TLB: {
+       struct kvm_mmu_op_flush_tlb *ftlb;
+
+       ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb);
+       if (!ftlb)
+           return 0;
+       return kvm_pv_mmu_flush_tlb(vcpu);
+   }
+   case KVM_MMU_OP_RELEASE_PT: {
+       struct kvm_mmu_op_release_pt *rpt;
+
+       rpt = pv_mmu_read_buffer(buffer, sizeof *rpt);
+       if (!rpt)
+           return 0;
+       return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys);
+   }
+   default: return 0;
+   }
+}
+
+int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
+ gpa_t addr, unsigned long *ret)
+{
+   int r;
+   struct kvm_pv_mmu_op_buffer buffer;
+
+   down_read(&current->mm->mmap_sem);
+   down_read(&vcpu->kvm->slots_lock);
+
+   buffer.ptr = buffer.buf;
+   buffer.len = min(bytes, sizeof buffer.buf);
+   buffer.processed = 0;
+
+   r = kvm_read_guest(vcpu->kvm, addr, buffer.buf, buffer.len);
+   if (r)
+       goto out;
+
+   while (buffer.len

[kvm-devel] [PATCH 0/8] KVM paravirtualized mmu support, v4

2008-03-02 Thread Avi Kivity
This patchset is based on Marcelo's v3.  The biggest change is
dropping the generic hypercall multicall in favor of an mmu
specific hypercall.  This brings the following benefits:

- no need to check whether the various hypercalls are compatible
  (for example, you wouldn't want to allow a multicall within
  a multicall).

- allow a denser variable length encoding

- no need to return an error code, since mmu op failures are not
  recoverable anyway

- restartable call (this could have been implemented with
  multicalls as well)

- I like it better

Other changes:

- split guest changes from host changes

- some 32-bit truncation fixes

- don't pass host errors to guest; instead return to userspace

Still pending:

- retest cr3 cache

- fix CONFIG_HIGHPTE, where __pa() isn't sufficient to determine
  the pte physical address

   - can either add an op with a virt address or translate in the
 guest
   - translation in the guest seems preferable

- test on i386, pae, etc.
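
The "denser variable length encoding" and "restartable call" points can be illustrated with a small user-space sketch. This is not the kernel code from the patches: the op codes, struct names, HOST_CHUNK limit and the host_mmu_op()/guest_mmu_op() functions are all hypothetical stand-ins. It only assumes what the patches describe, namely that each op starts with a header naming its type, that the host consumes a bounded number of bytes per call and reports how many it processed, and that the guest reissues the call with the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_WRITE_PTE = 1, OP_FLUSH_TLB = 2 };   /* illustrative op codes */

struct op_header    { uint32_t op; uint32_t pad; };
struct op_write_pte { struct op_header header; uint64_t pte_phys; uint64_t pte_val; };
struct op_flush_tlb { struct op_header header; };

#define HOST_CHUNK 64   /* the "host" looks at no more than this per call */

struct host_stats { unsigned writes, flushes, calls; };

/* "Host" side: decode whole ops from at most HOST_CHUNK bytes and return
 * the number of bytes consumed.  A trailing partial op is left untouched
 * so that the caller can resubmit it. */
size_t host_mmu_op(const uint8_t *buf, size_t len, struct host_stats *st)
{
    size_t done = 0;

    assert(buf != NULL && st != NULL);
    if (len > HOST_CHUNK)
        len = HOST_CHUNK;
    st->calls++;
    while (len - done >= sizeof(struct op_header)) {
        struct op_header h;
        size_t need;

        memcpy(&h, buf + done, sizeof h);
        if (h.op == OP_WRITE_PTE)
            need = sizeof(struct op_write_pte);
        else if (h.op == OP_FLUSH_TLB)
            need = sizeof(struct op_flush_tlb);
        else
            break;                      /* unknown op: stop decoding */
        if (need > len - done)
            break;                      /* op does not fit in this chunk */
        if (h.op == OP_WRITE_PTE)
            st->writes++;
        else
            st->flushes++;
        done += need;
    }
    return done;
}

/* "Guest" side: keep reissuing the call until the buffer is consumed. */
int guest_mmu_op(const uint8_t *buf, size_t len, struct host_stats *st)
{
    while (len) {
        size_t r = host_mmu_op(buf, len, st);

        if (r == 0)
            return -1;   /* no progress; this sketch gives up instead of spinning */
        buf += r;
        len -= r;
    }
    return 0;
}
```

Because ops carry only the fields they need, a TLB flush costs 8 bytes in this sketch while a pte write costs 24, and a buffer larger than the host's chunk is simply processed over several calls.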



[kvm-devel] [PATCH 6/8] x86: KVM guest: hypercall batching

2008-03-02 Thread Avi Kivity
From: Marcelo Tosatti [EMAIL PROTECTED]

Batch pte updates and tlb flushes in lazy MMU mode.

v1->v2:
- report individual hypercall error code, have multicall return number of
processed entries.
- cover entire multicall duration with slots_lock instead of
acquiring/reacquiring.

v2->v3:
- change to one ioctl per paravirt feature

v3->v4:
- adjust to mmu_op
- helper for getting para_state without debug warnings

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]
---
 arch/x86/kernel/kvm.c |   62 +++-
 1 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e28d818..8405984 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -25,6 +25,22 @@
 #include <linux/kvm_para.h>
 #include <linux/cpu.h>
 #include <linux/mm.h>
+#include <linux/hardirq.h>
+
+#define MMU_QUEUE_SIZE 1024
+
+struct kvm_para_state {
+   u8 mmu_queue[MMU_QUEUE_SIZE];
+   int mmu_queue_len;
+   enum paravirt_lazy_mode mode;
+};
+
+static DEFINE_PER_CPU(struct kvm_para_state, para_state);
+
+static struct kvm_para_state *kvm_para_state(void)
+{
+   return &per_cpu(para_state, raw_smp_processor_id());
+}
 
 /*
  * No need for any IO delay on KVM
@@ -47,6 +63,28 @@ static void kvm_mmu_op(void *buffer, unsigned len)
} while (len);
 }
 
+static void mmu_queue_flush(struct kvm_para_state *state)
+{
+   if (state->mmu_queue_len) {
+       kvm_mmu_op(state->mmu_queue, state->mmu_queue_len);
+       state->mmu_queue_len = 0;
+   }
+}
+
+static void kvm_deferred_mmu_op(void *buffer, int len)
+{
+   struct kvm_para_state *state = kvm_para_state();
+
+   if (state->mode != PARAVIRT_LAZY_MMU) {
+       kvm_mmu_op(buffer, len);
+       return;
+   }
+   if (state->mmu_queue_len + len > sizeof state->mmu_queue)
+       mmu_queue_flush(state);
+   memcpy(state->mmu_queue + state->mmu_queue_len, buffer, len);
+   state->mmu_queue_len += len;
+}
+
 static void kvm_mmu_write(void *dest, u64 val)
 {
    struct kvm_mmu_op_write_pte wpte = {
@@ -55,7 +93,7 @@ static void kvm_mmu_write(void *dest, u64 val)
        .pte_val = val,
    };
 
-   kvm_mmu_op(&wpte, sizeof wpte);
+   kvm_deferred_mmu_op(&wpte, sizeof wpte);
 }
 
 /*
@@ -122,7 +160,7 @@ static void kvm_flush_tlb(void)
        .header.op = KVM_MMU_OP_FLUSH_TLB,
    };
 
-   kvm_mmu_op(&ftlb, sizeof ftlb);
+   kvm_deferred_mmu_op(&ftlb, sizeof ftlb);
 }
 
 static void kvm_release_pt(u32 pfn)
@@ -135,6 +173,23 @@ static void kvm_release_pt(u32 pfn)
    kvm_mmu_op(&rpt, sizeof rpt);
 }
 
+static void kvm_enter_lazy_mmu(void)
+{
+   struct kvm_para_state *state = kvm_para_state();
+
+   paravirt_enter_lazy_mmu();
+   state->mode = paravirt_get_lazy_mode();
+}
+
+static void kvm_leave_lazy_mmu(void)
+{
+   struct kvm_para_state *state = kvm_para_state();
+
+   mmu_queue_flush(state);
+   paravirt_leave_lazy(paravirt_get_lazy_mode());
+   state->mode = paravirt_get_lazy_mode();
+}
+
 static void paravirt_ops_setup(void)
 {
    pv_info.name = "KVM";
@@ -160,6 +215,9 @@ static void paravirt_ops_setup(void)
pv_mmu_ops.flush_tlb_user = kvm_flush_tlb;
pv_mmu_ops.release_pt = kvm_release_pt;
pv_mmu_ops.release_pd = kvm_release_pt;
+
+   pv_mmu_ops.lazy_mode.enter = kvm_enter_lazy_mmu;
+   pv_mmu_ops.lazy_mode.leave = kvm_leave_lazy_mmu;
}
 }
 
-- 
1.5.4.2
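
The batching idea in this patch can be tried outside the kernel with a small simulation. The sketch below is only an illustration under stated assumptions: struct batch_state, issue_call() and the batch_*() functions are hypothetical names, the queue is deliberately tiny (the patch uses 1024 bytes), and the "hypercall" is replaced by a counter so the saving can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_SIZE 64   /* deliberately small for the example */

struct batch_state {
    uint8_t  queue[QUEUE_SIZE];
    size_t   queue_len;
    int      lazy;         /* nonzero while in lazy mode */
    unsigned calls;        /* number of simulated hypercalls */
    size_t   bytes_sent;   /* total payload handed to the "host" */
};

/* Stand-in for the expensive guest->host transition. */
static void issue_call(struct batch_state *s, const void *buf, size_t len)
{
    (void)buf;
    s->calls++;
    s->bytes_sent += len;
}

/* Send whatever is queued in a single call. */
void batch_flush(struct batch_state *s)
{
    if (s->queue_len) {
        issue_call(s, s->queue, s->queue_len);
        s->queue_len = 0;
    }
}

/* Outside lazy mode every op is sent immediately; inside lazy mode ops
 * are appended to the queue, flushing first if the new op would not fit. */
void batch_op(struct batch_state *s, const void *op, size_t len)
{
    assert(len <= sizeof s->queue);
    if (!s->lazy) {
        issue_call(s, op, len);
        return;
    }
    if (s->queue_len + len > sizeof s->queue)
        batch_flush(s);
    memcpy(s->queue + s->queue_len, op, len);
    s->queue_len += len;
}

void batch_enter_lazy(struct batch_state *s)
{
    s->lazy = 1;
}

/* Leaving lazy mode must flush, otherwise queued updates would be lost. */
void batch_leave_lazy(struct batch_state *s)
{
    batch_flush(s);
    s->lazy = 0;
}
```

With ten 8-byte ops, the immediate path makes ten calls, while the lazy path makes two: one when the 64-byte queue fills and one on leaving lazy mode.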




Re: [kvm-devel] catch vmentry failure (was enable gfxboot on VMX)

2008-03-02 Thread Avi Kivity
Guillaume Thouvenin wrote:
 On Mon, 18 Feb 2008 10:39:31 +0100
 Alexander Graf [EMAIL PROTECTED] wrote:


   
 So if you want to see a VMentry failure, just remove the SS patching
 and you'll see one. My guess would be that you see a lot of problems
 with otherwise working code too then, though, as SS can be anything in
 that state.
   

 So I made some tests and you were right, removing the SS patching
 showed VM entry failure but it also generated lots of problems. Thus I
 tried to modify a little bit the code and with the following patch (see
 the end of the email) I can detect VM Entry failures without generating
 other problems. It works when you use a distribution that is
 big-real-mode free. I pasted the patch just to show the idea. 

 It's interesting because we can continue to use the virtual mode for the
 majority of distributions, and a VM entry failure tells us that we need
 to switch from virtual mode to full real mode emulation. With the patch
 applied, such a failure is caught in handle_vmentry_failure(). If it's
 doable, the next step is to modify the SS segment selector so that the
 vm-entry succeeds, and to switch from virtual mode to real mode
 emulation, which could be done in handle_vmentry_failure(). Does it make
 sense?

   

Yes.  An alternative (useful if a failed vmentry corrupts the guest 
state) is to check all register state when switching modes.

 -
 + fix_rmode_seg(VCPU_SREG_CS, &vcpu->arch.rmode.cs);
   fix_rmode_seg(VCPU_SREG_ES, &vcpu->arch.rmode.es);
   fix_rmode_seg(VCPU_SREG_DS, &vcpu->arch.rmode.ds);
   fix_rmode_seg(VCPU_SREG_GS, &vcpu->arch.rmode.gs);
   fix_rmode_seg(VCPU_SREG_FS, &vcpu->arch.rmode.fs);
 + fix_rmode_seg(VCPU_SREG_SS, &vcpu->arch.rmode.ss);
   

Ideally you wouldn't call fix_rmode_seg() at all.  The guest will 
emulate until such time as the segments are valid for v8086, for example 
when the guest reloads them itself.
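
As a rough illustration of "segments valid for v8086", the check could look like the sketch below. It assumes the VMX guest-state rules for virtual-8086 mode as I understand them from the Intel SDM (base equal to selector shifted left by 4, limit 0xffff, access rights 0xf3); struct seg_state and both function names are hypothetical and do not correspond to KVM's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one segment register. */
struct seg_state {
    uint16_t selector;
    uint64_t base;
    uint32_t limit;
    uint32_t ar;        /* access rights */
};

/* A segment can be entered in virtual-8086 mode only if it looks like a
 * plain real-mode segment: base derived from the selector, 64K limit and
 * the fixed access-rights value. */
int seg_valid_for_vm86(const struct seg_state *s)
{
    return s->base == ((uint64_t)s->selector << 4) &&
           s->limit == 0xffff &&
           s->ar == 0xf3;
}

/* Emulation would continue until every segment passes the check. */
int all_segs_valid_for_vm86(const struct seg_state *segs, size_t n)
{
    size_t i;

    assert(segs != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (!seg_valid_for_vm86(&segs[i]))
            return 0;
    return 1;
}
```

A big-real-mode segment (for example one with a 4G limit, or a base left over from protected mode) fails this check, which is the situation where emulation would have to take over until the guest reloads the register.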

 + switch (basic_exit_reason) {
 + case EXIT_REASON_INVALID_GUEST_STATE:
 + printk("caused by invalid guest state (%ld).\n", exit_qualification);
 + /* At this point we need to modify SS selector to pass vmentry test.
 +  * This modification prevents the usage of virtual mode to emulate real
 +  * mode so we need to pass in big real mode emulation
 +  * with something like:
 +  * vcpu->arch.rmode.emulate = 1
   

Note you might need to emulate in protected mode as well, for a small 
part of the switch, for similar reasons.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [kvm-ppc-devel] Top level kvm-userspace directory getting crowded ... need new dir for qemu dependencies

2008-03-02 Thread Avi Kivity
Jerone Young wrote:
 So I forgot to CC all the interested parties on this list (sorry about
 that I wasn't thinking at the time), but I did start up a conversation
 on linuxppc-dev on the subject of splitting out libfdt from dtc. Mainly
 to get the thought of what the dtc folks thought about splitting out
 libfdt.

 The outcome of this discussion is that the point of libfdt is to be
 integrated into different projects. I could not make a good argument at
 all as to why it should be split out (actually I did a terrible job at
 it :-)). A good analogy was also made: this is equivalent to
 splitting libcrypto out of openssl.

 So the consensus from others in the libfdt community is that it should
 go in the project. This would be in line with what Hollis has been
 saying on the list.

 Now for us we can do one of the following options:
 1) Integrate libfdt into our kvm-userspace
 or qemu (which would then require that the upstream qemu folks also agree).

 2) Use wget or something to first grab the dtc source and get libfdt
 from it. Then place it in our makefile and build it, as well as point
 cflags & ldflags to it. (This can be done, though I wanted to avoid
 going this route)

   

We definitely won't make the build so complicated as to depend on wget 
and Internet connectivity, so we'll just plant the tree in 
kvm-userspace.git.

-- 
error compiling committee.c: too many arguments to function

