Re: Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-30 Thread Avi Kivity

Xu, Jiajun wrote:

Hi All,

This is our Weekly KVM Testing Report against latest kvm.git
0c77713470debc666a07dc40080d728272bb58b9 and kvm-userspace.git
1223a029b36b0d9e73af76bcc274bb770f814886.

One New Issue:

1. perfctr wrmsr warning when booting 64-bit RHEL 5.3
https://sourceforge.net/tracker/?func=detail&aid=2721640&group_id=180599&atid=893831
  


This is the architectural performance counting msr which was enabled in 
4f76231 (KVM: x86: Ignore reads to EVNTSEL MSRs).  Amit, can you check 
if appropriate cpuid leaf 10 reporting will fix this?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-autotest] Fix command line for obtaining version number

2009-03-30 Thread Uri Lublin

Avi Kivity wrote:

Plain 'qemu' now runs an empty VM; a -help is needed to get the help message.

Signed-off-by: Avi Kivity a...@redhat.com
---
 client/tests/kvm_runtest_2/kvm_preprocessing.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm_runtest_2/kvm_preprocessing.py 
b/client/tests/kvm_runtest_2/kvm_preprocessing.py
index 5cd6e10..c9eb35d 100644
--- a/client/tests/kvm_runtest_2/kvm_preprocessing.py
+++ b/client/tests/kvm_runtest_2/kvm_preprocessing.py
@@ -214,7 +214,7 @@ def preprocess(test, params, env):
 # Get the KVM userspace version and write it as a keyval
 kvm_log.debug("Fetching KVM userspace version...")
 qemu_path = os.path.join(test.bindir, "qemu")
-version_line = commands.getoutput("%s | head -n 1" % qemu_path)
+version_line = commands.getoutput("%s -help | head -n 1" % qemu_path)
 exp = re.compile("[Vv]ersion .*?,")
 match = exp.search(version_line)
 if match:


Applied, Thanks.



Re: [PATCH 2/4] Rewrite twisted maze of if() statements with more straightforward switch()

2009-03-30 Thread Avi Kivity

Gleb Natapov wrote:

Signed-off-by: Gleb Natapov g...@redhat.com
  


This is actually not just a rewrite, but also a bugfix:


INTR_INFO);
@@ -3289,34 +3288,42 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
+   vmx->vcpu.arch.nmi_injected = false;
+   kvm_clear_exception_queue(&vmx->vcpu);
+   kvm_clear_interrupt_queue(&vmx->vcpu);
+
+   if (!idtv_info_valid)
+   return;
+
vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-   if (vmx->vcpu.arch.nmi_injected) {
+   
+   switch(type) {
+   case INTR_TYPE_NMI_INTR:
+   vmx->vcpu.arch.nmi_injected = true;
/*
  


The existing code would leave nmi_injected == false if we exit on 
NMI_INTR, so we drop an NMI here.
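A reduced userspace model of the old and new handling shows the case described here. This is only a sketch: `struct toy_vcpu`, `complete_old()` and `complete_new()` are hypothetical names, only the NMI path is modelled, and the constants are assumed to match the Linux VMX encodings (valid bit 31, event type in bits 10:8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings, mirroring the Linux VMX definitions. */
#define VECTORING_INFO_VALID_MASK 0x80000000u
#define VECTORING_INFO_TYPE_MASK  0x700u
#define INTR_TYPE_NMI_INTR        (2u << 8)

/* Hypothetical, heavily reduced vCPU state. */
struct toy_vcpu {
	bool nmi_injected;
};

/* Old logic: nmi_injected can only be cleared here, never set. */
void complete_old(struct toy_vcpu *v, uint32_t idt_info)
{
	bool valid = idt_info & VECTORING_INFO_VALID_MASK;
	uint32_t type = idt_info & VECTORING_INFO_TYPE_MASK;

	/* (The real code also clears the NMI-blocking bit in the VMCS.) */
	if (v->nmi_injected && !(valid && type == INTR_TYPE_NMI_INTR))
		v->nmi_injected = false;
}

/* New logic: reset first, then re-derive the flag from the vectoring info. */
void complete_new(struct toy_vcpu *v, uint32_t idt_info)
{
	v->nmi_injected = false;
	if (!(idt_info & VECTORING_INFO_VALID_MASK))
		return;
	switch (idt_info & VECTORING_INFO_TYPE_MASK) {
	case INTR_TYPE_NMI_INTR:
		v->nmi_injected = true; /* NMI delivery was interrupted: re-inject */
		break;
	default:
		break;
	}
}
```

With `idt_info = VECTORING_INFO_VALID_MASK | INTR_TYPE_NMI_INTR | 2` and the flag initially clear, `complete_old()` leaves `nmi_injected` false (the NMI is lost), while `complete_new()` sets it.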






Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Avi Kivity

Gleb Natapov wrote:

The patch fixes two problems with task switching.
1. The back link is written to the wrong TSS.
2. Instruction emulation is not needed if the reason for the task switch
   is a task gate in the IDT and access to it is caused by an external event.

2 is currently solved only for VMX, since there is no reliable way to
skip an instruction in SVM. We should emulate it instead.

  


Looks good, but please split into (at least) two patches.  Also please 
provide a test case so we don't regress again.




Re: Cleanup to reuse is_long_mode()

2009-03-30 Thread Avi Kivity

Dong, Eddie wrote:

struct vcpu_svm *svm = to_svm(vcpu);
 
 #ifdef CONFIG_X86_64

-   if (vcpu->arch.shadow_efer & EFER_LME) {
+   if (is_long_mode(vcpu)) {
  


is_long_mode() actually tests EFER_LMA, so this is incorrect.




RE: RFC: Add reserved bits check

2009-03-30 Thread Dong, Eddie
 
 Just noticed that walk_addr() too can be called from tdp context, so
 need to make sure rsvd_bits_mask is initialized in init_kvm_tdp_mmu()
 as well.

Yes, fixed.
Thx, eddie


commit b282565503a78e75af643de42fe7bf495e2213ec
Author: root r...@eddie-wb.localdomain
Date:   Mon Mar 30 16:57:39 2009 +0800

Emulate #PF error code of reserved bits violation.

Signed-off-by: Eddie Dong eddie.d...@intel.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55fd4c5..4fe2742 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -261,6 +261,7 @@ struct kvm_mmu {
union kvm_mmu_page_role base_role;
 
u64 *pae_root;
+   u64 rsvd_bits_mask[2][4];
 };
 
 struct kvm_vcpu_arch {
@@ -791,5 +792,6 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ef060ec..2eab758 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -126,6 +126,7 @@ module_param(oos_shadow, bool, 0644);
 #define PFERR_PRESENT_MASK (1U << 0)
 #define PFERR_WRITE_MASK (1U << 1)
 #define PFERR_USER_MASK (1U << 2)
+#define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
 
 #define PT_DIRECTORY_LEVEL 2
@@ -179,6 +180,11 @@ static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mt_mask;
 
+static inline u64 rsvd_bits(int s, int e)
+{
+   return ((1ULL << (e - s + 1)) - 1) << s;
+}
+
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
shadow_trap_nonpresent_pte = trap_pte;
@@ -2155,6 +2161,14 @@ static void paging_free(struct kvm_vcpu *vcpu)
nonpaging_free(vcpu);
 }
 
+static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+{
+   int bit7;
+
+   bit7 = (gpte >> 7) & 1;
+   return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
+}
+
 #define PTTYPE 64
 #include "paging_tmpl.h"
 #undef PTTYPE
@@ -2163,6 +2177,54 @@ static void paging_free(struct kvm_vcpu *vcpu)
 #include "paging_tmpl.h"
 #undef PTTYPE
 
+void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
+{
+   struct kvm_mmu *context = &vcpu->arch.mmu;
+   int maxphyaddr = cpuid_maxphyaddr(vcpu);
+   u64 exb_bit_rsvd = 0;
+
+   if (!is_nx(vcpu))
+   exb_bit_rsvd = rsvd_bits(63, 63);
+   switch (level) {
+   case PT32_ROOT_LEVEL:
+   /* no rsvd bits for 2 level 4K page table entries */
+   context->rsvd_bits_mask[0][1] = 0;
+   context->rsvd_bits_mask[0][0] = 0;
+   if (is_cpuid_PSE36())
+   /* 36bits PSE 4MB page */
+   context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
+   else
+   /* 32 bits PSE 4MB page */
+   context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
+   context->rsvd_bits_mask[1][0] = 0;
+   break;
+   case PT32E_ROOT_LEVEL:
+   context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 62);  /* PDE */
+   context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 62);  /* PTE */
+   context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 62) |
+   rsvd_bits(13, 20);  /* large page */
+   context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+   break;
+   case PT64_ROOT_LEVEL:
+   context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+   context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+   context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+   context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+   context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+   context->rsvd_bits_mask[1][2] = context->rsvd_bits_mask[0][2];
+   context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
+   context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
+   break;
+   }
+}
+
 static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 {
struct kvm_mmu *context = &vcpu->arch.mmu;
@@ -2183,6 +2245,7 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 
 static int paging64_init_context(struct kvm_vcpu *vcpu)
 {
+   reset_rsvds_bits_mask(vcpu, 

Re: Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-30 Thread Amit Shah
On (Mon) Mar 30 2009 [10:07:58], Avi Kivity wrote:

 1. perfctr wrmsr warning when booting 64-bit RHEL 5.3
 https://sourceforge.net/tracker/?func=detail&aid=2721640&group_id=180599&atid=893831

 This is the architectural performance counting msr which was enabled in  
 4f76231 (KVM: x86: Ignore reads to EVNTSEL MSRs).  Amit, can you check  
 if appropriate cpuid leaf 10 reporting will fix this?

We already report 0s for cpuid leaf 10; we would need to report 0x3f in
EBX for leaf 10 to denote that the events corresponding to those bits
aren't available.

I checked, and it didn't help (we can't rely on guests to abide by cpuid
flags).
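For reference, a sketch of how a well-behaved guest is expected to read leaf 10 (0xA), assuming the Intel SDM layout: EAX[7:0] is the architectural perfmon version, EAX[31:24] is the length of the EBX bit vector, and a *set* EBX bit means the corresponding architectural event is not available. The function name and the event-name table are illustrative only; as noted above, a guest is free to ignore all of this and touch the MSRs anyway.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural events enumerated by CPUID.0AH:EBX, in bit order
 * (names follow the Intel SDM; listed here for illustration). */
const char *const arch_event_names[] = {
	"UnHalted Core Cycles",
	"Instruction Retired",
	"UnHalted Reference Cycles",
	"LLC Reference",
	"LLC Misses",
	"Branch Instruction Retired",
	"Branch Misses Retired",
};

/* True if architectural event 'bit' is usable according to leaf 0xA. */
bool arch_event_available(uint32_t eax, uint32_t ebx, unsigned bit)
{
	unsigned version = eax & 0xff;          /* EAX[7:0]: version ID */
	unsigned vec_len = (eax >> 24) & 0xff;  /* EAX[31:24]: EBX vector length */

	if (version == 0 || bit >= vec_len || bit >= 32)
		return false;
	return (ebx & (1u << bit)) == 0;        /* set bit = NOT available */
}
```

With all-zero output (what KVM currently reports) nothing is available; with version 1, a 7-bit vector and EBX = 0x3f, only bit 6 would remain advertised.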

Amit


Re: Cleanup to reuse is_long_mode()

2009-03-30 Thread Avi Kivity

Dong, Eddie wrote:

Avi Kivity wrote:
  

Dong, Eddie wrote:


struct vcpu_svm *svm = to_svm(vcpu);

 #ifdef CONFIG_X86_64
-   if (vcpu->arch.shadow_efer & EFER_LME) {
+   if (is_long_mode(vcpu)) {

  

is_long_mode() actually tests EFER_LMA, so this is incorrect.



Am I missing something? Here is the definition of is_long_mode(); the patch is just
an equivalent replacement.
thx, eddie


static inline int is_long_mode(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_X86_64
return vcpu->arch.shadow_efer & EFER_LME;
#else
return 0;
#endif
}


You're looking at an old version.  Mine has EFER_LMA.  See 9d642b.
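The difference matters because EFER.LME only requests long mode, while EFER.LMA is set by the processor once paging is actually enabled with LME set. A small model of that rule (bit positions 8 and 10, as in the Linux `_EFER_LME`/`_EFER_LMA` definitions; the helper names are hypothetical) shows the window where the two tests disagree: LME set, CR0.PG still clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ULL << 8)   /* long mode enable (set by software) */
#define EFER_LMA (1ULL << 10)  /* long mode active (set by the CPU) */

/* Hypothetical model: LMA becomes set when paging is enabled with LME
 * set, and is cleared when paging is disabled. */
uint64_t efer_after_cr0_write(uint64_t efer, bool paging_enabled)
{
	if (paging_enabled && (efer & EFER_LME))
		return efer | EFER_LMA;
	return efer & ~EFER_LMA;
}

/* The old test (enable bit) vs. the corrected one (active bit). */
bool long_mode_by_lme(uint64_t efer) { return (efer & EFER_LME) != 0; }
bool long_mode_by_lma(uint64_t efer) { return (efer & EFER_LMA) != 0; }
```

During the transition (LME set, paging off) an LME-based check already reports long mode, while an LMA-based is_long_mode() correctly does not.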



Re: Use rsvd_bits_mask in load_pdptrs for cleanup and considing EXB bit

2009-03-30 Thread Avi Kivity

Dong, Eddie wrote:

@@ -2199,6 +2194,9 @@ void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
context->rsvd_bits_mask[1][0] = 0;
break;
case PT32E_ROOT_LEVEL:
+   context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 62) |
+   rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 62);  /* PDE */
context->rsvd_bits_mask[0][0] = exb_bit_rsvd


Are you sure that PDPTEs support NX?  They don't support R/W and U/S, so 
it seems likely that NX is reserved as well even when EFER.NXE is enabled.
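To see what this suggestion would change, here is a sketch computing the PAE PDPTE reserved mask both ways, reusing the `rsvd_bits()` helper from the patch. `pdpte_mask_patch()` follows the hunk above (bit 63 reserved only when NX is off); `pdpte_mask_nx_always_rsvd()` is a hypothetical variant that treats bit 63 as reserved unconditionally. With NX enabled the two masks differ only in bit 63.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set, as in the patch's rsvd_bits().
 * Only valid when e - s < 63. */
static inline uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* PDPTE mask as in the patch: bit 63 is reserved only without NX. */
uint64_t pdpte_mask_patch(int maxphyaddr, bool nx)
{
	uint64_t exb_bit_rsvd = nx ? 0 : rsvd_bits(63, 63);

	return exb_bit_rsvd | rsvd_bits(maxphyaddr, 62) |
	       rsvd_bits(7, 8) | rsvd_bits(1, 2);
}

/* Hypothetical variant: bit 63 is always reserved in a PDPTE. */
uint64_t pdpte_mask_nx_always_rsvd(int maxphyaddr)
{
	return rsvd_bits(63, 63) | rsvd_bits(maxphyaddr, 62) |
	       rsvd_bits(7, 8) | rsvd_bits(1, 2);
}
```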




RE: [PATCH 2/2] kvm: qemu: check device assignment command

2009-03-30 Thread Han, Weidong
Avi Kivity wrote:
 Han, Weidong wrote:
 I suggest replacing the parsing code with pci_parse_devaddr() (needs
 to be extended to support functions) so that all the checking and
 parsing is done in one place. 
 
 
 If we use pci_parse_devaddr(), we need to add a domain field to the
 assignment command and a function field to the pci_add/pci_del
 commands. What's more, pci_parse_devaddr() parses the guest device BDF
 and makes some assumptions, such as function 0, whereas here we parse
 the host BDF. Combining them is a little complex.
 
 
 Right, but we end up with overall better code.

pci_parse_devaddr() parses [[domain:]bus:]slot and accepts input even when only
the slot is given, whereas the device assignment command
(-pcidevice host=bus:slot.func) requires bus:slot.func. So I implemented a
dedicated function to parse the device BDF in the device assignment command,
rather than mixing two parsing functions together.
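A standalone sketch of the stricter rule, compilable outside QEMU, may make the difference clearer. `parse_host_bdf()` is a hypothetical name; it mirrors the checks in the patch below (all three hex fields required, bus <= 0xff, slot <= 0x1f, func <= 7, no trailing characters) but is not the QEMU code itself.

```c
#include <stdlib.h>

/* Parse "bus:slot.func" (hex fields). Returns 0 on success, -1 on error.
 * Note: strtoul() also accepts leading whitespace, a sign or a 0x
 * prefix in each field; a stricter parser would reject those too. */
int parse_host_bdf(const char *addr, int *busp, int *slotp, int *funcp)
{
	const char *p = addr;
	char *e;
	unsigned long bus, slot, func;

	bus = strtoul(p, &e, 16);
	if (e == p || *e != ':')
		return -1;              /* bus missing or no ':' separator */
	p = e + 1;
	slot = strtoul(p, &e, 16);
	if (e == p || *e != '.')
		return -1;              /* slot missing or no '.' separator */
	p = e + 1;
	func = strtoul(p, &e, 16);
	if (e == p || *e != '\0')
		return -1;              /* func missing or trailing garbage */
	if (bus > 0xff || slot > 0x1f || func > 0x7)
		return -1;              /* out of PCI range */

	*busp = (int)bus;
	*slotp = (int)slot;
	*funcp = (int)func;
	return 0;
}
```

A bare slot such as "1f", which pci_parse_devaddr() would accept, is rejected here.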

Signed-off-by: Weidong Han weidong@intel.com

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index cef7c8a..53375ff 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -1195,8 +1195,7 @@ out:
  */
 AssignedDevInfo *add_assigned_device(const char *arg)
 {
-char *cp, *cp1;
-char device[8];
+char device[16];
 char dma[6];
 int r;
 AssignedDevInfo *adev;
@@ -1207,6 +1206,13 @@ AssignedDevInfo *add_assigned_device(const char *arg)
 return NULL;
 }
 r = get_param_value(device, sizeof(device), "host", arg);
+if (!r)
+ goto bad;
+
+r = pci_parse_host_devaddr(device, &adev->bus, &adev->dev, &adev->func);
+if (r)
+goto bad;
+
 r = get_param_value(adev->name, sizeof(adev->name), "name", arg);
 if (!r)
snprintf(adev->name, sizeof(adev->name), "%s", device);
@@ -1216,18 +1222,6 @@ AssignedDevInfo *add_assigned_device(const char *arg)
 if (r && !strncmp(dma, "none", 4))
 adev->disable_iommu = 1;
 #endif
-cp = device;
-adev->bus = strtoul(cp, &cp1, 16);
-if (*cp1 != ':')
-goto bad;
-cp = cp1 + 1;
-
-adev->dev = strtoul(cp, &cp1, 16);
-if (*cp1 != '.')
-goto bad;
-cp = cp1 + 1;
-
-adev->func = strtoul(cp, &cp1, 16);
 
 LIST_INSERT_HEAD(&adev_head, adev, next);
 return adev;
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index eca0517..bf97c8c 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -163,6 +163,7 @@ static int pci_set_default_subsystem_id(PCIDevice *pci_dev)
 }
 
 /*
+ * Parse pci address in qemu command
  * Parse [[domain:]bus:]slot, return -1 on error
  */
 static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
@@ -211,6 +212,55 @@ static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *s
 return 0;
 }
 
+/*
+ * Parse device bdf in device assignment command:
+ *
+ * -pcidevice host=bus:dev.func
+ *
+ * Parse bus:slot.func return -1 on error
+ */
+int pci_parse_host_devaddr(const char *addr, int *busp,
+   int *slotp, int *funcp)
+{
+const char *p;
+char *e;
+int val;
+int bus = 0, slot = 0, func = 0;
+
+p = addr;
+val = strtoul(p, &e, 16);
+if (e == p)
+   return -1;
+if (*e == ':') {
+   bus = val;
+   p = e + 1;
+   val = strtoul(p, &e, 16);
+   if (e == p)
+   return -1;
+   if (*e == '.') {
+   slot = val;
+   p = e + 1;
+   val = strtoul(p, &e, 16);
+   if (e == p)
+   return -1;
+   func = val;
+   } else
+   return -1;
+} else
+   return -1;
+
+if (bus > 0xff || slot > 0x1f || func > 0x7)
+   return -1;
+
+if (*e)
+   return -1;
+
+*busp = bus;
+*slotp = slot;
+*funcp = func;
+return 0;
+}
+
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
 {
 char devaddr[32];
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index a7438f2..bfdd29a 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -227,6 +227,9 @@ PCIDevice *pci_find_device(int bus_num, int slot, int function);
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);
 int pci_assign_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);
 
+int pci_parse_host_devaddr(const char *addr, int *busp,
+   int *slotp, int *funcp);
+
 void pci_info(Monitor *mon);
 PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
 pci_map_irq_fn map_irq, const char *name);


Re: RFC: Add reserved bits check

2009-03-30 Thread Avi Kivity

Dong, Eddie wrote:

Just noticed that walk_addr() too can be called from tdp context, so
need to make sure rsvd_bits_mask is initialized in init_kvm_tdp_mmu()
as well.



Yes, fixed.

  
Applied, thanks.  I also added unit tests for bit 51 of the pte and pde 
in the mmu tests.
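The bit-51 case follows directly from the masks in the patch: for a 4-level guest with, say, MAXPHYADDR = 40, bits 40..51 of a PTE or PDE are reserved, so an entry with bit 51 set must be reported as a reserved-bit violation. The sketch below reproduces only the level-1 and level-2 entries of the PT64_ROOT_LEVEL case in userspace, assuming NX is enabled (so bit 63 is not reserved); the struct and function names are hypothetical, and this is not the mmu unit test itself.

```c
#include <stdbool.h>
#include <stdint.h>

static inline uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* [bit7][level-1] masks for levels 1 and 2 of a 4-level table. */
struct rsvd_masks {
	uint64_t m[2][2];
};

struct rsvd_masks pt64_masks(int maxphyaddr)
{
	struct rsvd_masks r;

	r.m[0][1] = rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);   /* PDE -> PT */
	r.m[0][0] = rsvd_bits(maxphyaddr, 51);                     /* 4K PTE */
	r.m[1][1] = rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20); /* 2M PDE */
	r.m[1][0] = r.m[0][0];  /* PTE bit 7 is PAT, not PS: same mask */
	return r;
}

/* Same selection as is_rsvd_bits_set(): bit 7 picks the mask row. */
bool rsvd_bits_set(const struct rsvd_masks *r, uint64_t gpte, int level)
{
	int bit7 = (gpte >> 7) & 1;

	return (gpte & r->m[bit7][level - 1]) != 0;
}
```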




Can disk geometry be specified in libvirt?

2009-03-30 Thread Bike Snow
Hello

I'm trying to pass a fibre channel virtual disk to a KVM host via libvirt.

On the host, disk is:

Disk /dev/sdb: 53.6 GB, 53631516672 bytes
64 heads, 32 sectors/track, 6393 cylinders
Units = cylinders of 2048 * 4096 = 8388608 bytes
Disk identifier: 0x5e9ca6c0

As you can see, the sector size is 4096 bytes, not the usual 512 bytes.

If I pass this to a KVM guest, I get this in the guest:

Disk /dev/vdc: 53.6 GB, 53631516672 bytes
64 heads, 32 sectors/track, 51147 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk identifier: 0x5e9ca6c0

As you can see, its sector size has not been recognised correctly:
it is recognised as 512 bytes. Because of this, the disk cannot be
used.
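The two fdisk listings describe the same byte count; only the sector size used to derive the geometry differs. fdisk's cylinder size is heads × sectors/track × sector size (64 × 32 = 2048 sectors), and the cylinder count is the disk size divided by that, rounded down. A quick check (the helper name is made up for this illustration):

```c
#include <stdint.h>

/* Cylinder count as fdisk reports it: total bytes divided by
 * heads * sectors-per-track * bytes-per-sector, rounded down. */
uint64_t fdisk_cylinders(uint64_t disk_bytes, unsigned heads,
			 unsigned sectors_per_track, unsigned sector_size)
{
	uint64_t cyl_bytes = (uint64_t)heads * sectors_per_track * sector_size;

	return disk_bytes / cyl_bytes;
}
```

For 53631516672 bytes this gives 6393 cylinders with 4096-byte sectors and 51147 with 512-byte sectors, matching the host and guest listings respectively.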

Is there any way to pass the sector size to the guest?

Thanks


Re: BUG: soft lockup - CPU stuck for ...

2009-03-30 Thread Robert Wimmer
Hi,

many thanks for your replies. I upgraded
some systems to kernel 2.6.29 a few days ago.
One system in particular nearly always
crashed during kernel compilation. With 2.6.29
as host and guest it currently works: I have now
compiled the kernel three times (always from
scratch) and nothing crashed.

Using i686 (or x86) as the host wouldn't be an option.

A preemptible kernel seems a possible way to go
if the crash happens again. But if it works now
I'll leave it as it is, since there are still drivers
out there which have problems with preemptible kernels.

But there is something I still wonder about: is this the
right mailing list for such requests? When I read a
message like "BUG: soft lockup - CPU#0 stuck for ...",
it looks to me like a bug which should be looked at
by the developers, but it seems that nobody here really
cares about such reports. I'm really grateful for
KVM and the work done by all the developers, but
isn't it in the interest of a company like Red Hat to
get the product stable and to eliminate all known
bugs before the release of their new virtualisation
product? I really don't mean this as a flame; my
intention is really to make KVM better. But the only
thing I can do is submit bug reports, since I'm not
a C/C++ developer.

Btw: is there an overview of which kernel settings
are recommended for KVM hosts and guests besides the
obvious ones? So far I've learned that the noop
I/O scheduler in the guest and deadline in the host
are good choices. I've read in the XFS filesystem FAQ
that the KVM drive= option should include cache=none
to avoid filesystem corruption (which I've already had
in some KVMs, and which caused me to switch to ext3 instead).
Such a list would be especially useful for people
like me who use Gentoo, where you have to compile
everything yourself.

Keep up the good work!
Thanks!
Robert





 Hi,
 I was also experiencing this problem a lot for quite a long time (and for
 a wide range of KVM versions).
 I might be completely wrong, as I'm not sure if it was really the reason,
 but I THINK it disappeared when I started to use a fully preemptible kernel on
 the host.
 You might want to try it...
 BR
 nik

On Sun, Mar 29, 2009 at 07:51:21AM +, Gerrit Slomma wrote:
 Robert Wimmer r.wimmer at tomorrow-focus.de writes:
 
  
  Hi,
  
  does anyone know how to solve the problem
  with BUG: soft lockup - CPU#0 stuck for ...?
  Today I got the messages below during compilation
  of the kernel modules in a guest. Using kvm84 and Kernel 2.6.29
  as host kernel and 2.6.28 as guest kernel. During the
  hangup of the guest neither ssh nor ping was possible.
  After about 2 minutes the guest was reachable again
  and I saw the messages below with dmesg.
  
  Maybe it is related to my previously answered posting:
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/29677
  
  Thanks!
  Robert
  
  BUG: soft lockup - CPU#0 stuck for 61s!
  (...)
 
 Hello
 
 Do you use x86_64 or i686?
 Look at my post here 
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/29833
 And my Bug-report here https://bugzilla.redhat.com/show_bug.cgi?id=492688.
 I do not have the problems while running but after migrating. Problems with
 stuck CPUs vanish if i686 for the host is used - but i am testing further.
 
 

-- 
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz http://www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-


[PATCH v2 1/5] Fix handling of a fault during NMI unblocked due to IRET

2009-03-30 Thread Gleb Natapov
Bit 12 is undefined in any of the following cases:
- If the VM exit sets the valid bit in the IDT-vectoring information field.
- If the VM exit is due to a double fault.

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/vmx.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 37ae13d..14e3f48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3259,36 +3259,41 @@ static void update_tpr_threshold(struct kvm_vcpu *vcpu)
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
-   u32 idt_vectoring_info;
+   u32 idt_vectoring_info = vmx->idt_vectoring_info;
bool unblock_nmi;
u8 vector;
int type;
bool idtv_info_valid;
u32 error;
 
+   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
if (cpu_has_virtual_nmis()) {
unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
/*
-* SDM 3: 25.7.1.2
+* SDM 3: 27.7.1.2 (September 2008)
 * Re-set bit block by NMI before VM entry if vmexit caused by
 * a guest IRET fault.
+* SDM 3: 23.2.2 (September 2008)
+* Bit 12 is undefined in any of the following cases:
+*  - If the VM exit sets the valid bit in the IDT-vectoring
+*    information field.
+*  - If the VM exit is due to a double fault.
 */
-   if (unblock_nmi && vector != DF_VECTOR)
+   if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+   vector != DF_VECTOR && !idtv_info_valid)
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
  GUEST_INTR_STATE_NMI);
} else if (unlikely(vmx->soft_vnmi_blocked))
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
-   idt_vectoring_info = vmx->idt_vectoring_info;
-   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
if (vmx->vcpu.arch.nmi_injected) {
/*
-* SDM 3: 25.7.1.2
+* SDM 3: 27.7.1.2 (September 2008)
 * Clear bit block by NMI before VM entry if a NMI delivery
 * faulted.
 */
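The new condition in this hunk reduces to a predicate over the two VMCS fields. Below is a standalone sketch of it, assuming the Linux encodings (valid bit 31, "NMI unblocking due to IRET" bit 12, vector in bits 7:0, #DF = vector 8); `should_reblock_nmi()` is a hypothetical name. Bit 12 is only trusted when the exit-interruption info is valid, the exit was not a double fault, and the IDT-vectoring info is not valid.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_VALID_MASK      0x80000000u
#define INTR_INFO_UNBLOCK_NMI     0x1000u      /* bit 12 */
#define INTR_INFO_VECTOR_MASK     0xffu
#define VECTORING_INFO_VALID_MASK 0x80000000u
#define DF_VECTOR                 8

/* True when "blocking by NMI" should be set again before VM entry. */
bool should_reblock_nmi(uint32_t exit_intr_info, uint32_t idt_vectoring_info)
{
	bool unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
	uint8_t vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
	bool idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;

	return (exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
	       vector != DF_VECTOR && !idtv_info_valid;
}
```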



[PATCH v2 2/5] Rewrite twisted maze of if() statements with more straightforward switch()

2009-03-30 Thread Gleb Natapov
Also fix a bug where an NMI could be dropped on exit, although this should
never happen in practice.

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/vmx.c |   43 +--
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 14e3f48..1017544 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3264,7 +3264,6 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
u8 vector;
int type;
bool idtv_info_valid;
-   u32 error;
 
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
@@ -3289,34 +3288,42 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
vmx->vnmi_blocked_time +=
ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
+   vmx->vcpu.arch.nmi_injected = false;
+   kvm_clear_exception_queue(&vmx->vcpu);
+   kvm_clear_interrupt_queue(&vmx->vcpu);
+
+   if (!idtv_info_valid)
+   return;
+
vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-   if (vmx->vcpu.arch.nmi_injected) {
+   
+   switch(type) {
+   case INTR_TYPE_NMI_INTR:
+   vmx->vcpu.arch.nmi_injected = true;
/*
 * SDM 3: 27.7.1.2 (September 2008)
-* Clear bit block by NMI before VM entry if a NMI delivery
-* faulted.
+* Clear bit block by NMI before VM entry if a NMI
+* delivery faulted.
 */
-   if (idtv_info_valid && type == INTR_TYPE_NMI_INTR)
-   vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
-   GUEST_INTR_STATE_NMI);
-   else
-   vmx->vcpu.arch.nmi_injected = false;
-   }
-   kvm_clear_exception_queue(&vmx->vcpu);
-   if (idtv_info_valid && (type == INTR_TYPE_HARD_EXCEPTION ||
-   type == INTR_TYPE_SOFT_EXCEPTION)) {
+   vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+   GUEST_INTR_STATE_NMI);
+   break;
+   case INTR_TYPE_HARD_EXCEPTION:
+   case INTR_TYPE_SOFT_EXCEPTION:
if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-   error = vmcs_read32(IDT_VECTORING_ERROR_CODE);
-   kvm_queue_exception_e(&vmx->vcpu, vector, error);
+   u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+   kvm_queue_exception_e(&vmx->vcpu, vector, err);
} else
kvm_queue_exception(&vmx->vcpu, vector);
vmx->idt_vectoring_info = 0;
-   }
-   kvm_clear_interrupt_queue(&vmx->vcpu);
-   if (idtv_info_valid && type == INTR_TYPE_EXT_INTR) {
+   break;
+   case INTR_TYPE_EXT_INTR:
kvm_queue_interrupt(&vmx->vcpu, vector);
vmx->idt_vectoring_info = 0;
+   break;
+   default:
+   break;
}
 }
 



[PATCH v2 3/5] Do not zero idt_vectoring_info in vmx_complete_interrupts().

2009-03-30 Thread Gleb Natapov
We will need it later in task_switch().
The code in handle_exception() is dead: is_external_interrupt(vect_info)
is always false, since idt_vectoring_info is zeroed in
vmx_complete_interrupts().

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/vmx.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1017544..0da7a9e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2613,11 +2613,6 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
   "intr info 0x%x\n", __func__, vect_info, intr_info);
 
-   if (!irqchip_in_kernel(vcpu->kvm) && is_external_interrupt(vect_info)) {
-   int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
-   kvm_push_irq(vcpu, irq);
-   }
-
if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
return 1;  /* already handled by vmx_vcpu_run() */
 
@@ -3316,11 +3311,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
kvm_queue_exception_e(&vmx->vcpu, vector, err);
} else
kvm_queue_exception(&vmx->vcpu, vector);
-   vmx->idt_vectoring_info = 0;
break;
case INTR_TYPE_EXT_INTR:
kvm_queue_interrupt(&vmx->vcpu, vector);
-   vmx->idt_vectoring_info = 0;
break;
default:
break;



[PATCH v2 4/5] Fix task switch back link handling.

2009-03-30 Thread Gleb Natapov
The back link is currently written to the wrong TSS.

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/x86.c |   40 
 1 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae4918c..f14c622 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3697,7 +3697,6 @@ static void save_state_to_tss32(struct kvm_vcpu *vcpu,
tss->fs = get_segment_selector(vcpu, VCPU_SREG_FS);
tss->gs = get_segment_selector(vcpu, VCPU_SREG_GS);
tss->ldt_selector = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-   tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }
 
 static int load_state_from_tss32(struct kvm_vcpu *vcpu,
@@ -3794,8 +3793,8 @@ static int load_state_from_tss16(struct kvm_vcpu *vcpu,
 }
 
 static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
-  u32 old_tss_base,
-  struct desc_struct *nseg_desc)
+ u16 old_tss_sel, u32 old_tss_base,
+ struct desc_struct *nseg_desc)
 {
struct tss_segment_16 tss_segment_16;
int ret = 0;
@@ -3814,6 +3813,16 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
   &tss_segment_16, sizeof tss_segment_16))
goto out;
 
+   if (old_tss_sel != 0xffff) {
+   tss_segment_16.prev_task_link = old_tss_sel;
+
+   if (kvm_write_guest(vcpu->kvm,
+   get_tss_base_addr(vcpu, nseg_desc),
+   &tss_segment_16.prev_task_link,
+   sizeof tss_segment_16.prev_task_link))
+   goto out;
+   }
+
if (load_state_from_tss16(vcpu, &tss_segment_16))
goto out;
 
@@ -3823,7 +3832,7 @@ out:
 }
 
 static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
-  u32 old_tss_base,
+  u16 old_tss_sel, u32 old_tss_base,
   struct desc_struct *nseg_desc)
 {
struct tss_segment_32 tss_segment_32;
@@ -3843,6 +3852,16 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
   &tss_segment_32, sizeof tss_segment_32))
goto out;
 
+   if (old_tss_sel != 0xffff) {
+   tss_segment_32.prev_task_link = old_tss_sel;
+
+   if (kvm_write_guest(vcpu->kvm,
+   get_tss_base_addr(vcpu, nseg_desc),
+   &tss_segment_32.prev_task_link,
+   sizeof tss_segment_32.prev_task_link))
+   goto out;
+   }
+
if (load_state_from_tss32(vcpu, &tss_segment_32))
goto out;
 
@@ -3898,12 +3917,17 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
 
kvm_x86_ops->skip_emulated_instruction(vcpu);
 
+   /* set back link to prev task only if NT bit is set in eflags
+  note that old_tss_sel is not used after this point */
+   if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
+   old_tss_sel = 0xffff;
+
if (nseg_desc.type & 8)
-   ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_base,
--&nseg_desc);
+   ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_sel,
+old_tss_base, &nseg_desc);
else
-   ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_base,
--&nseg_desc);
+   ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_sel,
+old_tss_base, &nseg_desc);
 
if (reason == TASK_SWITCH_CALL || reason == TASK_SWITCH_GATE) {
u32 eflags = kvm_x86_ops->get_rflags(vcpu);
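The back-link rule this patch implements can be modelled in a few lines: the selector of the outgoing task is stored in the prev_task_link field of the incoming task's TSS, and only for CALL and task-gate switches; for JMP and IRET the patch passes 0xffff so nothing is written. The enum, `struct toy_tss`, `back_link_sel()` and `write_back_link()` below are hypothetical simplifications, not KVM's types.

```c
#include <stdint.h>

enum switch_reason { SW_CALL, SW_IRET, SW_JMP, SW_GATE }; /* illustrative */

struct toy_tss {
	uint16_t prev_task_link;
	/* other saved state omitted */
};

#define NO_BACK_LINK 0xffffu

/* Decide which selector, if any, to record as the back link. */
uint16_t back_link_sel(enum switch_reason reason, uint16_t old_tss_sel)
{
	if (reason != SW_CALL && reason != SW_GATE)
		return NO_BACK_LINK;
	return old_tss_sel;
}

/* Write the back link into the NEW task's TSS, never the old one. */
void write_back_link(struct toy_tss *new_tss, uint16_t sel)
{
	if (sel != NO_BACK_LINK)
		new_tss->prev_task_link = sel;
}
```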



[PATCH v2 5/5] Fix unneeded instruction skipping during task switching.

2009-03-30 Thread Gleb Natapov
There is no need to skip the instruction if the task switch is caused
by a task gate in the IDT that was reached because of an external event.
The problem is currently solved only for VMX, since there is no reliable
way to skip an instruction in SVM; we should emulate it instead.

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/include/asm/svm.h |1 +
 arch/x86/kvm/svm.c |   25 ++---
 arch/x86/kvm/vmx.c |   40 +---
 arch/x86/kvm/x86.c |5 -
 4 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 82ada75..85574b7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -225,6 +225,7 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_EVTINJ_VALID_ERR (1 << 11)
 
 #define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
+#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
 
 #define SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
 #define SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fcbc17..3ffb695 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1823,17 +1823,28 @@ static int task_switch_interception(struct vcpu_svm *svm,
struct kvm_run *kvm_run)
 {
u16 tss_selector;
+   int reason;
+   int int_type = svm->vmcb->control.exit_int_info &
+   SVM_EXITINTINFO_TYPE_MASK;
 
tss_selector = (u16)svm->vmcb->control.exit_info_1;
+
if (svm->vmcb->control.exit_info_2 &
(1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
-   return kvm_task_switch(&svm->vcpu, tss_selector,
-  TASK_SWITCH_IRET);
-   if (svm->vmcb->control.exit_info_2 &
-   (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
-   return kvm_task_switch(&svm->vcpu, tss_selector,
-  TASK_SWITCH_JMP);
-   return kvm_task_switch(&svm->vcpu, tss_selector, TASK_SWITCH_CALL);
+   reason = TASK_SWITCH_IRET;
+   else if (svm->vmcb->control.exit_info_2 &
+(1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
+   reason = TASK_SWITCH_JMP;
+   else if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_VALID)
+   reason = TASK_SWITCH_GATE;
+   else
+   reason = TASK_SWITCH_CALL;
+
+
+   if (reason != TASK_SWITCH_GATE || int_type == SVM_EXITINTINFO_TYPE_SOFT)
+   skip_emulated_instruction(&svm->vcpu);
+
+   return kvm_task_switch(&svm->vcpu, tss_selector, reason);
 }
 
 static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0da7a9e..01db958 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3025,22 +3025,40 @@ static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long exit_qualification;
u16 tss_selector;
-   int reason;
+   int reason, type, idt_v;
+
+   idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
+   type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
 
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
reason = (u32)exit_qualification >> 30;
-   if (reason == TASK_SWITCH_GATE && vmx->vcpu.arch.nmi_injected &&
-   (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
-   (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK)
-   == INTR_TYPE_NMI_INTR) {
-   vcpu->arch.nmi_injected = false;
-   if (cpu_has_virtual_nmis())
-   vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
- GUEST_INTR_STATE_NMI);
+   if (reason == TASK_SWITCH_GATE && idt_v) {
+   switch (type) {
+   case INTR_TYPE_NMI_INTR:
+   vcpu->arch.nmi_injected = false;
+   if (cpu_has_virtual_nmis())
+   vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+   break;
+   case INTR_TYPE_EXT_INTR:
+   kvm_clear_interrupt_queue(vcpu);
+   break;
+   case INTR_TYPE_HARD_EXCEPTION:
+   case INTR_TYPE_SOFT_EXCEPTION:
+   kvm_clear_exception_queue(vcpu);
+   break;
+   default:
+   break;
+   }
}
tss_selector = exit_qualification;
 
+   if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
+  type != INTR_TYPE_EXT_INTR &&
+  type != INTR_TYPE_NMI_INTR))
+   skip_emulated_instruction(vcpu);
+
if (!kvm_task_switch(vcpu, tss_selector, reason))
return 0;
 
@@ -3292,8 

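The skip rule that the VMX and SVM hunks implement can be restated as two small predicates. An instruction is skipped when the guest instruction itself caused the switch (CALL, JMP, IRET, or a software interrupt or exception through a task gate), and not when a task gate was reached because of an external interrupt, an NMI or a hardware exception. In this sketch the event types are a hypothetical enum, not the VMX or SVM hardware encodings, and the "vectoring info valid" and "task gate" conditions are passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds; NOT the hardware encodings. */
enum event_type {
	EV_EXT_INTR,
	EV_NMI,
	EV_HARD_EXCEPTION,
	EV_SOFT_EXCEPTION,
	EV_SOFT_INTR
};

/* VMX side: skip unless the switch was delivered through the IDT for an
 * event that the current guest instruction did not cause. */
static bool vmx_should_skip(bool idt_vectoring_valid, enum event_type type)
{
	if (!idt_vectoring_valid)
		return true;	/* CALL/JMP/IRET: the instruction ran */
	return type != EV_HARD_EXCEPTION &&
	       type != EV_EXT_INTR &&
	       type != EV_NMI;
}

/* SVM side: skip for everything except a task gate reached by a
 * non-software event. */
static bool svm_should_skip(bool via_task_gate, bool soft_int)
{
	return !via_task_gate || soft_int;
}
```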
Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 10:39:21AM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 The patch fixes two problems with task switching.
 1. Back link is written to a wrong TSS.
 2. Instruction emulation is not needed if the reason for task switch
is a task gate in IDT and access to it is caused by an external event.

 2 is currently solved only for VMX since there is no reliable way to
 skip an instruction in SVM. We should emulate it instead.

   

 Looks good, but please split into (at least) two patches.  Also please  
 provide a test case so we don't regress again.

This is what I am using for testing. After running make you should get
kernel.bin, which can be booted from GRUB. It runs on real HW too. I am
planning to add more tests.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/user/test/x86/kvmtest/Makefile b/user/test/x86/kvmtest/Makefile
new file mode 100644
index 000..b93935f
--- /dev/null
+++ b/user/test/x86/kvmtest/Makefile
@@ -0,0 +1,33 @@
+CC=gcc
+AS=gcc
+CFLAGS=-m32 -I. -O2 -Wall
+ASFLAGS=-m32 -I.
+OBJS=kernel.o lib.o boot.o memory.o gdt.o idt.o isrs.o tss.o uart.o
+ALLOBJS=$(OBJS) tests/tests.o
+
+PHONY := all
+all: kernel.bin
+   $(MAKE) -C tests
+
+kernel.bin: $(ALLOBJS) kernel.ld
+   ld -T kernel.ld $(ALLOBJS) -o $@
+
+install: kernel.bin
+   cp $< /boot/
+
+tests/tests.o:
+   $(MAKE) -C tests
+
+-include $(OBJS:.o=.d)
+
+# compile and generate dependency info
+%.o: %.c
+   gcc -c $(CFLAGS) $*.c -o $*.o
+   gcc -MM $(CFLAGS) $*.c > $*.d
+
+PHONY += clean
+clean:
+   $(MAKE) -C tests
+   -rm *.o *~ *.d kernel.bin
+
+.PHONY: $(PHONY)
diff --git a/user/test/x86/kvmtest/boot.S b/user/test/x86/kvmtest/boot.S
new file mode 100644
index 000..f74015c
--- /dev/null
+++ b/user/test/x86/kvmtest/boot.S
@@ -0,0 +1,357 @@
+/* boot.S - bootstrap the kernel */
+/* Copyright (C) 1999, 2001  Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+
+#define ASM 1
+#include "multiboot.h"
+#include "kernel.h"
+
+.text
+
+.globl  start, _start
+start:
+_start:
+jmp multiboot_entry
+
+/* Align 32 bits boundary. */
+.align  4
+
+/* Multiboot header. */
+multiboot_header:
+/* magic */
+.long   MULTIBOOT_HEADER_MAGIC
+/* flags */
+.long   MULTIBOOT_HEADER_FLAGS
+/* checksum */
+.long   -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)
+#ifndef __ELF__
+   /* header_addr */
+   .long   multiboot_header
+   /* load_addr */
+   .long   _start
+   /* load_end_addr */
+   .long   _edata
+   /* bss_end_addr */
+   .long   _end
+   /* entry_addr */
+   .long   multiboot_entry
+#endif /* ! __ELF__ */
+
+   multiboot_entry:
+   /* Initialize the stack pointer. */
+   movl    $(STACK_START), %esp
+
+   /* Reset EFLAGS. */
+   pushl   $0
+   popf
+
+   /* Push the pointer to the Multiboot information structure. */
+   pushl   %ebx
+   /* Push the magic value. */
+   pushl   %eax
+
+   /* Now enter the C main function... */
+   call    cmain
+
+   /* Halt. */
+   pushl   $halt_message
+   pushl   $0
+   call    printk
+
+   loop:   hlt
+   jmp loop
+
+.globl isr0
+.globl isr1
+.globl isr2
+.globl isr3
+.globl isr4
+.globl isr5
+.globl isr6
+.globl isr7
+.globl isr8
+.globl isr9
+.globl isr10
+.globl isr11
+.globl isr12
+.globl isr13
+.globl isr14
+.globl isr15
+.globl isr16
+.globl isr17
+.globl isr18
+.globl isr19
+.globl isr20
+.globl isr21
+.globl isr22
+.globl isr23
+.globl isr24
+.globl isr25
+.globl isr26
+.globl isr27
+.globl isr28
+.globl isr29
+.globl isr30
+.globl isr31
+
+/* 0: Divide By Zero Exception */
+isr0:
+   cli
+   pushl $0
+   pushl $0
+   jmp isr_common_stub
+
+/*  1: Debug Exception */
+isr1:
+   cli
+   pushl $0
+   pushl $1
+   jmp isr_common_stub
+
+/*  2: Non Maskable Interrupt Exception */
+isr2:
+   cli
+   pushl $0
+   pushl $2
+   jmp isr_common_stub
+
+/*  3: Int 3 Exception */
+isr3:
+   cli
+   pushl $0
+   pushl $3
+   jmp isr_common_stub
+
+/*  4: INTO Exception */
+isr4:
+   cli
+   pushl $0
+   pushl $4
+   jmp isr_common_stub
+
+/*  5: Out of Bounds Exception */
+isr5:
+   cli
+   pushl $0
+   pushl $5
+   jmp isr_common_stub
+
+/*  6: Invalid Opcode Exception 

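The checksum field in the multiboot header of boot.S is defined so that magic + flags + checksum wraps to zero in 32-bit arithmetic, which a Multiboot loader such as GRUB verifies before booting kernel.bin. A small C check of that property follows. The magic value is the one from the Multiboot specification; the flags value 0x00000003 (page-align modules, request memory info) is an assumption, since the test's multiboot.h is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Header magic from the Multiboot specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
/* Assumed flags; the real multiboot.h may define a different value. */
#define MULTIBOOT_HEADER_FLAGS 0x00000003u

/* Same expression as ".long -(MAGIC + FLAGS)" in boot.S, computed in
 * unsigned 32-bit arithmetic so it wraps modulo 2^32. */
static uint32_t multiboot_checksum(uint32_t magic, uint32_t flags)
{
	return (uint32_t)0 - (magic + flags);
}

/* A loader accepts the header when the three fields sum to zero. */
static int multiboot_header_ok(uint32_t magic, uint32_t flags, uint32_t checksum)
{
	return (uint32_t)(magic + flags + checksum) == 0;
}
```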
Re: Live memory allocation?

2009-03-30 Thread Alberto Treviño
On Saturday 28 March 2009 11:17:42 am you wrote:
 KVM devs have a patch called KSM (short for kernel shared memory I think)
 that helps windows guests a good bit. See the original announcement [1]
 for some numbers. I spoke to one of the devs recently and they said they
 are going to resubmit it soon.

I remember the discussion about KSM.  First, the kernel developers were not 
very happy with the approach, and second, there were some patent 
implications with VMware.

Have these issues been resolved?  Don't get me wrong.  I'm not trying to 
stop KSM, I'm just wondering if I can get my hopes up again.  I thought KSM 
was a great idea and I'd love to get my hands on it.

-- 
Alberto Treviño
BYU Testing Center
Brigham Young University



Re: Live memory allocation?

2009-03-30 Thread Tomasz Chmielewski

Avi Kivity schrieb:

(...)

Perhaps KSM would help you?  Alternatively, a heuristic that scanned for
(and collapsed) fully zeroed pages when a page is faulted in for the first
time could catch these.
  


ksm will indeed collapse these pages.  Lighter-weight alternatives exist 
-- ballooning (need a Windows driver), or, like you mention, a simple 
scanner that looks for zero pages and drops them.  That could be 
implemented within qemu (with some simple kernel support for dropping 
zero pages atomically, say madvise(MADV_DROP_IFZERO)).
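The zero-page scanner mentioned here boils down to a per-page "is this all zeros?" test over guest RAM. A minimal C sketch follows, run over an ordinary buffer standing in for guest memory. The page size is an assumption, and actually dropping the pages is left out because MADV_DROP_IFZERO is only a proposed flag, not an existing madvise() advice value. A real scanner would also have to handle the guest writing to a page between the check and the drop, which is why an atomic kernel primitive is proposed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed host page size */

/* True if every byte of the page is zero: the first byte is zero and each
 * byte equals its successor (memcmp of the page against itself shifted by
 * one byte; overlapping reads are fine for memcmp). */
static int page_is_zero(const unsigned char *page)
{
	return page[0] == 0 &&
	       memcmp(page, page + 1, SKETCH_PAGE_SIZE - 1) == 0;
}

/* Count pages in a RAM-like buffer that are candidates for dropping. */
static size_t count_zero_pages(const unsigned char *ram, size_t len)
{
	size_t off, n = 0;

	assert(len % SKETCH_PAGE_SIZE == 0);
	for (off = 0; off < len; off += SKETCH_PAGE_SIZE)
		if (page_is_zero(ram + off))
			n++;
	return n;
}
```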


From the KSM description I conclude that it allows identical memory pages
to be shared dynamically between processes.


What about cache/buffers sharing between the host kernel and running 
processes?



If I'm not mistaken, right now memory is wasted because the host and guest
kernels both cache the same data.


For example, let's say we have a host with 2 GB RAM and it runs a 1 GB 
guest.

If we read ~900 MB file_1 (block device) on guest, then:
- guest's kernel will cache file_1
- host's kernel will cache the same area of file_1 (block device)

Now, if we want to read ~900 MB file_2 (or lots of files with that 
size), cache for file_1 will be emptied on both guest and host as we 
read file_2.
The ideal situation would be if host and guest caches could be shared to
a degree (so that both file_1 and file_2 stay in memory, no matter whether
in the guest or the host).



--
Tomasz Chmielewski
http://wpkg.org


Re: Live memory allocation?

2009-03-30 Thread Avi Kivity

Tomasz Chmielewski wrote:


What about cache/buffers sharing between the host kernel and running 
processes?



If I'm not mistaken, right now memory is wasted because the host and guest
kernels both cache the same data.


For example, let's say we have a host with 2 GB RAM and it runs a 1 GB 
guest.

If we read ~900 MB file_1 (block device) on guest, then:
- guest's kernel will cache file_1
- host's kernel will cache the same area of file_1 (block device)

Now, if we want to read ~900 MB file_2 (or lots of files with that 
size), cache for file_1 will be emptied on both guest and host as we 
read file_2.
The ideal situation would be if host and guest caches could be shared to
a degree (so that both file_1 and file_2 stay in memory, no matter whether
in the guest or the host).


Double caching is indeed a bad idea.  That's why you have cache=off 
(though it isn't recommended with qcow2).


--
error compiling committee.c: too many arguments to function



Re: Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-30 Thread Avi Kivity

Amit Shah wrote:

On (Mon) Mar 30 2009 [10:07:58], Avi Kivity wrote:

  

1. perfctr wrmsr warning when booting 64bit RHEl5.3
https://sourceforge.net/tracker/?func=detailaid=2721640group_id=180599atid=893831
  
This is the architectural performance counting msr which was enabled in  
4f76231 (KVM: x86: Ignore reads to EVNTSEL MSRs).  Amit, can you check  
if appropriate cpuid leaf 10 reporting will fix this?



We already report 0s for cpuid leaf 10; we need to report 0x3f in
EBX for leaf 10 to denote that the events corresponding to those bits
aren't available.

I checked and it didn't help (we can't rely on guests to abide by cpuid
flags).
  


I see this in the code:


/*
 * Check whether the Architectural PerfMon supports
 * Unhalted Core Cycles Event or not.
 * NOTE: Corresponding bit = 0 in ebx indicates event present.
 */
cpuid(10, &(eax.full), &ebx, &unused, &unused);
if ((eax.split.mask_length <
(ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
(ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
return 0;



So I think it can be done.
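Read as a standalone predicate over the raw CPUID leaf 10 (0xA) registers, the quoted check treats the unhalted-core-cycles counter as unusable when EAX[31:24] (the length of the EBX event-availability bit vector) is too short, or when EBX bit 0 is set (a set bit means the event is absent). Under this check alone, both an all-zero leaf and the suggested EBX value of 0x3f make the guest leave the counter MSRs alone. The macro values below mirror the Linux headers as we understand them; the function is a sketch, not the kernel's code, and it extracts mask_length with a shift rather than the kernel's bitfield union.

```c
#include <assert.h>
#include <stdint.h>

/* Believed to match Linux: unhalted core cycles is architectural event 0. */
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX   0
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
	(1u << ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX)

/* Returns 1 if the leaf 10 output advertises the unhalted core cycles
 * event, 0 if a careful guest should not touch the perfctr MSRs. */
static int arch_perfmon_cycles_usable(uint32_t eax, uint32_t ebx)
{
	uint32_t mask_length = (eax >> 24) & 0xff;	/* EAX[31:24] */

	if (mask_length < ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX + 1 ||
	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
		return 0;
	return 1;
}
```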

--
error compiling committee.c: too many arguments to function



Re: Live memory allocation?

2009-03-30 Thread Tomasz Chmielewski

Avi Kivity schrieb:

Tomasz Chmielewski wrote:


What about cache/buffers sharing between the host kernel and running 
processes?



If I'm not mistaken, right now memory is wasted because the host and guest
kernels both cache the same data.


For example, let's say we have a host with 2 GB RAM and it runs a 1 GB 
guest.

If we read ~900 MB file_1 (block device) on guest, then:
- guest's kernel will cache file_1
- host's kernel will cache the same area of file_1 (block device)

Now, if we want to read ~900 MB file_2 (or lots of files with that 
size), cache for file_1 will be emptied on both guest and host as we 
read file_2.
The ideal situation would be if host and guest caches could be shared to
a degree (so that both file_1 and file_2 stay in memory, no matter whether
in the guest or the host).


Double caching is indeed a bad idea.  That's why you have cache=off 
(though it isn't recommended with qcow2).


cache= option is about write cache, right?

Here, I'm talking about read cache.

Or, does cache=none disable read cache as well?


--
Tomasz Chmielewski
http://wpkg.org



Re: [PATCH 0/2] qemu: SMBIOS passing support

2009-03-30 Thread Alex Williamson

Is there any interest in this series?  Aside from copying host SMBIOS
entries, it also seems useful for providing information to the guest
about their virtual machine pool (perhaps via a type 3 entry), or
whatever other bits of data someone might find useful (type 11, OEM
string for instance).  Thanks,

Alex

On Mon, 2009-03-23 at 13:05 -0600, Alex Williamson wrote:
 This series adds a new -smbios option for x86 that allows individual
 SMBIOS entries to be passed into the guest VM.  This follows the same
 basic path as the support for loading ACPI tables.  While SMBIOS is
 independent of ACPI, I chose to add the smbios_entry_add() function to
 acpi.c because they're both somewhat PC BIOS related (and ia64 can
 support SMBIOS and might be able to make use of it there).
 
 This feature allows the guest to see certain properties of the host if
 configured correctly.  For instance, the system model and serial number
 in the type 1 entry.  Obviously it's only built at boot, so it doesn't get
 updated for migration scenarios.  User provided entries will supersede
 generated entries, so care should be taken when passing entries which
 describe physical properties, such as memory size and address ranges.
 Thanks,
 
 Alex 
 




Re: [PATCH 0/2] qemu: SMBIOS passing support

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 07:59:36AM -0600, Alex Williamson wrote:
 
 Is there any interest in this series?  Aside from copying host SMBIOS
 entries, it also seems useful for providing information to the guest
 about their virtual machine pool (perhaps via a type 3 entry), or
 whatever other bits of data someone might find useful (type 11, OEM
 string for instance).  Thanks,
 
I think the patch is useful. I haven't looked at the implementation, though.

 Alex
 
 On Mon, 2009-03-23 at 13:05 -0600, Alex Williamson wrote:
  This series adds a new -smbios option for x86 that allows individual
  SMBIOS entries to be passed into the guest VM.  This follows the same
  basic path as the support for loading ACPI tables.  While SMBIOS is
  independent of ACPI, I chose to add the smbios_entry_add() function to
  acpi.c because they're both somewhat PC BIOS related (and ia64 can
  support SMBIOS and might be able to make use of it there).
  
  This feature allows the guest to see certain properties of the host if
  configured correctly.  For instance, the system model and serial number
  in the type 1 entry.  Obviously it's only built at boot, so it doesn't get
  updated for migration scenarios.  User provided entries will supersede
  generated entries, so care should be taken when passing entries which
  describe physical properties, such as memory size and address ranges.
  Thanks,
  
  Alex 
  
 
 

--
Gleb.


Re: [PATCH 0/2] qemu: SMBIOS passing support

2009-03-30 Thread Daniel P. Berrange
On Mon, Mar 30, 2009 at 07:59:36AM -0600, Alex Williamson wrote:
 
 Is there any interest in this series?  Aside from copying host SMBIOS
 entries, it also seems useful for providing information to the guest
 about their virtual machine pool (perhaps via a type 3 entry), or
 whatever other bits of data someone might find useful (type 11, OEM
 string for instance).  Thanks,
 
 Alex
 
 On Mon, 2009-03-23 at 13:05 -0600, Alex Williamson wrote:
  This series adds a new -smbios option for x86 that allows individual
  SMBIOS entries to be passed into the guest VM.  This follows the same
  basic path as the support for loading ACPI tables.  While SMBIOS is
  independent of ACPI, I chose to add the smbios_entry_add() function to
  acpi.c because they're both somewhat PC BIOS related (and ia64 can
  support SMBIOS and might be able to make use of it there).
  
  This feature allows the guest to see certain properties of the host if
  configured correctly.  For instance, the system model and serial number
  in the type 1 entry.  Obviously it's only built at boot, so it doesn't get
  updated for migration scenarios.  User provided entries will supersede
  generated entries, so care should be taken when passing entries which
  describe physical properties, such as memory size and address ranges.
  Thanks,

I can't help thinking that if we wish to provide metadata to guest OS
like system model, serial number, etc, then we'd be better off using
explicit named flags (or QEMU config file settings once that exists)

  -system-serial 2141241521 -system-model "Some Virtual Machine"

and have QEMU generate the necessary SMBIOS data, or other equivalent 
data tables to suit the non-PC based machine types for which SMBIOS
is not relevant.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-30 Thread Amit Shah
On (Mon) Mar 30 2009 [16:55:05], Avi Kivity wrote:
 Amit Shah wrote:
 On (Mon) Mar 30 2009 [10:07:58], Avi Kivity wrote:

   
 1. perfctr wrmsr warning when booting 64bit RHEl5.3
 https://sourceforge.net/tracker/?func=detailaid=2721640group_id=180599atid=893831
   
 This is the architectural performance counting msr which was enabled 
 in  4f76231 (KVM: x86: Ignore reads to EVNTSEL MSRs).  Amit, can you 
 check  if appropriate cpuid leaf 10 reporting will fix this?
 

 We already report 0s for cpuid leaf 10; we need to report 0x3f in
 EBX for leaf 10 to denote that the events corresponding to those bits
 aren't available.

 I checked and it didn't help (we can't rely on guests to abide by cpuid
 flags).
   

 I see this in the code:

 /*
  * Check whether the Architectural PerfMon supports
  * Unhalted Core Cycles Event or not.
  * NOTE: Corresponding bit = 0 in ebx indicates event present.
  */
 cpuid(10, &(eax.full), &ebx, &unused, &unused);
 if ((eax.split.mask_length <
 (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
 (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
 return 0;


 So I think it can be done.

Only if the guest kernel (or the module accessing those registers) looks at
the cpuid output, right? I checked this for Kaspersky AV on Windows (the
crash bug I was solving), and that program doesn't seem to check
cpuid.

RHEL 5.3 is based on 2.6.18 and this patch appears to have entered in
2.6.21. I saw this on 5.3 as well.

Amit


Re: Live memory allocation?

2009-03-30 Thread Avi Kivity

Tomasz Chmielewski wrote:


Double caching is indeed a bad idea.  That's why you have cache=off 
(though it isn't recommended with qcow2).


cache= option is about write cache, right?

Here, I'm talking about read cache.

Or, does cache=none disable read cache as well?


cache=writethrough disables the write cache
cache=none disables host caching completely
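To make these two lines concrete, here is what each mode roughly means for the host open(2) call on the image file. The mapping is an illustrative sketch, believed to match qemu of this period but not taken from its source. writeback (a third mode, not mentioned above) uses the host page cache normally; writethrough adds O_DSYNC, so writes reach stable storage before completing while reads can still be served from the host cache; none adds O_DIRECT, so the host page cache is bypassed for reads and writes alike. cache=off, the spelling used earlier in the thread, is treated as an alias for none. This is why only none avoids double-caching reads.

```c
#define _GNU_SOURCE		/* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Map a cache= mode to host open(2) flags; returns -1 for unknown modes. */
static int cache_mode_open_flags(const char *mode)
{
	const int base = O_RDWR;

	if (strcmp(mode, "writeback") == 0)
		return base;			/* host caches reads and writes */
	if (strcmp(mode, "writethrough") == 0)
		return base | O_DSYNC;		/* synchronous writes, cached reads */
	if (strcmp(mode, "none") == 0 || strcmp(mode, "off") == 0)
		return base | O_DIRECT;		/* no host page cache at all */
	return -1;
}
```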

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/2] qemu: SMBIOS passing support

2009-03-30 Thread Avi Kivity

Daniel P. Berrange wrote:

I can't help thinking that if we wish to provide metadata to guest OS
like system model, serial number, etc, then we'd be better off using
explicit named flags (or QEMU config file settings once that exists)

  -system-serial 2141241521 -system-model "Some Virtual Machine"

and have QEMU generate the necessary SMBIOS data, or other equivalent 
data tables to suit the non-PC based machine types for which SMBIOS
is not relevant.
  


-smbios serial=blah,model=bleach ?
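This one-line suggestion implies a small key=value parser. One possible shape is sketched below. The option syntax, the serial and model keys, and the struct are hypothetical; nothing in this series defines them. The values are kept as plain strings, so they could feed an SMBIOS type 1 entry or, in line with Daniel's point, an equivalent table on non-PC machines.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of "-smbios serial=...,model=...". */
struct smbios_opts {
	char serial[64];
	char model[64];
};

/* Parse a comma-separated key=value list. Returns 0 on success, -1 on an
 * over-long argument, an item without '=', or an unknown key. Values longer
 * than the fields are truncated by snprintf(). */
static int parse_smbios_opts(const char *arg, struct smbios_opts *opts)
{
	char buf[256];
	char *saveptr = NULL;
	char *tok;

	memset(opts, 0, sizeof *opts);
	if (strlen(arg) >= sizeof buf)
		return -1;
	strcpy(buf, arg);

	for (tok = strtok_r(buf, ",", &saveptr); tok != NULL;
	     tok = strtok_r(NULL, ",", &saveptr)) {
		char *eq = strchr(tok, '=');

		if (eq == NULL)
			return -1;		/* every item must be key=value */
		*eq = '\0';
		if (strcmp(tok, "serial") == 0)
			snprintf(opts->serial, sizeof opts->serial, "%s", eq + 1);
		else if (strcmp(tok, "model") == 0)
			snprintf(opts->model, sizeof opts->model, "%s", eq + 1);
		else
			return -1;		/* unknown key */
	}
	return 0;
}
```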


--
error compiling committee.c: too many arguments to function



Re: Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-30 Thread Avi Kivity

Amit Shah wrote:



/*
 * Check whether the Architectural PerfMon supports
 * Unhalted Core Cycles Event or not.
 * NOTE: Corresponding bit = 0 in ebx indicates event present.
 */
cpuid(10, &(eax.full), &ebx, &unused, &unused);
if ((eax.split.mask_length <
(ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
(ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
return 0;

  

So I think it can be done.



Only if the guest kernel (or the module accessing those registers) looks at
the cpuid output, right? I checked this for Kaspersky AV on Windows (the
crash bug I was solving), and that program doesn't seem to check
cpuid.
  


The only way to solve all possible cases is to implement the performance
counter MSRs.  That's not going to happen in a hurry; we're looking at
making the known cases work.



RHEL 5.3 is based on 2.6.18 and this patch appears to have entered in
2.6.21. I saw this on 5.3 as well.
  


The snippet I quoted came from RHEL 5.3.  It checks cpuid so we should 
be able to make it fail gracefully.


--
error compiling committee.c: too many arguments to function



Re: Live memory allocation?

2009-03-30 Thread Tomasz Chmielewski

Avi Kivity schrieb:

Tomasz Chmielewski wrote:


Double caching is indeed a bad idea.  That's why you have cache=off 
(though it isn't recommended with qcow2).


cache= option is about write cache, right?

Here, I'm talking about read cache.

Or, does cache=none disable read cache as well?


cache=writethrough disables the write cache
cache=none disables host caching completely


Still, if there is free memory on host, why not use it for cache?


--
Tomasz Chmielewski
http://wpkg.org



Re: Live memory allocation?

2009-03-30 Thread Javier Guerra
On Mon, Mar 30, 2009 at 10:15 AM, Tomasz Chmielewski man...@wpkg.org wrote:
 Still, if there is free memory on host, why not use it for cache?

Because it's best used by the guest, which will cache anyway. By not
caching already-cached data, the host is free to cache other, more important
things, or to keep more of the VMs' memory in RAM.



-- 
Javier


Re: [PATCH 0/2] qemu: SMBIOS passing support

2009-03-30 Thread Alex Williamson
On Mon, 2009-03-30 at 17:59 +0300, Avi Kivity wrote:
 Daniel P. Berrange wrote:
  I can't help thinking that if we wish to provide metadata to guest OS
  like system model, serial number, etc, then we'd be better off using
  explicit named flags (or QEMU config file settings once that exists)
 
   -system-serial 2141241521 -system-model "Some Virtual Machine"
 
  and have QEMU generate the necessary SMBIOS data, or other equivalent 
  data tables to suit the non-PC based machine types for which SMBIOS
  is not relevant.

 
 -smbios serial=blah,model=bleach ?
 

Unfortunately that does make them smbios specific, while I think Daniel
is pointing out that several options may be useful on other platforms.

This is basically the same issue we have with -uuid already.  -uuid is a
non-smbios specific option, but rombios will incorporate the data when
it builds the type 1 entry.  I've retained this functionality, so that a
-uuid option will override the uuid in a passed in type 1 entry.  This
could be further extended with separate patches to provide serial or
model numbers generically, but allow them to override smbios values.
This seems complementary to the patches in this series, but I don't
think it replaces all the functionality we get from a raw smbios entry
interface.  Thanks,

Alex



Re: Live memory allocation?

2009-03-30 Thread Brian Jackson
On Monday 30 March 2009 08:23:44 Alberto Treviño wrote:
 On Saturday 28 March 2009 11:17:42 am you wrote:
  KVM devs have a patch called KSM (short for kernel shared memory I think)
  that helps windows guests a good bit. See the original announcement [1]
  for some numbers. I spoke to one of the devs recently and they said they
  are going to resubmit it soon.

 I remember the discussion about KSM.  First, the kernel developers were not
 very happy with the approach, and second, there were some patent
 implications with VMware.


Some (one?) of the kernel devs didn't like it, then admitted that he hadn't 
even read the patch. And as Alan Cox pointed out, if there was some patent 
problem, it should be handled by lawyers. There was also prior art (even in 
Linux) from quite some time ago. So, I think we are safe for now.

--Brian Jackson



 Have these issues been resolved?  Don't get me wrong.  I'm not trying to
 stop KSM, I'm just wondering if I can get my hopes up again.  I thought KSM
 was a great idea and I'd love to get my hands on it.



Re: BUG: soft lockup - CPU stuck for ...

2009-03-30 Thread Brian Jackson
On Monday 30 March 2009 06:37:35 Robert Wimmer wrote:
 Hi,

 many thanks for your replys. I've upgraded
 some systems to kernel 2.6.29 a few days ago.
 There was especially one system which nearly always
 crashed during kernel compilation. With 2.6.29
 as host and guest it currently works. I have now
 compiled the kernel three times (always from
 scratch) and nothing crashed.

 Using i686 (or x86) as the host wouldn't be an option.

 The preemptible kernel seems a possible way to go
 if the crash happens again. But if it works now
 I'll leave it as it is, since there are still drivers
 out there which have problems with preemptible kernels.

 But there is something I still wonder: Is this the
 right mailing list for such requests? If I read a
 message like "BUG: soft lockup - CPU#0 stuck for ..."
 it looks to me like a bug which should be looked after
 by the developers, but it seems that nobody here really
 cares about such reports. I'm really grateful for
 KVM and the work done by all the developers, but
 isn't it in the interest of a company like Redhat to
 get the product stable and to eliminate all known
 bugs before the release of their new virtualisation
 product? I really don't mean this as a flame because
 my intention is really to get KVM better. But the only
 thing I can do is to submit bug reports since I'm not
 a C/C++ developer.


I think your problem is timing. All the devs seem to be really focused on 
getting kvm merged into upstream qemu properly right now. Following the list 
I've noticed that at least one of the devs seems to do a weekly review of the 
list and tries to handle all the bugs he sees. I actually think filing reports in the
bug tracker is probably a better way to go because it's easier for the devs to keep 
track of them there (rather than having to read through a ton of mailing list 
messages, some of which don't even have to do with kvm). Moral of the story... 
even though nobody replied to you (yet?) your reports and time spent finding 
workarounds is appreciated.



 Btw: Is there an overview of which kernel settings
 are recommended for KVM hosts and guests besides the
 obvious ones? I've learned so far that the noop
 I/O scheduler in the guest and deadline in the host
 are good choices. I've read in the XFS filesystem FAQ
 that the KVM drive= option should include cache=none
 to avoid filesystem corruption (which I've already had
 in some KVMs and which caused me to switch to ext3 instead).
 The kernel settings are especially useful for people
 like me who use Gentoo, where you have to compile
 everything yourself.

 Keep up the good work!
 Thanks!
 Robert
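The noop/deadline choice Robert mentions can at least be checked at runtime: on 2.6-era kernels, /sys/block/<dev>/queue/scheduler lists the available elevators with the active one in square brackets, e.g. "noop anticipatory deadline [cfq]". A minimal sketch, assuming that file format; the helper names are hypothetical and not part of any KVM tool:

```python
import re
from pathlib import Path
from typing import Optional


def active_scheduler(contents: str) -> Optional[str]:
    """Return the active I/O scheduler from the text of
    /sys/block/<dev>/queue/scheduler, where the kernel marks the active
    one with square brackets, e.g. "noop anticipatory deadline [cfq]"."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


def read_active_scheduler(dev: str) -> Optional[str]:
    """Read the scheduler file for a block device such as "sda".
    Returns None if the file cannot be read (no such device, not Linux)."""
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    try:
        return active_scheduler(path.read_text())
    except OSError:
        return None
```

Switching a device is done by writing a scheduler name (for example deadline) to the same file as root; the elevator= boot parameter sets the default for all devices.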





  Hi,
  I was also experiencing this problem a lot for quite a long time (and for
  a wide range of KVM versions).
  I might be completely wrong, as I'm not sure if it was really the reason,
  but I THINK it disappeared when I started to use a fully preemptible kernel
  on the host. You might want to try it...
  BR
  nik

 On Sun, Mar 29, 2009 at 07:51:21AM +, Gerrit Slomma wrote:
  Robert Wimmer r.wimmer at tomorrow-focus.de writes:
   Hi,
  
   does anyone know how to solve the problem
   with "BUG: soft lockup - CPU#0 stuck for ..."?
   Today I got the messages below during compilation
   of the kernel modules in a guest, using kvm-84 and kernel 2.6.29
   as host kernel and 2.6.28 as guest kernel. During the
   hang of the guest neither ssh nor ping was possible.
   After about 2 minutes the guest was reachable again
   and I saw the messages below with dmesg.
  
   Maybe it is related to my earlier posting, which was already answered:
   http://article.gmane.org/gmane.comp.emulators.kvm.devel/29677
  
   Thanks!
   Robert
  
   BUG: soft lockup - CPU#0 stuck for 61s!
   (...)
 
  Hello
 
  Do you use x86_64 or i686?
  Look at my post here
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/29833 and my
  bug report here https://bugzilla.redhat.com/show_bug.cgi?id=492688. I do
  not have the problems while running, only after migrating. Problems with
  stuck CPUs vanish if i686 is used for the host - but I am testing
  further.
 



Re: [PATCH 2/4] Rewrite twisted maze of if() statements with more straightforward switch()

2009-03-30 Thread Jan Kiszka
Avi Kivity wrote:
 Gleb Natapov wrote:
 Signed-off-by: Gleb Natapov g...@redhat.com
   
 
 This is actually not just a rewrite, but also a bugfix:
 
 INTR_INFO);
 @@ -3289,34 +3288,42 @@ static void vmx_complete_interrupts(struct
 vcpu_vmx *vmx)
  vmx->vnmi_blocked_time +=
  ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
  
 +vmx->vcpu.arch.nmi_injected = false;
 +kvm_clear_exception_queue(&vmx->vcpu);
 +kvm_clear_interrupt_queue(&vmx->vcpu);
 +
 +if (!idtv_info_valid)
 +return;
 +
  vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
  type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
 -if (vmx->vcpu.arch.nmi_injected) {
 +
 +switch(type) {
 +case INTR_TYPE_NMI_INTR:
 +vmx->vcpu.arch.nmi_injected = true;
  /*
   
 
 The existing code would leave nmi_injected == false if we exit on
 NMI_INTR, so we drop an NMI here.
 

I think NMI_INTR and nmi_injected always go together. However, the
rework looks good and more logical to me, too. I'll see that I can give
this (more precisely, -v2) a try with our scenarios ASAP.

Jan
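To make Avi's point concrete, here is a heavily simplified model of the reworked NMI handling in vmx_complete_interrupts(). It assumes the Intel SDM layout of the IDT-vectoring information field (bit 31 valid, bits 10:8 event type, NMI = type 2). The class and function names are hypothetical, and only the NMI case from the quoted hunk is modeled. Because the flag is cleared first and then rebuilt from the vectoring info, an exit that interrupts NMI delivery always re-queues the NMI, whatever nmi_injected said before.

```python
from dataclasses import dataclass

# IDT-vectoring information field (Intel SDM): bit 31 = valid,
# bits 10:8 = event type, bits 7:0 = vector.
VECTORING_INFO_VALID_MASK = 0x80000000
VECTORING_INFO_TYPE_MASK = 0x700
INTR_TYPE_EXT_INTR = 0 << 8
INTR_TYPE_NMI_INTR = 2 << 8


@dataclass
class VcpuEvents:
    """Hypothetical, heavily simplified vcpu state: only the NMI flag."""
    nmi_injected: bool = False


def complete_interrupts(ev: VcpuEvents, idt_vectoring_info: int) -> None:
    # Always start from a clean slate...
    ev.nmi_injected = False
    # ...and stop there if no event delivery was interrupted.
    if not idt_vectoring_info & VECTORING_INFO_VALID_MASK:
        return
    event_type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK
    if event_type == INTR_TYPE_NMI_INTR:
        # The exit hit while an NMI was being delivered: re-queue it,
        # whatever nmi_injected said before the exit.
        ev.nmi_injected = True
    # Other event types are left out of this sketch.
```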





Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Jan Kiszka
Gleb Natapov wrote:
 The patch fixes two problems with task switching.
 1. The back link is written to the wrong TSS.
 2. Instruction emulation is not needed if the reason for the task switch
is a task gate in the IDT and access to it is caused by an external event.
 
 2 is currently solved only for VMX since there is no reliable way to
 skip an instruction in SVM. We should emulate it instead.
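The SVM side of the patch picks the task-switch reason from EXITINFO2 and EXITINTINFO and then decides whether the guest instruction has to be skipped. A standalone sketch of that decision, under stated assumptions: the bit positions (IRET = bit 36 and JMP = bit 38 of EXITINFO2; EXITINTINFO valid in bit 31, type in bits 10:8 with 4 = software interrupt) are our reading of AMD's manual and should be checked against svm.h, and the function names are invented for illustration.

```python
TASK_SWITCH_CALL, TASK_SWITCH_IRET, TASK_SWITCH_JMP, TASK_SWITCH_GATE = range(4)

# Assumed bit positions (AMD APM; verify against arch/x86/include/asm/svm.h).
TS_REASON_IRET_SHIFT = 36       # EXITINFO2: switch caused by IRET
TS_REASON_JMP_SHIFT = 38        # EXITINFO2: switch caused by JMP
EXITINTINFO_VALID = 1 << 31
EXITINTINFO_TYPE_MASK = 7 << 8
EXITINTINFO_TYPE_SOFT = 4 << 8  # software interrupt (INT n)


def task_switch_reason(exit_info_2: int, exit_int_info: int) -> int:
    """Mirror the if/else chain in the patched task_switch_interception()."""
    if exit_info_2 & (1 << TS_REASON_IRET_SHIFT):
        return TASK_SWITCH_IRET
    if exit_info_2 & (1 << TS_REASON_JMP_SHIFT):
        return TASK_SWITCH_JMP
    if exit_int_info & EXITINTINFO_VALID:
        return TASK_SWITCH_GATE
    return TASK_SWITCH_CALL


def should_skip_instruction(reason: int, exit_int_info: int) -> bool:
    """Skip the guest instruction unless the switch went through an IDT
    task gate for an event that is not a software interrupt."""
    int_type = exit_int_info & EXITINTINFO_TYPE_MASK
    return reason != TASK_SWITCH_GATE or int_type == EXITINTINFO_TYPE_SOFT
```

An external interrupt arriving through a task gate (type 0) is the case where nothing is skipped, which is point 2 of the patch description.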

Does this series fix all issues Bernhard, Thomas and Julian stumbled over?

Jan

 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
 
  arch/x86/include/asm/svm.h |1 +
  arch/x86/kvm/svm.c |   25 ++---
  arch/x86/kvm/vmx.c |   40 +---
  arch/x86/kvm/x86.c |   40 +++-
  4 files changed, 79 insertions(+), 27 deletions(-)
 
 diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
 index 82ada75..85574b7 100644
 --- a/arch/x86/include/asm/svm.h
 +++ b/arch/x86/include/asm/svm.h
 @@ -225,6 +225,7 @@ struct __attribute__ ((__packed__)) vmcb {
  #define SVM_EVTINJ_VALID_ERR (1 << 11)
  
  #define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
 +#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
  
  #define  SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
  #define  SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 1fcbc17..3ffb695 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1823,17 +1823,28 @@ static int task_switch_interception(struct vcpu_svm 
 *svm,
   struct kvm_run *kvm_run)
  {
   u16 tss_selector;
 + int reason;
 + int int_type = svm->vmcb->control.exit_int_info &
 + SVM_EXITINTINFO_TYPE_MASK;
  
  tss_selector = (u16)svm->vmcb->control.exit_info_1;
 +
  if (svm->vmcb->control.exit_info_2 &
  (1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
 - return kvm_task_switch(&svm->vcpu, tss_selector,
 -TASK_SWITCH_IRET);
 - if (svm->vmcb->control.exit_info_2 &
 - (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
 - return kvm_task_switch(&svm->vcpu, tss_selector,
 -TASK_SWITCH_JMP);
 - return kvm_task_switch(&svm->vcpu, tss_selector, TASK_SWITCH_CALL);
 + reason = TASK_SWITCH_IRET;
 + else if (svm->vmcb->control.exit_info_2 &
 +  (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
 + reason = TASK_SWITCH_JMP;
 + else if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_VALID)
 + reason = TASK_SWITCH_GATE;
 + else
 + reason = TASK_SWITCH_CALL;
 +
 +
 + if (reason != TASK_SWITCH_GATE || int_type == SVM_EXITINTINFO_TYPE_SOFT)
 + skip_emulated_instruction(&svm->vcpu);
 +
 + return kvm_task_switch(&svm->vcpu, tss_selector, reason);
  }
  
  static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 0da7a9e..01db958 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3025,22 +3025,40 @@ static int handle_task_switch(struct kvm_vcpu *vcpu, 
 struct kvm_run *kvm_run)
   struct vcpu_vmx *vmx = to_vmx(vcpu);
   unsigned long exit_qualification;
   u16 tss_selector;
 - int reason;
 + int reason, type, idt_v;
 +
 + idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
 + type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
  
  exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
  
  reason = (u32)exit_qualification >> 30;
 - if (reason == TASK_SWITCH_GATE && vmx->vcpu.arch.nmi_injected &&
 - (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
 - (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK)
 - == INTR_TYPE_NMI_INTR) {
 - vcpu->arch.nmi_injected = false;
 - if (cpu_has_virtual_nmis())
 - vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
 -   GUEST_INTR_STATE_NMI);
 + if (reason == TASK_SWITCH_GATE && idt_v) {
 + switch (type) {
 + case INTR_TYPE_NMI_INTR:
 + vcpu->arch.nmi_injected = false;
 + if (cpu_has_virtual_nmis())
 + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
 +   GUEST_INTR_STATE_NMI);
 + break;
 + case INTR_TYPE_EXT_INTR:
 + kvm_clear_interrupt_queue(vcpu);
 + break;
 + case INTR_TYPE_HARD_EXCEPTION:
 + case INTR_TYPE_SOFT_EXCEPTION:
 + kvm_clear_exception_queue(vcpu);
 + break;
 + default:
 + break;
 + }
   }
   tss_selector = exit_qualification;
  
 + if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
 +type != 

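The VMX hunk quoted above is cut off, but its visible part does two things: it decodes the task-switch exit qualification (new TSS selector in bits 15:0, source of the switch in bits 31:30: 0 CALL, 1 IRET, 2 JMP, 3 IDT task gate), and for a task gate it drops whatever event was being delivered from KVM's queues. That second step is the "interrupt.pending should be cleared" idea Jan raises further down the thread. The sketch models both with a hypothetical flags class; it leaves out the virtual-NMI blocking bit the real code sets and everything past the truncation point.

```python
from dataclasses import dataclass

TASK_SWITCH_CALL, TASK_SWITCH_IRET, TASK_SWITCH_JMP, TASK_SWITCH_GATE = range(4)

# IDT-vectoring info layout and event types as in the Intel SDM.
VECTORING_INFO_VALID_MASK = 0x80000000
VECTORING_INFO_TYPE_MASK = 0x700
INTR_TYPE_EXT_INTR = 0 << 8
INTR_TYPE_NMI_INTR = 2 << 8
INTR_TYPE_HARD_EXCEPTION = 3 << 8
INTR_TYPE_SOFT_EXCEPTION = 6 << 8


@dataclass
class PendingEvents:
    """Hypothetical stand-in for the vcpu's queued-event state."""
    nmi_injected: bool = False
    interrupt_pending: bool = False
    exception_pending: bool = False


def decode_task_switch(exit_qualification: int):
    """Return (reason, tss_selector): bits 31:30 give the source of the
    switch, bits 15:0 the new TSS selector."""
    reason = (exit_qualification & 0xFFFFFFFF) >> 30
    tss_selector = exit_qualification & 0xFFFF
    return reason, tss_selector


def drop_delivered_event(ev: PendingEvents, reason: int, idt_vectoring_info: int) -> None:
    """If the switch went through an IDT task gate while an event was being
    delivered, the switch itself delivered it: clear it from the queue so it
    is not injected a second time."""
    if reason != TASK_SWITCH_GATE or not idt_vectoring_info & VECTORING_INFO_VALID_MASK:
        return
    event_type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK
    if event_type == INTR_TYPE_NMI_INTR:
        ev.nmi_injected = False
    elif event_type == INTR_TYPE_EXT_INTR:
        ev.interrupt_pending = False
    elif event_type in (INTR_TYPE_HARD_EXCEPTION, INTR_TYPE_SOFT_EXCEPTION):
        ev.exception_pending = False
```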
Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 06:04:45PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  The patch fixes two problems with task switching.
  1. The back link is written to the wrong TSS.
  2. Instruction emulation is not needed if the reason for the task switch
 is a task gate in the IDT and access to it is caused by an external event.
  
  2 is currently solved only for VMX since there is no reliable way to
  skip an instruction in SVM. We should emulate it instead.
 
 Does this series fix all issues Bernhard, Thomas and Julian stumbled over?
 
Haven't tried. I wrote my own tests for task switching. How can I check it?

--
Gleb.


Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Jan Kiszka
Gleb Natapov wrote:
 On Mon, Mar 30, 2009 at 06:04:45PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 The patch fixes two problems with task switching.
 1. The back link is written to the wrong TSS.
 2. Instruction emulation is not needed if the reason for the task switch
is a task gate in the IDT and access to it is caused by an external event.

 2 is currently solved only for VMX since there is no reliable way to
 skip an instruction in SVM. We should emulate it instead.
 Does this series fix all issues Bernhard, Thomas and Julian stumbled over?

 Haven't tried. I wrote my own tests for task switching. How can I check it?
 

There is a test case attached to Julian's sourceforge-reported bug:

https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2681442&group_id=180599

And I guess Thomas or Bernhard will be happy to give it a try, too... :)

There was one issue, the IRQ injection bug [1], which was related to IRQ
tasks IIRC. After a private chat, Thomas and I finally suspected that
there is actually a different reason behind it, something like
interrupt.pending needing to be cleared when the injection took place via an
(emulated) task switch. Any news on this, Thomas?

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/29288





Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 06:35:05PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Mon, Mar 30, 2009 at 06:04:45PM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  The patch fixes two problems with task switching.
  1. The back link is written to the wrong TSS.
  2. Instruction emulation is not needed if the reason for the task switch
 is a task gate in the IDT and access to it is caused by an external event.
 
  2 is currently solved only for VMX since there is no reliable way to
  skip an instruction in SVM. We should emulate it instead.
  Does this series fix all issues Bernhard, Thomas and Julian stumbled over?
 
  Haven't tried. I wrote my own tests for task switching. How can I check it?
  
 
 There is a test case attached to Julian's sourceforge-reported bug:
 
 https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2681442&group_id=180599
 
I'll try that.

 And I guess Thomas or Bernhard will be happy to give it a try, too... :)
 
 There was one issue, the IRQ injection bug [1], which was related to IRQ
 tasks IIRC. After a private chat, Thomas and I finally suspected that
 there is actually a different reason behind it, something like
 interrupt.pending needing to be cleared when the injection took place via an
 (emulated) task switch. Any news on this, Thomas?
 
If this is the case then the patch series should fix it.

 Jan
 
 [1] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/29288
 


--
Gleb.


Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 06:35:05PM +0200, Jan Kiszka wrote:
  Haven't tried. I wrote my own tests for task switching. How can I check it?
  
 
 There is a test case attached to Julian's sourceforge-reported bug:
 
 https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2681442&group_id=180599
 
Works for me.

--
Gleb.


Re: BUG: soft lockup - CPU stuck for ...

2009-03-30 Thread Gerrit Slomma
Nikola Ciprich extmaillist at linuxbox.cz writes:

 
 Hi,
 I was also experiencing this problem a lot for quite a long time (and for
 a wide range of KVM versions).
 I might be completely wrong, as I'm not sure if it was really the reason,
 but I THINK it disappeared when I started to use a fully preemptible kernel on
 the host.
 You might want to try it...
 BR
 nik

Alas, I can't! I am on thin ice compiling libvirt, virt-manager and kvm on my own.
Company rules say I have to use upstream packages, namely those that come from
the installation media. But if live migration doesn't work, I have to use VMware.
It seems I will take the fallback of i686 - live migration seems to work there -
and wait for RHEL 5.4 in the fall. KVM is said to become the default
virtualization from Red Hat then.



Re: IO on guest is 20 times slower than host

2009-03-30 Thread Kurt Yoder


On Mar 29, 2009, at 10:29 AM, Avi Kivity wrote:


Kurt Yoder wrote:

slow host cpu information, core 1 of 16:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 4
model name  : Quad-Core AMD Opteron(tm) Processor 8382
stepping: 2
cpu MHz : 2611.998
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
          cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall mmxext fxsr_opt
          pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor
          cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
          misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips: 5223.97
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate





Can you try loading kvm_amd on this host with 'modprobe kvm-amd npt=0'?


So that's most likely the problem for me:

m...@host:/etc/nagios/nrpe_directives$ sudo modprobe kvm-amd npt=0
FATAL: Error inserting kvm_amd (/lib/modules/2.6.27-11-server/kernel/arch/x86/kvm/kvm-amd.ko): Operation not supported

m...@host:/etc/nagios/nrpe_directives$ uname -a
Linux boron 2.6.27-11-server #1 SMP Thu Jan 29 20:13:12 UTC 2009 x86_64 GNU/Linux



It looks like I need to enable SVM in my BIOS. I'll do that and report  
back on the results.


-Kurt
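The CPU side of this can be confirmed by looking for the svm token in the flags line of /proc/cpuinfo; it is present in the output Kurt posted. Its presence only says the processor supports AMD-V. Firmware can still disable the feature, which is what Kurt suspects behind the modprobe failure above. A small sketch of a whole-token match (the helper names are hypothetical):

```python
def cpuinfo_has_flag(flags_line: str, flag: str) -> bool:
    """True if `flag` is a whole whitespace-separated token in a
    /proc/cpuinfo "flags" line (the "flags :" label is skipped)."""
    _, sep, rest = flags_line.partition(":")
    tokens = (rest if sep else flags_line).split()
    return flag in tokens


def cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Return the flag set of the first CPU listed in `path` (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.partition(":")[2].split())
    return set()
```

Matching whole tokens matters here: a substring test would report "sse4" as present on a CPU that only has "sse4a".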


[ kvm-Bugs-2517725 ] Windows 7 CPU Runaway

2009-03-30 Thread SourceForge.net
Bugs item #2517725, was opened at 2009-01-18 12:44
Message generated for change (Comment added) made by martyg7
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2517725&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Martin Gallant (martyg7)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows 7 CPU Runaway

Initial Comment:
After some uptime in the client, I have observed Windows7 going into a
CPU runaway lockup.

Host is Athlon x2 running Debian Lenny amd64
 + kernel.org 2.6.28 kernel
 + kvm-83
Client is Windows 7 beta 32-bit image, (UP or SMP) with virtio drivers.

This happens when running the guest either as UP or SMPx2.
When running as UP, only one of my host CPUs is affected.
When running as SMPx2, both of my host CPUs are affected.

This can be reproduced with reasonable reliability by either
a) Commanding a restart in the guest machine
b) Significant sustained disk IO traffic, e.g. 200+ MB/s

Invocation:
sudo $KVM -name kvm-windows7 -smp 2 -m 1024 -localtime \
-drive file=/dev/vm/kvm-windows7 \
-drive file=/dev/vm/usenet \
-net nic,macaddr=00:ff:3e:a4:f4:20,model=virtio -net tap \
-daemonize -vnc localhost:1,to=4 -usbdevice tablet

After lockup, here is the backtrace I am pulling from gdb
[Am I doing this right?  I am a little rusty]

(gdb) attach 12717

(gdb) bt
#0  0x7f37b1496ce2 in select () from /lib/libc.so.6
#1  0x004088cb in main_loop_wait (timeout=0)
at /usr/src/kvm-83/qemu/vl.c:3637
#2  0x005142ea in kvm_main_loop ()
at /usr/src/kvm-83/qemu/qemu-kvm.c:600
#3  0x0040c952 in main (argc=<value optimized out>,
argv=0x7fffba8f1f78, envp=<value optimized out>)
at /usr/src/kvm-83/qemu/vl.c:3799

Detailed configuration information attached.
Nothing I can find in the logs I would consider relevant.



--

Comment By: Martin Gallant (martyg7)
Date: 2009-03-30 15:36

Message:
Still present, and easily reproducible, on 2.6.29/kvm-84.

--

Comment By: Martin Gallant (martyg7)
Date: 2009-01-18 14:35

Message:
I can reproduce this problem at will using -nic model=e1000,
so this has nothing to do with the virtio network drivers.

I am attaching a gdb log showing backtrace of all 3 process threads

--

Comment By: Technologov (technologov)
Date: 2009-01-18 13:24

Message:
VirtIO drivers will not be supported in Windows 7 BETA -- only in Final.

-Alexey

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2517725&group_id=180599


Re: KVM + virt-manager: which is the perfect host Linux distro?

2009-03-30 Thread Bill Davidsen

Evert wrote:

Hi all,

I am about to install a new host system, which will be hosting various
guest systems by means of KVM, with virt-manager as the GUI.


What would be the best choice for host OS distro? Red Hat, or will any 
mature Linux distro do?
Personally I am more of a Gentoo guy, but if there is 1 distro which is 
clearly better as host OS when it comes to KVM+virt-manager, I am 
willing to use something else...  ;-)


Fedora supports KVM and virt-manager nicely. I can't say it's the best, only 
that it works solidly.


--
Bill Davidsen david...@tmr.com
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: More vcd info wanted

2009-03-30 Thread Ryan Harper
* Bill Davidsen david...@tmr.com [2009-03-30 15:51]:
 I am looking for detailed information or a single reproducible example of 
 starting a VM using the qemu-kvm command from a script under Linux (and a 
 display script on a control host, obviously). What software needs to be
 installed and running on the host, and what needs to be on the remote
 machine accessing the display?
 
 Please: this is not a question about doing something else using some other
 method. I need to be able to drop a disk image and a few parameters onto a
 KVM host and start it in such a way that there is no human intervention
 and no previous preparation such as virt-manager or similar.
 
 I run desktops and servers under KVM using both command line start and 
 managers, I just keep running into documentation which tells me to use a 
 vnc specifier without explanation of what that might look like or a 
 single reproducible example of same.

-vnc localhost:1 -- will display the guest VGA display on the localhost.
A remote system can do:

vncviewer ${kvmhost}:1 

to view the guest VGA.

 
 The host will be given a disk image and some parameters such as MAC address 
 and memory size, and the machine which will have the display. That's my 
 starting point, KVM host info will be used to start the viewer on another 
 machine.
 
 -- 
 Bill Davidsen david...@tmr.com
   We have more to fear from the bungling of the incompetent than from
 the machinations of the wicked.  - from Slashdot

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


Re: kvm binary names

2009-03-30 Thread Bill Davidsen

Daniel P. Berrange wrote:

On Fri, Mar 20, 2009 at 10:57:50AM -0700, jd wrote:

Hi
   What is the motivation for having different kvm binary names on various Linux distributions?


-- kvm
-- qemu-system-x86_64
-- qemu-kvm


I can tell you the history from the Fedora POV at least...

We already had 'qemu', 'qemu-system-x86_64', etc from the existing
plain qemu emulator RPMs we distributed.

The KVM makefile creates a binary called qemu-system-x86_64, but this
clashes with the existing QEMU RPM, so we had to rename it somehow
to allow parallel installation of KVM and QEMU RPMs.

KVM already ships with a python script called 'kvm' and we didn't
want to clash with that either, so we eventually settled on calling
it 'qemu-kvm'. Other distros didn't worry about a clash with the python
script, so they called their binary just 'kvm'.

Don't stop there: why does Fedora have both qemu-ppc and qemu-system-ppc and
so forth? There are many of these, arm and m68k for instance. On x86 I
assume that they are both emulated, and that they are not two names for the
same executable or such, so what are they, and how do I choose which one to use?


--
Bill Davidsen david...@tmr.com
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: More vcd info wanted

2009-03-30 Thread Bill Davidsen

Ryan Harper wrote:

* Bill Davidsen david...@tmr.com [2009-03-30 15:51]:
I am looking for detailed information or a single reproducible example of 
starting a VM using the qemu-kvm command from a script under Linux (and a 
display script on a control host, obviously). What software needs to be
installed and running on the host, and what needs to be on the remote
machine accessing the display?


Please: this is not a question about doing something else using some other
method. I need to be able to drop a disk image and a few parameters onto a
KVM host and start it in such a way that there is no human intervention
and no previous preparation such as virt-manager or similar.


I run desktops and servers under KVM using both command line start and 
managers, I just keep running into documentation which tells me to use a 
vnc specifier without explanation of what that might look like or a 
single reproducible example of same.


-vnc localhost:1 -- will display the guest VGA display on the localhost.
A remote system can do:

vncviewer ${kvmhost}:1 


to view the guest VGA.

Thanks, I will try it later tonight. I have to take a bit of care to get the
number right (unique), since there might be more than one of these, but this
may be all I need.
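One way to handle the uniqueness concern in a launch script is to keep a list of the display numbers already handed out and take the lowest free one. A minimal sketch, assuming the script maintains that list itself; the function name is hypothetical, and two scripts starting guests at the same moment could still race for the same number.

```python
from typing import Iterable, Optional


def pick_free_display(in_use: Iterable[int], first: int = 1, last: int = 99) -> Optional[int]:
    """Return the lowest display number in [first, last] not in `in_use`,
    or None if every number in the range is taken."""
    taken = set(in_use)
    for display in range(first, last + 1):
        if display not in taken:
            return display
    return None
```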


The host will be given a disk image and some parameters such as MAC address 
and memory size, and the machine which will have the display. That's my 
starting point, KVM host info will be used to start the viewer on another 
machine.




--
Bill Davidsen david...@tmr.com
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: More vcd info wanted

2009-03-30 Thread Anthony Liguori

Ryan Harper wrote:

* Bill Davidsen david...@tmr.com [2009-03-30 15:51]:
  
I am looking for detailed information or a single reproducible example of 
starting a VM using the qemu-kvm command from a script under Linux (and a 
display script on a control host, obviously). What software needs to be
installed and running on the host, and what needs to be on the remote
machine accessing the display?


Please: this is not a question about doing something else using some other
method. I need to be able to drop a disk image and a few parameters onto a
KVM host and start it in such a way that there is no human intervention
and no previous preparation such as virt-manager or similar.


I run desktops and servers under KVM using both command line start and 
managers, I just keep running into documentation which tells me to use a 
vnc specifier without explanation of what that might look like or a 
single reproducible example of same.



-vnc localhost:1 -- will display the guest VGA display on the localhost.
A remote system can do:

vncviewer ${kvmhost}:1 
  


If you say -vnc localhost:1, then vncviewer ${kvmhost}:1 will certainly 
not work.


You have to say -vnc :1, then vncviewer ${kvmhost}:1.

And happiness will ensue by s/vncviewer/vinagre/.
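Anthony's correction comes down to two details of QEMU's -vnc option. The VNC server listens on TCP port 5900 plus the display number. A host part restricts where connections are accepted: with localhost:1 only local clients can connect, while with the host part omitted (:1) connections are accepted from any host. A small launcher helper that builds the argument accordingly; the function names are illustrative only.

```python
def vnc_tcp_port(display: int) -> int:
    """QEMU's VNC server listens on TCP port 5900 + display number."""
    return 5900 + display


def vnc_arg(display: int, remote: bool) -> str:
    """Build the value for qemu's -vnc option.

    With remote=True the host part is omitted (":N"), so connections are
    accepted from any host; otherwise "localhost:N" limits the server to
    local connections (for example through an SSH tunnel)."""
    return f":{display}" if remote else f"localhost:{display}"
```

A guest started with -vnc :1 is then reachable from another machine with vncviewer ${kvmhost}:1, i.e. on TCP port 5901.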

Regards,

Anthony Liguori


Re: kvm binary names

2009-03-30 Thread Glauber Costa
On Mon, Mar 30, 2009 at 6:12 PM, Bill Davidsen david...@tmr.com wrote:
 Daniel P. Berrange wrote:

 On Fri, Mar 20, 2009 at 10:57:50AM -0700, jd wrote:

 Hi
   What is the motivation for having different kvm binary names on various
 Linux distributions?
 -- kvm
 -- qemu-system-x86_64
 -- qemu-kvm

 I can tell you the history from the Fedora POV at least...

 We already had 'qemu', 'qemu-system-x86_64', etc from the existing
 plain qemu emulator RPMs we distributed.

 The KVM makefile creates a binary called qemu-system-x86_64, but this
 clashes with the existing QEMU RPM, so we had to rename it somehow
 to allow parallel installation of KVM and QEMU RPMs.

 KVM already ships with a python script called 'kvm' and we didn't
 want to clash with that either, so we eventually settled on calling
 it 'qemu-kvm'. Other distros didn't worry about a clash with the python
 script, so they called their binary just 'kvm'.

 Don't stop there: why does Fedora have both qemu-ppc and qemu-system-ppc
 and so forth? There are many of these, arm and m68k for instance. On x86
 I assume that they are both emulated, and that they are not two names for the
 same executable or such, so what are they, and how do I choose which one to use?

One of them is the userspace Linux emulator (qemu-ppc, which runs individual
PowerPC Linux binaries), and the other (qemu-system-ppc) is the full system
emulator.




-- 
Glauber  Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.


read/write performance degradation in ubuntu/debian?

2009-03-30 Thread Richard Wurman
I have run iozone on my Ubuntu 8.10 VM and it says:

random
randombkwd  record  stride
  KB  reclen   write rewritereadrereadread
writeread rewriteread   fwrite frewrite   fread  freread
4096   4   17520   49143   1194284   87115
44566   94074  50770   903871705444816  10441588611

... and that is unbearably slow. My Debian etch VM is somewhat better:

   random
randombkwd  record  stride
  KB  reclen   write rewritereadrereadread
writeread rewriteread   fwrite frewrite   fread  freread
4096   4   29559   80552   142286   144129  109956
73759  122737   86741  1148942800168013  128927   131687

...but it is still very slow, by a factor of 20, compared to my rPath
(a Red Hat-derived distro) 1.07 VM:

random
randombkwd  record  stride
  KB  reclen   write rewritereadrereadread
writeread rewriteread   fwrite frewrite   fread  freread
4096   4  475392  724942  1043308  1080466 1012386
768492  970375 1683570  972908   447357   693303  993935  1059245

Maybe in Debian/Ubuntu there is some kernel setting related to disk
I/O that I need to tweak? Has anyone else seen this problem before?
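For scale, the sequential write and rewrite columns of the three runs (the ones that are not run together in the pasted output) can be compared directly. A small sketch computing the ratios; the numbers are copied from the iozone output above, and the dictionary layout is just for illustration.

```python
# Sequential write and rewrite throughput in KB/s (4096 KB file, 4 KB
# records), copied from the three iozone runs above.
RUNS = {
    "ubuntu 8.10": {"write": 17520, "rewrite": 49143},
    "debian etch": {"write": 29559, "rewrite": 80552},
    "rPath 1.07": {"write": 475392, "rewrite": 724942},
}


def slowdown(reference: str, other: str, column: str) -> float:
    """How many times slower run `other` is than run `reference`."""
    return RUNS[reference][column] / RUNS[other][column]


for vm in ("ubuntu 8.10", "debian etch"):
    print(f"{vm}: write {slowdown('rPath 1.07', vm, 'write'):.1f}x, "
          f"rewrite {slowdown('rPath 1.07', vm, 'rewrite'):.1f}x slower than rPath")
```

With these figures the rPath guest writes roughly 16 times faster than the Debian guest and about 27 times faster than the Ubuntu guest, so the factor of 20 is a fair round figure for sequential writes.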


Re: More vcd info wanted

2009-03-30 Thread Charles Duffy

Bill Davidsen wrote:

Ryan Harper wrote:

-vnc localhost:1 -- will display the guest VGA display on the localhost.
A remote system can do:

vncviewer ${kvmhost}:1
to view the guest VGA.

Thanks, I will try it later tonight. I have to take a bit of care to get the
number right (unique), since there might be more than one of these, but
this may be all I need.


You might consider using libvirt, which (among many other relevant 
features) can dynamically assign VNC ports (thus managing the uniqueness 
constraint) and will expose the currently selected port as part of the 
domain's XML configuration. (Getting a VNC viewer going with libvirt is 
considerably easier than that, though -- virt-viewer VM_NAME will do 
the trick).




Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Julian Stecklina
Gleb Natapov g...@redhat.com writes:

 On Mon, Mar 30, 2009 at 06:35:05PM +0200, Jan Kiszka wrote:
  Haven't tried. I wrote my own tests for task switching. How can I check it?
  
 
 There is a test case attached to Julian's sourceforge-reported bug:
 
 https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2681442&group_id=180599
 
 Works for me.

Then the patches should be fine (at least for me *g*).

Regards,
-- 
Julian Stecklina

The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners - Ernst Jan Plugge



[PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-30 Thread Izik Eidus
KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is set up when the memory is
allocated, KSM does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and
merged.
The sharing is unnoticeable by the processes that use this memory.
(The shared pages are marked read-only, and on a write
do_wp_page() takes care of creating a new copy of the page.)
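The copy-on-write behaviour described here can be illustrated with a toy model. A merged page carries a count of the mappings sharing it, and a write through one mapping first gives that mapping a private copy; do_wp_page() does the equivalent for real pages on a write fault. Everything below (classes, names, the refcount) is a simplification for illustration, not KSM's code.

```python
class SharedPage:
    """Toy model of a KSM-merged page: one read-only copy plus a count of
    the mappings that share it. Not KSM's data structure."""

    def __init__(self, data, refcount=1):
        self.data = bytearray(data)
        self.refcount = refcount


class Mapping:
    """A process's view of one page (hypothetical)."""

    def __init__(self, page):
        self.page = page

    def write(self, offset, value):
        # A write to a page that is still shared first breaks the sharing:
        # this mapping gets a private copy, so the other sharers never see
        # the change.
        if self.page.refcount > 1:
            self.page.refcount -= 1
            self.page = SharedPage(self.page.data)
        self.page.data[offset] = value
```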

To find identical pages, KSM uses an algorithm that is split into three
primary levels:

1) KSM starts scanning the memory and calculates a checksum for each
   page that is registered to be scanned.
   (In the first round of scanning, KSM only calculates
   this checksum for all the pages.)

2) KSM goes over the whole memory again and recalculates the
   checksum of the pages. Pages whose checksum value has not changed
   since the previous pass are considered pages that most likely
   won't change.
   KSM inserts these pages into an RB-tree sorted by page content, which
   is called the unstable tree. It is called unstable because page
   contents might change while the pages are still inside the tree,
   and the tree would then become corrupted.
   Because of this problem KSM takes two more steps in addition to the
   checksum calculation:
   a) KSM throws away and recreates the entire unstable tree in each round
      of memory scanning, so if the tree got corrupted, it is fixed
      when the tree is rebuilt.
   b) KSM uses an RB-tree, whose balancing is based on node color
      and not on content, so even if a page gets corrupted, searching
      the tree still takes the same amount of time.

3) In addition to the unstable tree, ksm holds another tree, called the
   stable tree. This tree is an RB-tree sorted by page
   content, and all its pages are write-protected, so it can't get
   corrupted.
   Each time ksm finds two identical pages using the unstable tree,
   it creates a new write-protected shared page, and this page is
   inserted into the stable tree and kept there. The
   stable tree, unlike the unstable tree, is never thrown away, so each
   page that we find is kept inside it.

Taking into account the three levels described above, the algorithm
works like this:

search primary tree (sorted by entire page contents, pages write protected)
- if match found, merge
- if no match found...
  - search secondary tree (sorted by entire page contents, pages not write
    protected)
    - if match found, merge
      - remove from secondary tree and insert merged page into primary tree
    - if no match found...
      - checksum
        - if checksum hasn't changed
          - insert into secondary tree
        - if it has, store updated checksum (note: first time this page
          is handled it won't have a checksum, so checksum will appear
          as changed, so it takes two passes w/ no other matches to
          get into secondary tree)
          - do not insert into any tree, will see it again on next pass
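The per-page decision in the list above might look roughly like this. It is a minimal sketch under stated assumptions: the trees are modelled as plain arrays searched linearly instead of RB-trees, the checksum is an arbitrary rolling sum rather than whatever ksm actually computes, and every name (`scan_one_page()`, `enum scan_action`, ...) is hypothetical. The function only classifies the page; performing the merge and moving pages between trees is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCAN_PAGE_SIZE 64   /* hypothetical tiny page size */

/* What the scanner decides to do with one candidate page. */
enum scan_action {
	MERGE_WITH_STABLE,    /* identical write-protected page already shared */
	MERGE_WITH_UNSTABLE,  /* identical page found; promote to stable tree */
	INSERT_UNSTABLE,      /* checksum unchanged since the previous pass */
	RECHECK_NEXT_PASS,    /* first visit or contents changed */
};

struct scan_page {
	unsigned char data[SCAN_PAGE_SIZE];
	uint32_t checksum;    /* checksum seen on the previous pass */
	bool has_checksum;    /* false until the page has been visited once */
};

/* Stand-in checksum (simple rolling sum); not the one used by ksm. */
static uint32_t scan_checksum(const unsigned char *data)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < SCAN_PAGE_SIZE; i++)
		sum = sum * 31u + data[i];
	return sum;
}

/* Linear scan standing in for an RB-tree lookup keyed by page contents. */
static bool scan_tree_find(const struct scan_page *const *tree, size_t n,
			   const struct scan_page *page)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(tree[i]->data, page->data, SCAN_PAGE_SIZE) == 0)
			return true;
	return false;
}

static enum scan_action
scan_one_page(struct scan_page *page,
	      const struct scan_page *const *stable, size_t n_stable,
	      const struct scan_page *const *unstable, size_t n_unstable)
{
	uint32_t sum;

	if (scan_tree_find(stable, n_stable, page))
		return MERGE_WITH_STABLE;
	if (scan_tree_find(unstable, n_unstable, page))
		return MERGE_WITH_UNSTABLE;

	sum = scan_checksum(page->data);
	if (page->has_checksum && page->checksum == sum)
		return INSERT_UNSTABLE;

	/* First visit or changed page: remember the checksum, try again later. */
	page->checksum = sum;
	page->has_checksum = true;
	return RECHECK_NEXT_PASS;
}
```

Calling `scan_one_page()` twice on an untouched, unmatched page yields `RECHECK_NEXT_PASS` and then `INSERT_UNSTABLE`, which is the "two passes" noted in the list.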

The basic idea of this algorithm is that even if the unstable tree doesn't
promise to find two identical pages in the first round, we will
probably find them in the second, the third or the tenth round.
Once we have found two identical pages even once, we insert
them into the stable tree, and there they are protected forever.
So the whole idea of the unstable tree is just to build the stable tree;
identical pages are then found using the stable tree.

The current implementation can be improved a lot:
we don't have to calculate an expensive checksum; we could just use the host
dirty bit.

Currently we don't support swapping of shared pages (pages that are not
shared can be swapped, i.e. all the pages that we didn't find to be identical
to other pages).

While walking the tree, we keep calling get_user_pages(); we can optimize this
by saving the pfn and using mmu notifiers to know when the virtual address
mapping has changed.

We currently scan only programs that were registered to be used by ksm; we
would later want to add the ability to tell ksm to scan PIDs (so you can
scan closed binary applications as well).

Right now ksm scanning is done by just one thread; support for multiple
scanners might be needed.

This driver is very useful for KVM when running multiple guests
with the same type of operating system.
(For desktop workloads we have achieved more than x2 memory overcommit
(more like x3).)

This driver has found users other than KVM, for example at CERN.
Fons Rademakers:
on many-core machines we run one large detector simulation program per core.
These simulation programs are identical but run each in their own process and
need about 2 - 2.5 GB RAM.
We typically buy machines with 2GB RAM per core and so have a problem to run
one of these programs per core.
Of 

[PATCH 1/4] MMU_NOTIFIERS: add set_pte_at_notify()

2009-03-30 Thread Izik Eidus
This macro allows setting the pte in the shadow page tables directly,
instead of flushing the shadow page table entry and then taking a vmexit in
order to set it.

This function is an optimization for kvm/users of mmu_notifiers for COW
pages. It is useful for kvm when ksm is used, because it allows kvm
to avoid receiving a VMEXIT and only then mapping the shared page into
the mmu shadow pages; instead, kvm maps it directly at the same time
Linux maps the page into the host page table.

This mmu notifier macro works by calling a callback that maps
the physical page directly into the shadow page tables.

(Users of mmu_notifiers that didn't implement the change_pte() callback
used by set_pte_at_notify() will just receive the
mmu_notifier_invalidate_page callback.)
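The fallback in the last paragraph can be modelled in plain C. This is a hypothetical sketch: `struct notif_ops`, `notify_change_pte()` and the two example listeners are illustrative stand-ins for struct mmu_notifier_ops and its users, not the kernel API. A listener that provides change_pte is remapped in place; one that only provides invalidate_page loses the mapping and must fault it back in later, which is the extra VMEXIT the patch avoids for kvm.

```c
#include <stddef.h>

struct notif;

/* Hypothetical, simplified stand-in for struct mmu_notifier_ops. */
struct notif_ops {
	/* optional: remap the secondary mapping directly to the new pfn */
	void (*change_pte)(struct notif *n, unsigned long addr,
			   unsigned long new_pfn);
	/* drop the secondary mapping; it is faulted in again later */
	void (*invalidate_page)(struct notif *n, unsigned long addr);
};

struct notif {
	const struct notif_ops *ops;
	unsigned long mapped_pfn;   /* 0 means "not mapped" in this model */
	int refaults_needed;        /* faults we would take later */
};

/* Tell every registered listener that the pte for addr changed. */
static void notify_change_pte(struct notif **list, size_t n,
			      unsigned long addr, unsigned long new_pfn)
{
	for (size_t i = 0; i < n; i++) {
		struct notif *mn = list[i];

		if (mn->ops->change_pte)
			mn->ops->change_pte(mn, addr, new_pfn);
		else if (mn->ops->invalidate_page)
			mn->ops->invalidate_page(mn, addr);
	}
}

static void direct_change_pte(struct notif *n, unsigned long addr,
			      unsigned long new_pfn)
{
	(void)addr;
	n->mapped_pfn = new_pfn;    /* mapped immediately, no later fault */
}

static void drop_page(struct notif *n, unsigned long addr)
{
	(void)addr;
	n->mapped_pfn = 0;
	n->refaults_needed++;       /* next access must fault it back in */
}

static const struct notif_ops change_pte_aware_ops = {
	.change_pte = direct_change_pte,
	.invalidate_page = drop_page,
};

static const struct notif_ops invalidate_only_ops = {
	.invalidate_page = drop_page,
};
```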

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/mmu_notifier.h |   34 ++
 mm/memory.c  |   10 --
 mm/mmu_notifier.c|   20 
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b77486d..8bb245f 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -61,6 +61,15 @@ struct mmu_notifier_ops {
 struct mm_struct *mm,
 unsigned long address);
 
+   /* 
+   * change_pte is called in cases that pte mapping into page is changed
+   * for example when ksm mapped pte to point into a new shared page.
+   */
+   void (*change_pte)(struct mmu_notifier *mn,
+  struct mm_struct *mm,
+  unsigned long address,
+  pte_t pte);
+
/*
 * Before this is invoked any secondary MMU is still ok to
 * read/write to the page previously pointed to by the Linux
@@ -154,6 +163,8 @@ extern void __mmu_notifier_mm_destroy(struct mm_struct *mm);
 extern void __mmu_notifier_release(struct mm_struct *mm);
 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
  unsigned long address);
+extern void __mmu_notifier_change_pte(struct mm_struct *mm, 
+ unsigned long address, pte_t pte);
 extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address);
 extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
@@ -175,6 +186,13 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
return 0;
 }
 
+static inline void mmu_notifier_change_pte(struct mm_struct *mm,
+  unsigned long address, pte_t pte)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_change_pte(mm, address, pte);
+}
+
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address)
 {
@@ -236,6 +254,16 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
__young;\
 })
 
+#define set_pte_at_notify(__mm, __address, __ptep, __pte)  \
+({ \
+   struct mm_struct *___mm = __mm; \
+   unsigned long ___address = __address;   \
+   pte_t ___pte = __pte;   \
+   \
+   set_pte_at(__mm, __address, __ptep, ___pte);\
+   mmu_notifier_change_pte(___mm, ___address, ___pte); \
+})
+
 #else /* CONFIG_MMU_NOTIFIER */
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
@@ -248,6 +276,11 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
return 0;
 }
 
+static inline void mmu_notifier_change_pte(struct mm_struct *mm,
+  unsigned long address, pte_t pte)
+{
+}
+
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address)
 {
@@ -273,6 +306,7 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define ptep_clear_flush_notify ptep_clear_flush
+#define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
diff --git a/mm/memory.c b/mm/memory.c
index baa999e..0382a34 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2031,9 +2031,15 @@ gotten:
 * seen in the presence of one thread doing SMC and another
 * thread doing COW.
 */
-   ptep_clear_flush_notify(vma, address, page_table);
+   ptep_clear_flush(vma, address, page_table);
page_add_new_anon_rmap(new_page, vma, address);
-  

[PATCH 2/4] add page_wrprotect(): write protecting page.

2009-03-30 Thread Izik Eidus
This patch adds a new function called page_wrprotect().
page_wrprotect() is used to take a page and mark all the ptes that
point to it as read-only.

The function works by walking the rmap of the page and setting
each pte related to the page as read-only.

The odirect_sync parameter is used to protect against possible races
with O_DIRECT while we are marking the ptes as read-only,
as noted by Andrea Arcangeli:

While thinking at get_user_pages_fast I figured another worse way
things can go wrong with ksm and o_direct: think a thread writing
constantly to the last 512bytes of a page, while another thread read
and writes to/from the first 512bytes of the page. We can lose
O_DIRECT reads, the very moment we mark any pte wrprotected...
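The pagecount/mapcount test at the heart of page_wrprotect_one() can be isolated in a small model. This sketch is illustrative only: `struct wp_pte` and `wrprotect_model()` are hypothetical, and it does not model the pte clear-and-flush that closes the get_user_pages_fast() race; it only shows that a page with extra references (for example an in-flight O_DIRECT transfer) is left writable and reported through `*odirect_sync`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pte mapping the page. */
struct wp_pte {
	bool present;
	bool writable;
};

/*
 * Write-protect every present pte of a page, but only if there are no
 * references beyond the mappings themselves: mapcount + count_offset
 * must equal pagecount. On mismatch the pte is left writable and
 * *odirect_sync is cleared. Returns how many ptes are now read-only.
 */
static int wrprotect_model(struct wp_pte *ptes, size_t n, int mapcount,
			   int pagecount, int count_offset, int *odirect_sync)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ptes[i].present)
			continue;
		if (ptes[i].writable) {
			if (mapcount + count_offset != pagecount) {
				*odirect_sync = 0;   /* someone else holds a pin */
				continue;
			}
			ptes[i].writable = false;
		}
		ret++;
	}
	return ret;
}
```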

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/rmap.h |   11 
 mm/rmap.c|  139 ++
 2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b35bc0e..469376d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,6 +118,10 @@ static inline int try_to_munlock(struct page *page)
 }
 #endif
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+int page_wrprotect(struct page *page, int *odirect_sync, int count_offset);
+#endif
+
 #else  /* !CONFIG_MMU */
 
 #define anon_vma_init()    do {} while (0)
@@ -132,6 +136,13 @@ static inline int page_mkclean(struct page *page)
return 0;
 }
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+static inline int page_wrprotect(struct page *page, int *odirect_sync,
+int count_offset)
+{
+   return 0;
+}
+#endif
 
 #endif /* CONFIG_MMU */
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 1652166..95c55ea 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -585,6 +585,145 @@ int page_mkclean(struct page *page)
 }
 EXPORT_SYMBOL_GPL(page_mkclean);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+
+static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma,
+ int *odirect_sync, int count_offset)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   unsigned long address;
+   pte_t *pte;
+   spinlock_t *ptl;
+   int ret = 0;
+
+   address = vma_address(page, vma);
+   if (address == -EFAULT)
+   goto out;
+
+   pte = page_check_address(page, mm, address, &ptl, 0);
+   if (!pte)
+   goto out;
+
+   if (pte_write(*pte)) {
+   pte_t entry;
+
+   flush_cache_page(vma, address, pte_pfn(*pte));
+   /*
+* Ok this is tricky, when get_user_pages_fast() runs it doesn't
+* take any lock, therefore the check that we are going to make
+* with the pagecount against the mapcount is racy and
+* O_DIRECT can happen right after the check.
+* So we clear the pte and flush the tlb before the check;
+* this assures us that no O_DIRECT can happen after the check
+* or in the middle of the check.
+*/
+   entry = ptep_clear_flush(vma, address, pte);
+   /*
+* Check that no O_DIRECT or similar I/O is in progress on the
+* page
+*/
+   if ((page_mapcount(page) + count_offset) != page_count(page)) {
+   *odirect_sync = 0;
+   set_pte_at_notify(mm, address, pte, entry);
+   goto out_unlock;
+   }
+   entry = pte_wrprotect(entry);
+   set_pte_at_notify(mm, address, pte, entry);
+   }
+   ret = 1;
+
+out_unlock:
+   pte_unmap_unlock(pte, ptl);
+out:
+   return ret;
+}
+
+static int page_wrprotect_file(struct page *page, int *odirect_sync,
+  int count_offset)
+{
+   struct address_space *mapping;
+   struct prio_tree_iter iter;
+   struct vm_area_struct *vma;
+   pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+   int ret = 0;
+
+   mapping = page_mapping(page);
+   if (!mapping)
+   return ret;
+
+   spin_lock(&mapping->i_mmap_lock);
+
+   vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff)
+   ret += page_wrprotect_one(page, vma, odirect_sync,
+ count_offset);
+
+   spin_unlock(&mapping->i_mmap_lock);
+
+   return ret;
+}
+
+static int page_wrprotect_anon(struct page *page, int *odirect_sync,
+  int count_offset)
+{
+   struct vm_area_struct *vma;
+   struct anon_vma *anon_vma;
+   int ret = 0;
+
+   anon_vma = page_lock_anon_vma(page);
+   if (!anon_vma)
+   return ret;
+
+   /*
+* If the page is inside the swap cache, its _count number was
+* increased by one, therefore we have to increase 

[PATCH 3/4] add replace_page(): change the page pte is pointing to.

2009-03-30 Thread Izik Eidus
replace_page() allows changing the mapping of a pte from one physical page
to a different physical page.

This function works by removing oldpage from the rmap and calling
put_page() on it, and by setting the pte to point to newpage and
inserting it into the rmap using page_add_file_rmap().

Note: newpage must be a non-anonymous page. The reason for this is that
replace_page() is built to allow mapping one page into more than one
virtual address; the mapping of this page can happen at different
offsets inside each vma, and therefore we cannot trust page->index
anymore.

The side effect of this issue is that newpage cannot be anything but a
kernel-allocated page that is not swappable.
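The ordering replace_page() follows can be shown with plain reference counters. A hedged sketch: `struct rp_page`, `struct rp_pte` and `replace_page_model()` are hypothetical, the pte is reduced to a page pointer plus a writable bit, and the anon_rss/file_rss counter adjustment is left out. The new page gains its reference and rmap before the pte is switched, and the old page loses them afterwards.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a physical page. */
struct rp_page {
	int count;      /* references, like page_count() */
	int mapcount;   /* ptes mapping it, like page_mapcount() */
	bool anon;
};

/* Hypothetical model of a pte: target page plus a writable bit. */
struct rp_pte {
	struct rp_page *page;
	bool writable;
};

static int replace_page_model(struct rp_pte *pte, struct rp_page *oldpage,
			      struct rp_page *newpage, struct rp_pte orig,
			      bool prot_writable)
{
	if (newpage->anon)
		return -EINVAL;          /* the patch uses BUG_ON() instead */
	/* Like !pte_same(): give up if the pte changed since it was sampled. */
	if (pte->page != orig.page || pte->writable != orig.writable ||
	    pte->page != oldpage)
		return -EFAULT;

	newpage->count++;                /* get_page(newpage) */
	newpage->mapcount++;             /* page_add_file_rmap(newpage) */
	pte->page = newpage;             /* install pte built from newpage, prot */
	pte->writable = prot_writable;
	oldpage->mapcount--;             /* page_remove_rmap(oldpage) */
	oldpage->count--;                /* put_page(oldpage) */
	return 0;
}
```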

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/mm.h |5 +++
 mm/memory.c|   80 
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065cdf8..b19e4c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1237,6 +1237,11 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+struct page *newpage, pte_t orig_pte, pgprot_t prot);
+#endif
+
 struct page *follow_page(struct vm_area_struct *, unsigned long address,
unsigned int foll_flags);
 #define FOLL_WRITE    0x01    /* check pte is writable */
diff --git a/mm/memory.c b/mm/memory.c
index 0382a34..3946e79 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1562,6 +1562,86 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+
+/**
+ * replace_page - replace page in vma with new page
+ * @vma:  vma that holds the pte pointing to oldpage.
+ * @oldpage:  the page we are replacing with newpage
+ * @newpage:  the page we replace oldpage with
+ * @orig_pte: the original value of the pte
+ * @prot: page protection bits
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ *
+ * Note: @newpage must not be an anonymous page because replace_page() does
+ * not change the mapping of @newpage to have the same values as @oldpage.
+ * @newpage can be mapped in several vmas at different offsets (page->index).
+ */
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+struct page *newpage, pte_t orig_pte, pgprot_t prot)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep;
+   spinlock_t *ptl;
+   unsigned long addr;
+   int ret;
+
+   BUG_ON(PageAnon(newpage));
+
+   ret = -EFAULT;
+   addr = page_address_in_vma(oldpage, vma);
+   if (addr == -EFAULT)
+   goto out;
+
+   pgd = pgd_offset(mm, addr);
+   if (!pgd_present(*pgd))
+   goto out;
+
+   pud = pud_offset(pgd, addr);
+   if (!pud_present(*pud))
+   goto out;
+
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_present(*pmd))
+   goto out;
+
+   ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+   if (!ptep)
+   goto out;
+
+   if (!pte_same(*ptep, orig_pte)) {
+   pte_unmap_unlock(ptep, ptl);
+   goto out;
+   }
+
+   ret = 0;
+   get_page(newpage);
+   page_add_file_rmap(newpage);
+
+   flush_cache_page(vma, addr, pte_pfn(*ptep));
+   ptep_clear_flush(vma, addr, ptep);
+   set_pte_at_notify(mm, addr, ptep, mk_pte(newpage, prot));
+
+   page_remove_rmap(oldpage);
+   if (PageAnon(oldpage)) {
+   dec_mm_counter(mm, anon_rss);
+   inc_mm_counter(mm, file_rss);
+   }
+   put_page(oldpage);
+
+   pte_unmap_unlock(ptep, ptl);
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL_GPL(replace_page);
+
+#endif
+
 /*
  * maps a range of physical memory into the requested pages. the old
  * mappings are removed. any references to nonexistent pages results
-- 
1.5.6.5



[PATCH 0/3] kvm support for ksm

2009-03-30 Thread Izik Eidus
Apply it against Avi's git tree.

Izik Eidus (3):
  kvm: dont hold pagecount reference for mapped sptes pages.
  kvm: add SPTE_HOST_WRITEABLE flag to the shadow ptes.
  kvm: add support for change_pte mmu notifiers

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   89 ---
 arch/x86/kvm/paging_tmpl.h  |   16 ++-
 virt/kvm/kvm_main.c |   14 ++
 4 files changed, 101 insertions(+), 19 deletions(-)



[PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.

2009-03-30 Thread Izik Eidus
When using mmu notifiers, we are allowed to remove the page count
reference taken by get_user_pages() on a specific page that is mapped
inside the shadow page tables.

This is needed so we can balance the pagecount against the mapcount
when checking them.

(Right now kvm increases the pagecount but does not increase the
mapcount when mapping a page into a shadow page table entry,
so when comparing pagecount against mapcount, you get no
reliable result.)
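Why the extra reference breaks the comparison can be shown with two counters. This is a hypothetical model (`struct pin_counts` and its helpers are not kvm code) that ignores the count_offset adjustment used by page_wrprotect(): while kvm keeps one get_user_pages() reference per shadow mapping, pagecount exceeds mapcount even when no I/O is in flight.

```c
#include <stdbool.h>

/* Hypothetical counters for one host page. */
struct pin_counts {
	int mapcount;   /* host ptes mapping the page */
	int pagecount;  /* all references, including get_user_pages pins */
};

/* Map the page into one more host pte: both counters go up. */
static void pin_map_host_pte(struct pin_counts *c)
{
	c->mapcount++;
	c->pagecount++;
}

/*
 * Map the page into a shadow pte. With hold_ref = true the gup
 * reference is kept (old behaviour); with false it is dropped and mmu
 * notifiers are relied on instead (behaviour after this patch).
 */
static void pin_map_spte(struct pin_counts *c, bool hold_ref)
{
	if (hold_ref)
		c->pagecount++;
}

/* ksm-style check: no references beyond the mappings themselves. */
static bool pin_no_extra_refs(const struct pin_counts *c)
{
	return c->mapcount == c->pagecount;
}
```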

Signed-off-by: Izik Eidus iei...@redhat.com
---
 arch/x86/kvm/mmu.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b625ed4..df8fbaf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -567,9 +567,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
if (*spte & shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
if (is_writeble_pte(*spte))
-   kvm_release_pfn_dirty(pfn);
-   else
-   kvm_release_pfn_clean(pfn);
+   kvm_set_pfn_dirty(pfn);
rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
if (!*rmapp) {
printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1812,8 +1810,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
page_header_update_slot(vcpu->kvm, shadow_pte, gfn);
if (!was_rmapped) {
rmap_add(vcpu, shadow_pte, gfn, largepage);
-   if (!is_rmap_pte(*shadow_pte))
-   kvm_release_pfn_clean(pfn);
+   kvm_release_pfn_clean(pfn);
} else {
if (was_writeble)
kvm_release_pfn_dirty(pfn);
-- 
1.5.6.5



[PATCH 2/3] kvm: add SPTE_HOST_WRITEABLE flag to the shadow ptes.

2009-03-30 Thread Izik Eidus
This flag records whether the host physical page we are pointing to from
the spte is writable; when it is clear, the host page is write-protected,
and therefore we can't change the spte's access to writable unless we run
get_user_pages(write = 1).

(This is needed for change_pte support in kvm.)
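A minimal sketch of how such a software bit could be used, assuming bit 1 is the x86 writable bit and that PT_FIRST_AVAIL_BITS_SHIFT is 9 (the first of the x86 bits left for software use); the macro and function names are hypothetical, not the ones in mmu.c. A write fault may only upgrade an spte in place when the host-writable bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_PAGE_SHIFT        12
#define FLAG_PT_WRITABLE       (1ULL << 1)  /* assumed x86 R/W bit */
#define FLAG_HOST_WRITEABLE    (1ULL << 9)  /* assumed first available bit */

/* Build an spte; host_writable says the pfn came from a write gup. */
static uint64_t flag_make_spte(uint64_t pfn, bool writable, bool host_writable)
{
	uint64_t spte = pfn << FLAG_PAGE_SHIFT;

	if (writable)
		spte |= FLAG_PT_WRITABLE;
	if (host_writable)
		spte |= FLAG_HOST_WRITEABLE;
	return spte;
}

/* May a guest write fault be fixed by just setting the writable bit? */
static bool flag_can_upgrade_to_write(uint64_t spte)
{
	return (spte & FLAG_HOST_WRITEABLE) != 0;
}
```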

Signed-off-by: Izik Eidus iei...@redhat.com
---
 arch/x86/kvm/mmu.c |   14 ++
 arch/x86/kvm/paging_tmpl.h |   16 +---
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index df8fbaf..6b4d795 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -138,6 +138,8 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {
@@ -1676,7 +1678,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
unsigned pte_access, int user_fault,
int write_fault, int dirty, int largepage,
int global, gfn_t gfn, pfn_t pfn, bool speculative,
-   bool can_unsync)
+   bool can_unsync, bool reset_host_protection)
 {
u64 spte;
int ret = 0;
@@ -1719,6 +1721,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
kvm_x86_ops->get_mt_mask_shift();
spte |= mt_mask;
}
+   if (reset_host_protection)
+   spte |= SPTE_HOST_WRITEABLE;
 
spte |= (u64)pfn << PAGE_SHIFT;
 
@@ -1764,7 +1768,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 unsigned pt_access, unsigned pte_access,
 int user_fault, int write_fault, int dirty,
 int *ptwrite, int largepage, int global,
-gfn_t gfn, pfn_t pfn, bool speculative)
+gfn_t gfn, pfn_t pfn, bool speculative,
+bool reset_host_protection)
 {
int was_rmapped = 0;
int was_writeble = is_writeble_pte(*shadow_pte);
@@ -1793,7 +1798,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
was_rmapped = 1;
}
if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
- dirty, largepage, global, gfn, pfn, speculative, true)) {
+ dirty, largepage, global, gfn, pfn, speculative, true,
+ reset_host_protection)) {
if (write_fault)
*ptwrite = 1;
kvm_x86_ops->tlb_flush(vcpu);
@@ -1840,7 +1846,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
|| (largepage && iterator.level == PT_DIRECTORY_LEVEL)) {
mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 0, write, 1, &pt_write,
-largepage, 0, gfn, pfn, false);
+largepage, 0, gfn, pfn, false, true);
++vcpu->stat.pf_fixed;
break;
}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index eae9499..9fdacd0 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -259,10 +259,14 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq))
return;
kvm_get_pfn(pfn);
+   /*
+* we call mmu_set_spte() with reset_host_protection = true because that
+* vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
+*/
mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
 gpte & PT_DIRTY_MASK, NULL, largepage,
 gpte & PT_GLOBAL_MASK, gpte_to_gfn(gpte),
-pfn, true);
+pfn, true, true);
 }
 
 /*
@@ -297,7 +301,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 gw->ptes[gw->level-1] & PT_DIRTY_MASK,
 ptwrite, largepage,
 gw->ptes[gw->level-1] & PT_GLOBAL_MASK,
-gw->gfn, pfn, false);
+gw->gfn, pfn, false, true);
break;
}
 
@@ -547,6 +551,7 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
int i, offset, nr_present;
+bool reset_host_protection = 1;
 
offset = nr_present = 0;
 
@@ -584,9 +589,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
nr_present++;
pte_access = 

[PATCH 3/3] kvm: add support for change_pte mmu notifiers

2009-03-30 Thread Izik Eidus
This is needed for kvm if it wants ksm to map pages directly into its
shadow page tables.
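The rmap walk added in kvm_set_pte_rmapp() can be sketched on bare 64-bit values. Assumptions, all hypothetical for illustration: bits 12..51 hold the pfn (mirroring PT64_BASE_ADDR_MASK), bit 1 is the writable bit, bit 9 stands in for SPTE_HOST_WRITEABLE, and a zero value means "not present". The dirty-bit bookkeeping and the real rmap list are omitted; the function returns whether a remote TLB flush would be needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REMAP_PAGE_SHIFT       12
#define REMAP_ADDR_MASK        0x000ffffffffff000ULL  /* assumed pfn bits */
#define REMAP_PT_WRITABLE      (1ULL << 1)
#define REMAP_HOST_WRITEABLE   (1ULL << 9)
#define REMAP_NONPRESENT       0ULL

/*
 * Apply a host pte change to every spte mapping the same host page.
 * Writable new host pte: zap the spte; the next guest access refaults.
 * Read-only new host pte (the ksm case): point the spte at the new pfn
 * and strip both write bits so a later write faults and breaks COW.
 */
static bool remap_sptes(uint64_t *sptes, size_t n, uint64_t new_pfn,
			bool new_pte_writable)
{
	bool need_flush = false;

	for (size_t i = 0; i < n; i++) {
		if (sptes[i] == REMAP_NONPRESENT)
			continue;
		need_flush = true;
		if (new_pte_writable) {
			sptes[i] = REMAP_NONPRESENT;
			continue;
		}
		sptes[i] &= ~REMAP_ADDR_MASK;
		sptes[i] |= new_pfn << REMAP_PAGE_SHIFT;
		sptes[i] &= ~(REMAP_PT_WRITABLE | REMAP_HOST_WRITEABLE);
	}
	return need_flush;
}
```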

Signed-off-by: Izik Eidus iei...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   68 +++
 virt/kvm/kvm_main.c |   14 
 3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8351c4d..9062729 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -791,5 +791,6 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6b4d795..f8816dd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -257,6 +257,11 @@ static pfn_t spte_to_pfn(u64 pte)
return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 }
 
+static pte_t ptep_val(pte_t *ptep)
+{
+   return *ptep;
+}
+
 static gfn_t pse36_gfn_delta(u32 gpte)
 {
int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
@@ -678,7 +683,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
return write_protected;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+  unsigned long data)
 {
u64 *spte;
int need_tlb_flush = 0;
@@ -693,8 +699,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
return need_tlb_flush;
 }
 
+static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
+unsigned long data)
+{
+   int need_flush = 0;
+   u64 *spte, new_spte;
+   pte_t *ptep = (pte_t *)data;
+   pfn_t new_pfn;
+
+   new_pfn = pte_pfn(ptep_val(ptep));
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   BUG_ON(!is_shadow_present_pte(*spte));
+   rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
+   need_flush = 1;
+   if (pte_write(ptep_val(ptep))) {
+   rmap_remove(kvm, spte);
+   set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+   spte = rmap_next(kvm, rmapp, NULL);
+   } else {
+   new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
+   new_spte |= new_pfn << PAGE_SHIFT;
+
+   if (!pte_write(ptep_val(ptep))) {
+   new_spte &= ~PT_WRITABLE_MASK;
+   new_spte &= ~SPTE_HOST_WRITEABLE;
+   if (is_writeble_pte(*spte))
+   kvm_set_pfn_dirty(spte_to_pfn(*spte));
+   }
+   set_shadow_pte(spte, new_spte);
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+   }
+   if (need_flush)
+   kvm_flush_remote_tlbs(kvm);
+
+   return 0;
+}
+
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
- int (*handler)(struct kvm *kvm, unsigned long *rmapp))
+ unsigned long data,
+ int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+unsigned long data))
 {
int i;
int retval = 0;
@@ -715,11 +761,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
end = start + (memslot->npages << PAGE_SHIFT);
if (hva >= start && hva < end) {
gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
-   retval |= handler(kvm, &memslot->rmap[gfn_offset]);
+   retval |= handler(kvm, &memslot->rmap[gfn_offset],
+ data);
retval |= handler(kvm,
  &memslot->lpage_info[
  gfn_offset /
- KVM_PAGES_PER_HPAGE].rmap_pde);
+ KVM_PAGES_PER_HPAGE].rmap_pde,
+ data);
}
}
 
@@ -728,10 +776,16 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
-   return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
+   return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
+}
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+   kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
 }
 
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int 

[PATCH 0/2] kvm-userspace ksm support

2009-03-30 Thread Izik Eidus
Apply it against Avi's kvm-userspace git tree.

Izik Eidus (2):
  qemu: add ksm support
  qemu: add ksmctl.

 qemu/ksm.h |   70 
 qemu/vl.c  |   34 +
 user/Makefile  |6 +++-
 user/config-x86-common.mak |2 +-
 user/ksmctl.c  |   69 +++
 5 files changed, 179 insertions(+), 2 deletions(-)
 create mode 100644 qemu/ksm.h
 create mode 100644 user/ksmctl.c



[PATCH 1/2] qemu: add ksm support

2009-03-30 Thread Izik Eidus
Signed-off-by: Izik Eidus iei...@redhat.com
---
 qemu/ksm.h |   70 
 qemu/vl.c  |   34 +
 2 files changed, 104 insertions(+), 0 deletions(-)
 create mode 100644 qemu/ksm.h

diff --git a/qemu/ksm.h b/qemu/ksm.h
new file mode 100644
index 000..2fb91a8
--- /dev/null
+++ b/qemu/ksm.h
@@ -0,0 +1,70 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+
+#include <sys/types.h>
+#include <sys/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the beginning of the virtual address */
+__u64 reserved_bits;
+};
+
+struct ksm_kthread_info {
+   __u32 sleep; /* number of microseconds to sleep */
+   __u32 pages_to_scan; /* number of pages to scan */
+   __u32 flags; /* control flags */
+__u32 pad;
+__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA    _IO(KSMIO,   0x01) /* return SMA fd */
+/*
+ * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
+ * (can stop the kernel thread from working by setting running = 0)
+ */
+#define KSM_START_STOP_KTHREAD  _IOW(KSMIO,  0x02,\
+ struct ksm_kthread_info)
+/*
+ * KSM_GET_INFO_KTHREAD - return information about the kernel thread
+ * scanning speed.
+ */
+#define KSM_GET_INFO_KTHREAD    _IOW(KSMIO,  0x03,\
+ struct ksm_kthread_info)
+
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by kvm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/qemu/vl.c b/qemu/vl.c
index c52d2d7..54a9dd9 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -130,6 +130,7 @@ int main(int argc, char **argv)
 #define main qemu_main
 #endif /* CONFIG_COCOA */
 
+#include "ksm.h"
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/usb.h"
@@ -4873,6 +4874,37 @@ static void termsig_setup(void)
 
 #endif
 
+static int ksm_register_memory(void)
+{
+int fd;
+int ksm_fd;
+int r = 1;
+struct ksm_memory_region ksm_region;
+
+fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+if (fd == -1)
+goto out;
+
+ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
+if (ksm_fd == -1)
+goto out_free;
+
+ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
+ksm_region.addr = (unsigned long)phys_ram_base;
+r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
+if (r)
+goto out_free1;
+
+return r;
+
+out_free1:
+close(ksm_fd);
+out_free:
+close(fd);
+out:
+return r;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 #ifdef CONFIG_GDBSTUB
@@ -5862,6 +5894,8 @@ int main(int argc, char **argv, char **envp)
 /* init the dynamic translator */
 cpu_exec_init_all(tb_size * 1024 * 1024);
 
+ksm_register_memory();
+
 bdrv_init();
 dma_helper_init();
 
-- 
1.5.6.5



[PATCH 2/2] qemu: add ksmctl.

2009-03-30 Thread Izik Eidus
Userspace tool to control the ksm kernel thread.
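ksmctl below parses its numeric arguments with atoi(), which turns malformed input such as `10x` or `-5` into a number without complaint. A hedged alternative, with hypothetical names (`struct kthread_args`, `parse_start_args()`), validates each value with strtoul() and matches the command with strcmp() instead of the prefix match strncmp(argv[1], ..., strlen(argv[1])), under which an abbreviation like `st` matches both `start` and `stop`.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the ksm_kthread_info fields that ksmctl fills in. */
struct kthread_args {
	unsigned int sleep;
	unsigned int pages_to_scan;
	unsigned int flags;
};

/* Parse a decimal that fits in unsigned int; returns 0 on success. */
static int parse_uint_arg(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (!isdigit((unsigned char)s[0]))   /* rejects "", "-5", " 5" */
		return -1;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	return 0;
}

/* "start <npages> <sleep>": fill args, or return -1 on bad input. */
static int parse_start_args(int argc, char **argv, struct kthread_args *args)
{
	if (argc < 4 || strcmp(argv[1], "start") != 0)
		return -1;
	if (parse_uint_arg(argv[2], &args->pages_to_scan) ||
	    parse_uint_arg(argv[3], &args->sleep))
		return -1;
	args->flags = 1;   /* value of ksm_control_flags_run in ksm.h */
	return 0;
}
```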

Signed-off-by: Izik Eidus iei...@redhat.com
---
 user/Makefile  |6 +++-
 user/config-x86-common.mak |2 +-
 user/ksmctl.c  |   69 
 3 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100644 user/ksmctl.c

diff --git a/user/Makefile b/user/Makefile
index cf7f8ed..a291b37 100644
--- a/user/Makefile
+++ b/user/Makefile
@@ -39,6 +39,10 @@ autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d
 
 LDFLAGS += -pthread -lrt
 
+ksmctl_objs= ksmctl.o
+ksmctl: $(ksmctl_objs)
+   $(CC) $(LDFLAGS) $^ -o $@
+
 kvmtrace_objs= kvmtrace.o
 
 kvmctl: $(kvmctl_objs)
@@ -56,4 +60,4 @@ $(libcflat): $(cflatobjs)
 -include .*.d
 
 clean: arch_clean
-   $(RM) kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs)
+   $(RM) ksmctl kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs)
diff --git a/user/config-x86-common.mak b/user/config-x86-common.mak
index e789fd4..4303aee 100644
--- a/user/config-x86-common.mak
+++ b/user/config-x86-common.mak
@@ -1,6 +1,6 @@
 #This is a make file with common rules for both x86  x86-64
 
-all: kvmctl kvmtrace test_cases
+all: ksmctl kvmctl kvmtrace test_cases
 
 kvmctl_objs= main.o iotable.o ../libkvm/libkvm.a
 balloon_ctl: balloon_ctl.o
diff --git a/user/ksmctl.c b/user/ksmctl.c
new file mode 100644
index 000..034469f
--- /dev/null
+++ b/user/ksmctl.c
@@ -0,0 +1,69 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include "../qemu/ksm.h"
+
+int main(int argc, char *argv[])
+{
+   int fd;
+   int used = 0;
+   int fd_start;
+   struct ksm_kthread_info info;
+   
+
+   if (argc < 2) {
+   fprintf(stderr, "usage: %s {start npages sleep | stop | info}\n", argv[0]);
+   exit(1);
+   }
+
+   fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
+   if (fd == -1) {
+   fprintf(stderr, "could not open /dev/ksm\n");
+   exit(1);
+   }
+
+   if (!strncmp(argv[1], "start", strlen(argv[1]))) {
+   used = 1;
+   if (argc < 4) {
+   fprintf(stderr,
+   "usage: %s start npages_to_scan sleep\n",
+   argv[0]);
+   exit(1);
+   }
+   info.pages_to_scan = atoi(argv[2]);
+   info.sleep = atoi(argv[3]);
+   info.flags = ksm_control_flags_run;
+
+   fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
+   if (fd_start == -1) {
+   fprintf(stderr, "KSM_START_KTHREAD failed\n");
+   exit(1);
+   }
+   printf("created scanner\n");
+   }
+
+   if (!strncmp(argv[1], "stop", strlen(argv[1]))) {
+   used = 1;
+   info.flags = 0;
+   fd_start = ioctl(fd, KSM_START_STOP_KTHREAD, &info);
+   printf("stopped scanner\n");
+   }
+
+   if (!strncmp(argv[1], "info", strlen(argv[1]))) {
+   used = 1;
+   ioctl(fd, KSM_GET_INFO_KTHREAD, &info);
+printf("flags %d, pages_to_scan %d, sleep_time %d\n",
+info.flags, info.pages_to_scan, info.sleep);
+   }
+
+   if (!used)
+   fprintf(stderr, "unknown command %s\n", argv[1]);
+
+   return 0;
+}
-- 
1.5.6.5

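A note on ksmctl's command matching: each command is selected with strncmp(argv[1], cmd, strlen(argv[1])), which accepts any prefix of the command name. The sketch below isolates that test in a helper (ksmctl_prefix_match and exact_match are hypothetical names, not part of the patch) to show the consequences: "sta" selects start, an empty argument matches every command, and because the start, stop and info checks are independent if statements, "ksmctl st 100 10" would run the start branch and then the stop branch. An exact strcmp() comparison avoids this.

```c
#include <stdbool.h>
#include <string.h>

/* Same test ksmctl uses: true when arg is a prefix of cmd.
 * strncmp(..., 0) returns 0, so an empty arg matches every cmd. */
static bool ksmctl_prefix_match(const char *arg, const char *cmd)
{
	return strncmp(arg, cmd, strlen(arg)) == 0;
}

/* Stricter alternative: accept only the full command name. */
static bool exact_match(const char *arg, const char *cmd)
{
	return strcmp(arg, cmd) == 0;
}
```

If abbreviations are wanted, one option is to keep the prefix test but reject an argument that matches more than one command.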


[PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread Izik Eidus
Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one such case is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.
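Why sorted structures help can be seen in a much simpler, offline toy: sort page indices by page contents with memcmp() and identical pages become neighbours, so duplicates are found with O(n log n) comparisons instead of comparing every pair. This only illustrates the idea; it is not ksm's algorithm, which scans registered memory a batch of pages at a time (pages_to_scan per iteration) and keeps separate stable and unstable trees. The 4096-byte page size and the function names are assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096 /* assumed page size for this toy */

/* qsort() has no context argument, so the comparator reads this. */
static const unsigned char *toy_base;

static int cmp_pages(const void *a, const void *b)
{
	size_t ia = *(const size_t *)a, ib = *(const size_t *)b;

	return memcmp(toy_base + ia * TOY_PAGE_SIZE,
		      toy_base + ib * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
}

/* Number of pages that merging would free (npages minus the number of
 * distinct page contents), or -1 if allocation fails. */
static long count_mergeable_pages(const unsigned char *mem, size_t npages)
{
	size_t *idx, i;
	long dups = 0;

	if (npages < 2)
		return 0;
	idx = malloc(npages * sizeof(*idx));
	if (!idx)
		return -1;
	for (i = 0; i < npages; i++)
		idx[i] = i;

	toy_base = mem;
	qsort(idx, npages, sizeof(*idx), cmp_pages);

	/* After sorting, identical pages sit next to each other. */
	for (i = 1; i < npages; i++)
		if (memcmp(mem + idx[i - 1] * TOY_PAGE_SIZE,
			   mem + idx[i] * TOY_PAGE_SIZE, TOY_PAGE_SIZE) == 0)
			dups++;

	free(idx);
	return dups;
}
```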

When ksm finds two identical pages, it marks them as read-only and merges
them into a single page.
After the pages are marked as read-only and merged into one page, Linux
treats them as normal copy-on-write pages and will give a private copy
to any application that writes to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm API:

KSM_GET_API_VERSION:
Give userspace the API version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Control the kernel thread: start or stop it and set its scanning speed.
The parameters are passed using the ksm_kthread_info structure
(KSM_GET_INFO_KTHREAD returns the current values in the same structure):
ksm_kthread_info:
__u32 sleep:
number of microseconds to sleep between each iteration of
scanning.

__u32 pages_to_scan:
number of pages to scan for each iteration of scanning.

__u32 max_pages_to_merge:
maximum number of pages to merge in each iteration of scanning
(so even if there are still more pages to scan, we stop this
iteration)

__u32 flags:
   flags to control ksmd (right now only ksm_control_flags_run is
  available)

KSM_REGISTER_MEMORY_REGION:
Register a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
__u32 npages:
 number of pages to share inside this memory region.
__u32 pad:
__u64 addr:
the start of the virtual address range of this region.

KSM_REMOVE_MEMORY_REGION:
Remove memory region from ksm.
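Pieced together from the API description above, the header in this patch and the qemu patch, the userspace sequence appears to be: open /dev/ksm, check KSM_GET_API_VERSION, create a shared memory area (SMA) fd, then register each range on that fd. The sketch below assumes that ordering. The struct and ioctl definitions are copied from the patch's include/linux/ksm.h so it builds without that header; treating the API version and the SMA fd as ioctl return values is an assumption based on the header's "return SMA fd" comment, and ksm_fill_region()/ksm_register() are hypothetical helpers. The pad field presumably keeps the structure at the same 24-byte layout for 32-bit and 64-bit userspace.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Copied from the patch's include/linux/ksm.h. */
#define KSM_API_VERSION 1
struct ksm_memory_region {
	__u32 npages; /* number of pages to share */
	__u32 pad;
	__u64 addr;   /* start of the virtual address range */
	__u64 reserved_bits;
};
#define KSMIO 0xAB
#define KSM_GET_API_VERSION           _IO(KSMIO, 0x00)
#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO, 0x01)
#define KSM_REGISTER_MEMORY_REGION    _IOW(KSMIO, 0x20, struct ksm_memory_region)

/* Describe [addr, addr + len); len is assumed to be page aligned. */
static void ksm_fill_region(struct ksm_memory_region *r, void *addr, size_t len)
{
	memset(r, 0, sizeof(*r)); /* zero pad and reserved_bits */
	r->npages = (__u32)(len / (size_t)sysconf(_SC_PAGESIZE));
	r->addr = (__u64)(uintptr_t)addr;
}

/* On success returns the SMA fd and stores the /dev/ksm fd in *dev_fd;
 * both stay open, as in the qemu patch. Returns -1 on failure. */
static int ksm_register(void *addr, size_t len, int *dev_fd)
{
	struct ksm_memory_region r;
	int dev, sma;

	dev = open("/dev/ksm", O_RDWR);
	if (dev < 0)
		return -1;
	/* Assumption: the API version is the ioctl's return value. */
	if (ioctl(dev, KSM_GET_API_VERSION, 0) != KSM_API_VERSION)
		goto fail_dev;
	sma = ioctl(dev, KSM_CREATE_SHARED_MEMORY_AREA, 0); /* "return SMA fd" */
	if (sma < 0)
		goto fail_dev;
	ksm_fill_region(&r, addr, len);
	if (ioctl(sma, KSM_REGISTER_MEMORY_REGION, &r) < 0) {
		close(sma);
		goto fail_dev;
	}
	*dev_fd = dev;
	return sma;

fail_dev:
	close(dev);
	return -1;
}
```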

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/ksm.h|   69 +++
 include/linux/miscdevice.h |1 +
 mm/Kconfig |6 +
 mm/Makefile|1 +
 mm/ksm.c   | 1431 
 5 files changed, 1508 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..5776dce
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,69 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the beginning of the virtual address */
+__u64 reserved_bits;
+};
+
+struct ksm_kthread_info {
+   __u32 sleep; /* number of microseconds to sleep */
+   __u32 pages_to_scan; /* number of pages to scan */
+   __u32 flags; /* control flags */
+__u32 pad;
+__u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
+/*
+ * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
+ * (can stop the kernel thread by clearing ksm_control_flags_run)
+ */
+#define KSM_START_STOP_KTHREAD  _IOW(KSMIO,  0x02,\
+ struct ksm_kthread_info)
+/*
+ * KSM_GET_INFO_KTHREAD - return information about the kernel thread
+ * scanning speed.
+ */
+#define KSM_GET_INFO_KTHREAD _IOW(KSMIO,  0x03,\
+ struct ksm_kthread_info)
+
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by ksm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index a820f81..6d4f8df 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -29,6 +29,7 @@
 

FW: Use rsvd_bits_mask in load_pdptrs for cleanup and considing EXB bit

2009-03-30 Thread Dong, Eddie
Avi Kivity wrote:
 Dong, Eddie wrote:
 @@ -2199,6 +2194,9 @@ void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
  context->rsvd_bits_mask[1][0] = 0;
  break;
  case PT32E_ROOT_LEVEL:
 +context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
 +rsvd_bits(maxphyaddr, 62) |
 +rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
  context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
  rsvd_bits(maxphyaddr, 62);  /* PDE */
  context->rsvd_bits_mask[0][0] = exb_bit_rsvd
 
 Are you sure that PDPTEs support NX?  They don't support R/W and U/S,
 so it seems likely that NX is reserved as well even when EFER.NXE is
 enabled. 


Gil:
Here is the original mail from the KVM mailing list. If you are able to
help, that would be great.
thx, eddie


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux

2009-03-30 Thread Anthony Liguori

Izik Eidus wrote:

I am sending another series of patches for the kvm kernel and kvm-userspace
that would allow users of kvm to test ksm with it.
The kvm patches apply to Avi's git tree.
  
Is there any reason not to take these through upstream QEMU instead of 
kvm-userspace?  In principle, I don't see anything that would prevent 
normal QEMU from making use of this functionality as well.  That would 
make it one less thing to eventually have to merge...


Regards,

Anthony Liguori


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread Anthony Liguori

Izik Eidus wrote:

Ksm is a driver that allows merging identical pages between one or more
applications in a way invisible to the applications that use it.
Pages that are merged are marked as read-only and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one such case is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find the identical pages efficiently.

When ksm finds two identical pages, it marks them as read-only and merges
them into a single page.
After the pages are marked as read-only and merged into one page, Linux
treats them as normal copy-on-write pages and will give a private copy
to any application that writes to them.

Ksm scans only memory areas that were registered to be scanned by it.

Ksm API:

KSM_GET_API_VERSION:
Give userspace the API version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Create a shared memory region fd, which later allows the user to register
the memory region to scan by using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_START_STOP_KTHREAD:
Control the kernel thread: start or stop it and set its scanning speed.
The parameters are passed using the ksm_kthread_info structure
(KSM_GET_INFO_KTHREAD returns the current values in the same structure):
ksm_kthread_info:
__u32 sleep:
number of microseconds to sleep between each iteration of
scanning.

__u32 pages_to_scan:
number of pages to scan for each iteration of scanning.

__u32 max_pages_to_merge:
maximum number of pages to merge in each iteration of scanning
(so even if there are still more pages to scan, we stop this
iteration)

__u32 flags:
   flags to control ksmd (right now only ksm_control_flags_run is
  available)
  


Wouldn't this make more sense as a sysfs interface?  That is, the 
KSM_START_STOP_KTHREAD part, not necessarily the rest of the API.


Regards,

Anthony Liguori
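As a rough illustration of the suggestion, a sysfs interface would turn each ksm_kthread_info field into a text attribute that a tool sets with ordinary file I/O instead of an ioctl on /dev/ksm. The sketch below writes and reads one such attribute. The directory is a parameter, and the attribute names (for example "pages_to_scan") are hypothetical, since the patch defines no sysfs files.

```c
#include <stdio.h>

/* Build "<dir>/<attr>" into path; 0 on success, -1 if it does not fit. */
static int attr_path(char *path, size_t size, const char *dir, const char *attr)
{
	int n = snprintf(path, size, "%s/%s", dir, attr);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write "<val>\n" to the attribute, as "echo val > file" would. */
static int ksm_attr_write(const char *dir, const char *attr, unsigned long val)
{
	char path[4096];
	FILE *f;
	int ok;

	if (attr_path(path, sizeof(path), dir, attr) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", val) > 0;
	/* stdio buffers the write; an error may only show up at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the attribute back as an unsigned number; 0 on success. */
static int ksm_attr_read(const char *dir, const char *attr, unsigned long *val)
{
	char path[4096];
	FILE *f;
	int n;

	if (attr_path(path, sizeof(path), dir, attr) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%lu", val);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```

With such an interface, "ksmctl start 100 10" would become a few attribute writes, which a shell script could equally do with echo.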



Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-03-30 Thread KAMEZAWA Hiroyuki
On Tue, 31 Mar 2009 02:59:20 +0300
Izik Eidus iei...@redhat.com wrote:

 Ksm is a driver that allows merging identical pages between one or more
 applications in a way invisible to the applications that use it.
 Pages that are merged are marked as read-only and are COWed when any
 application tries to change them.
 
 Ksm is used for cases where using fork() is not suitable;
 one such case is where the pages of the application keep changing
 dynamically and the application cannot know in advance which pages are
 going to be identical.
 
 Ksm works by walking over the memory pages of the applications it
 scans in order to find identical pages.
 It uses two sorted data structures, called the stable and unstable trees,
 to find the identical pages efficiently.
 
 When ksm finds two identical pages, it marks them as read-only and merges
 them into a single page.
 After the pages are marked as read-only and merged into one page, Linux
 treats them as normal copy-on-write pages and will give a private copy
 to any application that writes to them.
 
 Ksm scans only memory areas that were registered to be scanned by it.
 
 Ksm API:
 
 KSM_GET_API_VERSION:
 Give userspace the API version of the module.
 
 KSM_CREATE_SHARED_MEMORY_AREA:
 Create a shared memory region fd, which later allows the user to register
 the memory region to scan by using:
 KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION
 
 KSM_START_STOP_KTHREAD:
 Control the kernel thread: start or stop it and set its scanning speed.
 The parameters are passed using the ksm_kthread_info structure
 (KSM_GET_INFO_KTHREAD returns the current values in the same structure):
 ksm_kthread_info:
 __u32 sleep:
 number of microseconds to sleep between each iteration of
 scanning.
 
 __u32 pages_to_scan:
 number of pages to scan for each iteration of scanning.
 
 __u32 max_pages_to_merge:
 maximum number of pages to merge in each iteration of scanning
 (so even if there are still more pages to scan, we stop this
 iteration)
 
 __u32 flags:
 flags to control ksmd (right now only ksm_control_flags_run is
 available)
 
 KSM_REGISTER_MEMORY_REGION:
 Register a userspace virtual address range to be scanned by ksm.
 This ioctl uses the ksm_memory_region structure:
 ksm_memory_region:
 __u32 npages:
  number of pages to share inside this memory region.
 __u32 pad:
 __u64 addr:
 the start of the virtual address range of this region.
 
 KSM_REMOVE_MEMORY_REGION:
 Remove memory region from ksm.
 
 Signed-off-by: Izik Eidus iei...@redhat.com
 ---
  include/linux/ksm.h|   69 +++
  include/linux/miscdevice.h |1 +
  mm/Kconfig |6 +
  mm/Makefile|1 +
  mm/ksm.c   | 1431 
  5 files changed, 1508 insertions(+), 0 deletions(-)
  create mode 100644 include/linux/ksm.h
  create mode 100644 mm/ksm.c
 
 diff --git a/include/linux/ksm.h b/include/linux/ksm.h
 new file mode 100644
 index 000..5776dce
 --- /dev/null
 +++ b/include/linux/ksm.h
 @@ -0,0 +1,69 @@
 +#ifndef __LINUX_KSM_H
 +#define __LINUX_KSM_H
 +
 +/*
 + * Userspace interface for /dev/ksm - kvm shared memory
 + */
 +
 +#include <linux/types.h>
 +#include <linux/ioctl.h>
 +
 +#include <asm/types.h>
 +
 +#define KSM_API_VERSION 1
 +
 +#define ksm_control_flags_run 1
 +
 +/* for KSM_REGISTER_MEMORY_REGION */
 +struct ksm_memory_region {
 + __u32 npages; /* number of pages to share */
 + __u32 pad;
 + __u64 addr; /* the beginning of the virtual address */
 +__u64 reserved_bits;
 +};
 +
 +struct ksm_kthread_info {
 + __u32 sleep; /* number of microseconds to sleep */
 + __u32 pages_to_scan; /* number of pages to scan */
 + __u32 flags; /* control flags */
 +__u32 pad;
 +__u64 reserved_bits;
 +};
 +
 +#define KSMIO 0xAB
 +
 +/* ioctls for /dev/ksm */
 +
 +#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
 +/*
 + * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
 + */
 +#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
 +/*
 + * KSM_START_STOP_KTHREAD - control the kernel thread scanning speed
 + * (can stop the kernel thread by clearing ksm_control_flags_run)
 + */
 +#define KSM_START_STOP_KTHREAD _IOW(KSMIO,  0x02,\
 +   struct ksm_kthread_info)
 +/*
 + * KSM_GET_INFO_KTHREAD - return information about the kernel thread
 + * scanning speed.
 + */
 +#define KSM_GET_INFO_KTHREAD  _IOW(KSMIO,  0x03,\
 +   struct ksm_kthread_info)
 +
 +
 +/* ioctls for SMA fds */
 +
 +/*
 + * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
 + * scanned by ksm.
 + */
 +#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
 +   struct ksm_memory_region)
 +/*
 + * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
 + */
 +#define KSM_REMOVE_MEMORY_REGION 

DMA errors in guest caused by corrupted(?) disk image

2009-03-30 Thread Matthew Palmer
Hi,

I've just come across a somewhat strange problem that someone suggested I
report to the list.

The problem manifested itself as DMA errors and the like popping up in the
guest, of the kind I'd expect to see if a disk in a physical machine were
dying, such as:

  hda: dma_timer_expiry: dma status == 0x21

The VM had been quite stable until this problem started partway through
today.  Another guest on the same host machine is fine.

After talking to iggy on IRC, I tried running qemu-convert over the disk
image to copy it to another image, and that solved the problem.  So it looks
like disk image corruption somehow manages to manifest itself as a DMA error
in the guest...

I'm starting KVM like so:

  kvm -m 512 -net nic,macaddr=$macaddr -net tap,iface=$iface -hda hda.qc2

As the filename suggests, it's a qcow2 image, 30GB in size.

The guest is a 32-bit RHEL3 installation.  The host is a 64-bit Debian Lenny
machine, running kvm 84.

I've got a copy of the dodgy disk image, although it's 2.7GB so not so easy
to ship around.  I can do any diagnostics on it that people need to try and
track down the cause of the problem.  I tried doing another qemu-convert
(with a view to seeing the differences between the two images) but the copy
is 500MB smaller (zero blocks removed, presumably) so a diff probably isn't
going to help much.

- Matt


Segfault while booting Windows XP x64

2009-03-30 Thread Mike Kelly
I'm on an Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, using a 2.6.29
vanilla kernel, x86_64, with kvm userland version 84.

When I try to boot my x64 Windows XP, it gets partway through the
Windows boot process, with the progress bar and whatnot. Then I
get the attached backtrace.

The various -no-kvm options don't seem to make a difference.

I created, and was able to boot, this image using linux 2.6.28. I'll
give it a shot again later to confirm that is still the case.

Thanks in advance.

-- 
Mike Kelly
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu"...
Starting program: /usr/bin/kvm -usb -usbdevice tablet -name winxp-x64 winxp-x64.kvm
[Thread debugging using libthread_db enabled]
[New Thread 0x7fe4d978b740 (LWP 29948)]
[New Thread 0x7fe4ccf9d950 (LWP 29951)]
[New Thread 0x7fe4cb6d5950 (LWP 29955)]
[Thread 0x7fe4cb6d5950 (LWP 29955) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe4ccf9d950 (LWP 29951)]
qemu_paio_cancel (fd=<value optimized out>, aiocb=0x2909230) at posix-aio-compat.c:184
184 TAILQ_REMOVE(&request_list, aiocb, node);

Thread 2 (Thread 0x7fe4ccf9d950 (LWP 29951)):
#0  qemu_paio_cancel (fd=<value optimized out>, aiocb=0x2909230) at posix-aio-compat.c:184
ret = <value optimized out>
#1  0x0041acf8 in raw_aio_cancel (blockacb=<value optimized out>) at block-raw-posix.c:681
ret = <value optimized out>
acb = (RawAIOCB *) 0x2909210
#2  0x00433790 in ide_dma_cancel (bm=0x27dfe60) at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/hw/ide.c:2973
No locals.
#3  0x004337f5 in bmdma_cmd_writeb (opaque=0x27dfe60, addr=0, val=<value optimized out>)
    at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/hw/ide.c:2987
No locals.
#4  0x00520d5d in kvm_outb (opaque=<value optimized out>, addr=0, data=0 '\0')
    at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/qemu-kvm.c:684
No locals.
#5  0x0054cfa5 in kvm_run (kvm=0x2716010, vcpu=0, env=0x2725f90) at libkvm.c:722
r = <value optimized out>
fd = 12
run = (struct kvm_run *) 0x7fe4cc799000
#6  0x00521529 in kvm_cpu_exec (env=<value optimized out>) at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/qemu-kvm.c:205
r = <value optimized out>
#7  0x00521818 in ap_main_loop (_env=<value optimized out>) at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/qemu-kvm.c:414
env = (CPUX86State *) 0x2725f90
signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
data = (struct ioperm_data *) 0x0
#8  0x7fe4d89eff97 in start_thread () from /lib/libpthread.so.0
No locals.
#9  0x7fe4d792bdfd in clone () from /lib/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fe4d978b740 (LWP 29948)):
#0  0x7fe4d7925452 in select () from /lib/libc.so.6
No symbol table info available.
#1  0x00409eab in main_loop_wait (timeout=0) at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/vl.c:3647
ioh = <value optimized out>
rfds = {fds_bits = {164992, 0 <repeats 15 times>}}
wfds = {fds_bits = {0 <repeats 16 times>}}
xfds = {fds_bits = {0 <repeats 16 times>}}
ret = <value optimized out>
nfds = 17
tv = {tv_sec = 0, tv_usec = 999644}
#2  0x00520fea in kvm_main_loop () at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/qemu-kvm.c:596
fds = {15, 16}
mask = {__val = {268443648, 0 <repeats 15 times>}}
sigfd = 17
#3  0x0040e4db in main (argc=<value optimized out>, argv=0x7fffe17aa448, envp=<value optimized out>)
    at /var/tmp/paludis/build/app-virtualization-kvm-84/work/kvm-84/qemu/vl.c:3809
use_gdbstub = 0
gdbstub_port = 0x54f5ef "1234"
boot_devices_bitmap = 0
i = <value optimized out>
snapshot = 0
linux_boot = <value optimized out>
net_boot = <value optimized out>
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = 0x58cc6b ""
boot_devices = 0x54f881 "cad"
ds = <value optimized out>
dcl = <value optimized out>
cyls = 0
heads = 0
secs = 0
translation = 0
net_clients = {0x54f45d nic, 0x54f885 user, 0x0, 0x7fe4d95972ee 
\205À\017\217z\001, 0x0, 
  0x7fe4d9596bec \205Àt\A\213D$\f\205Àu\027\205í\017\037D, 0x7fe40001 
Address 0x7fe40001 out of bounds, 0x7fe4d97a95b8 \220\225zÙä\177, 
  0x0, 0x1 Address 0x1 out of bounds, 0x71dd557f Address 0x71dd557f out of 
bounds, 0x7fe4d9596ffa 

RE: Use rsvd_bits_mask in load_pdptrs for cleanup and considing EXB bit

2009-03-30 Thread Neiger, Gil
PDPTEs are used only if CR0.PG=CR4.PAE=1.

In that situation, their format depends on the value of IA32_EFER.LMA.

If IA32_EFER.LMA=0, bit 63 is reserved and must be 0 in any PDPTE that is 
marked present.  The execute-disable setting of a page is determined only by 
the PDE and PTE.

If IA32_EFER.LMA=1, bit 63 is used for the execute-disable setting in PML4 entries, 
PDPTEs, PDEs, and PTEs (assuming IA32_EFER.NXE=1).

- Gil
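Gil's description can be written down as a reserved-bit mask. The sketch below covers only the high bits of a present PDPTE: physical-address bits maxphyaddr..62 (as in the patch) plus bit 63, which is reserved in a PAE PDPTE (EFER.LMA=0) regardless of EFER.NXE, and in 4-level paging (EFER.LMA=1) is execute-disable when NXE=1 and reserved otherwise. rsvd_bits() is assumed to match the helper the patch uses from KVM's mmu.c (bits s through e, inclusive); pdpte_high_rsvd_mask() and its parameters are illustrative names, not KVM code. Read this way, the PT32E_ROOT_LEVEL PDPTE mask would set bit 63 unconditionally rather than through exb_bit_rsvd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* High reserved bits of a present PDPTE (illustrative helper). */
static uint64_t pdpte_high_rsvd_mask(int maxphyaddr, bool efer_lma,
				     bool efer_nxe)
{
	uint64_t mask = rsvd_bits(maxphyaddr, 62);

	/* PAE paging (LMA=0): bit 63 is reserved whatever NXE says.
	 * 4-level paging (LMA=1): bit 63 is execute-disable if NXE=1,
	 * otherwise reserved. */
	if (!efer_lma || !efer_nxe)
		mask |= 1ULL << 63;
	return mask;
}
```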

-Original Message-
From: Dong, Eddie 
Sent: Monday, March 30, 2009 5:51 PM
To: Neiger, Gil
Cc: Avi Kivity; kvm@vger.kernel.org; Dong, Eddie
Subject: FW: Use rsvd_bits_mask in load_pdptrs for cleanup and considing EXB bit

Avi Kivity wrote:
 Dong, Eddie wrote:
 @@ -2199,6 +2194,9 @@ void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
  context->rsvd_bits_mask[1][0] = 0;
  break;
  case PT32E_ROOT_LEVEL:
 +context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
 +rsvd_bits(maxphyaddr, 62) |
 +rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
  context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
  rsvd_bits(maxphyaddr, 62);  /* PDE */
  context->rsvd_bits_mask[0][0] = exb_bit_rsvd
 
 Are you sure that PDPTEs support NX?  They don't support R/W and U/S,
 so it seems likely that NX is reserved as well even when EFER.NXE is
 enabled. 


Gil:
Here is the original mail from the KVM mailing list. If you are able to
help, that would be great.
thx, eddie


Re: Segfault while booting Windows XP x64

2009-03-30 Thread Gleb Natapov
On Mon, Mar 30, 2009 at 11:26:52PM -0400, Mike Kelly wrote:
 I'm on an Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, using a 2.6.29
 vanilla kernel, x86_64, with kvm userland version 84.
 
 When I try to boot my x64 Windows XP, it gets partway through the
 Windows boot process, with the progress bar and whatnot. Then I
 get the attached backtrace.
 
 The various -no-kvm options don't seem to make a difference.
 
 I created, and was able to boot, this image using linux 2.6.28. I'll
 give it a shot again later to confirm that is still the case.
 
Are you sure you have write permission to that image?

--
Gleb.
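Gleb's question can be answered from a program with access(2) and W_OK, which checks write permission for the process's real user ID. Note that root passes this check regardless of the file's mode bits, and a file on a read-only filesystem fails it with EROFS. A minimal sketch; image_is_writable() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <unistd.h>

/* True if the calling process's real user may write to path. */
static bool image_is_writable(const char *path)
{
	return access(path, W_OK) == 0;
}
```

Running it on winxp-x64.kvm as the user that starts kvm would answer the question; `ls -l` on the image gives the same information by hand.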