[PATCH] Merge commit 'qemu-svn/trunk'

2009-03-29 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* commit 'qemu-svn/trunk': (27 commits)
  gdbstub: Allow re-instantiation (Jan Kiszka)
  char: Fix closing of various char devices (Jan Kiszka)
  qemu-img: adding a -F base_fmt option to qemu-img create -b (Uri Lublin)
  block-qcow2: keep backing file format in a qcow2 extension (Uri Lublin)
  block: support known backing format for image create and open (Uri Lublin)
  Introducing qcow2 extensions (Uri Lublin)
  kvm: Drop kvm_patch_opcode_byte (Jan Kiszka)
  ROM write access for debugging (Jan Kiszka)
  Use the DMA api to map virtio elements.
  virtio-blk: use generic vectored I/O APIs (Christoph Hellwig)
  add qemu_iovec_init_external (Christoph Hellwig)
  Clean some PCI defines (Stefan Weil)
  Fix monitor command (screendump) (Stefan Weil)
  Remove nodisk_ok machine feature (Jan Kiszka)
  musicpal: Fix regression caused by 6839 (Jan Kiszka)
  gdbstub: Drop redundant memset after qemu_mallocz (Jan Kiszka)
  get roms more room. (Glauber Costa)
  new scsi-generic abstraction, use SG_IO (Christoph Hellwig)
  Document sun ID PROM contents
  Fix DMA API when handling an immediate error from block layer (Avi Kivity)
  ...

Conflicts:
qemu/hw/pc.c
qemu/hw/pci.h
qemu/hw/virtio.c
qemu/qemu-doc.texi
qemu/vl.c

Signed-off-by: Avi Kivity a...@redhat.com


[PATCH] kvm: fix tsc_khz on older i386

2009-03-29 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Older i386 kernels do not export tsc_khz.  Make a best-effort attempt by looking
up the tsc_khz symbol at runtime, falling back to a constant if it isn't exported.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/external-module-compat-comm.h b/kernel/external-module-compat-comm.h
index e0dc577..c955927 100644
--- a/kernel/external-module-compat-comm.h
+++ b/kernel/external-module-compat-comm.h
@@ -387,11 +387,16 @@ static inline struct page *__kvm_vm_fault(struct vm_area_struct *vma,
 #endif
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23)
-extern unsigned int tsc_khz;
-#endif
+
+unsigned kvm_get_tsc_khz(void);
+#define kvm_tsc_khz (kvm_get_tsc_khz())
+
+#else
 
 #define kvm_tsc_khz tsc_khz
 
+#endif
+
 #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,21)
 
 #include <linux/ktime.h>
diff --git a/kernel/external-module-compat.c b/kernel/external-module-compat.c
index f425e08..0d858be 100644
--- a/kernel/external-module-compat.c
+++ b/kernel/external-module-compat.c
@@ -334,3 +334,21 @@ int kvm_pcidev_msi_enabled(struct pci_dev *dev)
 }
 
 #endif
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23)
+
+extern unsigned tsc_khz;
+static unsigned tsc_khz_dummy = 200;
+static unsigned *tsc_khz_p;
+
+unsigned kvm_get_tsc_khz(void)
+{
+	if (!tsc_khz_p) {
+		tsc_khz_p = symbol_get(tsc_khz);
+		if (!tsc_khz_p)
+			tsc_khz_p = &tsc_khz_dummy;
+	}
+	return *tsc_khz_p;
+}
+
+#endif


[PATCH] kvm: qemu: use statfs to determine size of huge pages

2009-03-29 Thread Avi Kivity
From: Joerg Roedel joerg.roe...@amd.com

The current method of finding out the size of huge pages does not work
reliably anymore. Current Linux supports more than one huge page size,
but /proc/meminfo only shows one of them. The page size actually in use
can be found by calling statfs on the hugetlbfs mount point. This
patch changes qemu to use statfs instead of parsing /proc/meminfo.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index 4457a40..d765465 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -105,7 +105,7 @@ extern int graphic_rotate;
 extern int no_quit;
 extern int semihosting_enabled;
 extern int old_param;
-extern int hpagesize;
+extern long hpagesize;
 extern const char *bootp_filename;
 
 #ifdef USE_KQEMU
diff --git a/qemu/vl.c b/qemu/vl.c
index 6d39f37..b3da7ad 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -42,6 +42,7 @@
 #include <sys/ioctl.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
+#include <sys/vfs.h>
 #include <netinet/in.h>
 #include <net/if.h>
 #if defined(__NetBSD__)
@@ -262,7 +263,7 @@ const char *mem_path = NULL;
 #ifdef MAP_POPULATE
 int mem_prealloc = 1;  /* force preallocation of physical target memory */
 #endif
-int hpagesize = 0;
+long hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
 int old_param = 0;
@@ -4305,32 +4306,27 @@ void qemu_get_launch_info(int *argc, char ***argv, int *opt_daemonize, const cha
 }
 
 #ifdef USE_KVM
-static int gethugepagesize(void)
+
+#define HUGETLBFS_MAGIC   0x958458f6
+
+static long gethugepagesize(const char *path)
 {
-int ret, fd;
-char buf[4096];
-const char *needle = "Hugepagesize:";
-char *size;
-unsigned long hugepagesize;
+struct statfs fs;
+int ret;
 
-fd = open("/proc/meminfo", O_RDONLY);
-if (fd < 0) {
-   perror("open");
-   exit(0);
-}
+do {
+   ret = statfs(path, &fs);
+} while (ret != 0 && errno == EINTR);
 
-ret = read(fd, buf, sizeof(buf));
-if (ret < 0) {
-   perror("read");
-   exit(0);
+if (ret != 0) {
+   perror("statfs");
+   return 0;
 }
 
-size = strstr(buf, needle);
-if (!size)
-   return 0;
-size += strlen(needle);
-hugepagesize = strtol(size, NULL, 0);
-return hugepagesize;
+if (fs.f_type != HUGETLBFS_MAGIC)
+   fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+
+return fs.f_bsize;
 }
 
 static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
@@ -4350,7 +4346,7 @@ static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
 if (asprintf(&filename, "%s/kvm.XX", path) == -1)
return NULL;
 
-hpagesize = gethugepagesize() * 1024;
+hpagesize = gethugepagesize(path);
 if (!hpagesize)
return NULL;
 


[PATCH] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

2009-03-29 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Conflicts:
arch/ia64/include/asm/kvm_host.h
arch/ia64/kvm/kvm-ia64.c
arch/x86/include/asm/kvm.h
arch/x86/kvm/i8254.c
arch/x86/kvm/i8254.h
arch/x86/kvm/vmx.c
arch/x86/kvm/x86.c
include/linux/kvm.h
include/linux/kvm_host.h
include/linux/kvm_types.h
virt/kvm/ioapic.c
virt/kvm/irq_comm.c
virt/kvm/kvm_main.c

Signed-off-by: Avi Kivity a...@redhat.com


[PATCH] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

2009-03-29 Thread Avi Kivity
From: Avi Kivity a...@redhat.com



Re: BUG: soft lockup - CPU stuck for ...

2009-03-29 Thread Gerrit Slomma
Robert Wimmer r.wimmer at tomorrow-focus.de writes:

 
 Hi,
 
 does anyone know how to solve the problem
 with BUG: soft lockup - CPU#0 stuck for ...?
 Today I got the messages below during compilation
 of the kernel modules in a guest. Using kvm84 and kernel 2.6.29
 as the host kernel and 2.6.28 as the guest kernel, neither ssh nor ping
 was possible during the hangup of the guest.
 After about 2 minutes the guest was reachable again
 and I saw the messages below with dmesg.
 
 Maybe it is related to my previously answered posting:
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/29677
 
 Thanks!
 Robert
 
 BUG: soft lockup - CPU#0 stuck for 61s!
 (...)

Hello

Do you use x86_64 or i686?
Look at my post here 
http://article.gmane.org/gmane.comp.emulators.kvm.devel/29833
And my Bug-report here https://bugzilla.redhat.com/show_bug.cgi?id=492688.
I do not see the problem while running, only after migrating. The problems with
stuck CPUs vanish if i686 is used for the host, but I am testing further.



Re: RFC: Add reserved bits check

2009-03-29 Thread Avi Kivity

Dong, Eddie wrote:
 
+static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)

+{
+   int ps = 0;
+
+   if (level == PT_DIRECTORY_LEVEL)
+   ps = !!(gpte & PT_PAGE_SIZE_MASK);
  


No need for this.  If you set rsvd_bits_mask[1][0] == 
rsvd_bits_mask[0][0], then you get the same behaviour.  The first index 
is not the page size, it's just bit 7.


You'll need to fill all the indexes for bit 7 == 1, but it's worth it, 
with the 1GB pages patch.
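
A minimal sketch of that suggestion (illustration only, reusing the names
from the patch under review, not actual kernel code):

static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
{
	/* bit 7 of the gpte selects the first index directly */
	int bit7 = (gpte >> 7) & 1;

	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
}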



+   return (gpte & vcpu->arch.mmu.rsvd_bits_mask[ps][level-1]) != 0;
+}
+
 #define PTTYPE 64
 #include paging_tmpl.h
 #undef PTTYPE
 
+int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)

+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
+   if (best)
+   return best->eax & 0xff;
+   return 32;
+}
+
  


Best to return 36 if the cpu doesn't support cpuid leaf 0x80000008 but does
support PAE.
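
Roughly like this (a sketch only; is_pae() is the existing KVM helper, and
0x80000008 is the extended CPUID leaf carrying the physical address width):

static int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *best;

	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
	if (best)
		return best->eax & 0xff;
	/* no leaf 0x80000008: PAE guests still see 36-bit physical addresses */
	if (is_pae(vcpu))
		return 36;
	return 32;
}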



--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Defer remote tlb flushes on invlpg (v3)

2009-03-29 Thread Avi Kivity

Marcelo, Andrea?

Avi Kivity wrote:

KVM currently flushes the tlbs on all cpus when emulating invlpg.  This
is because at the time of invlpg we lose track of the page, and leaving
stale tlb entries could cause the guest to access the page when it is
later freed (say after being swapped out).

However we have a second chance to flush the tlbs, when an mmu notifier is
called to let us know the host pte has been invalidated.  We can safely
defer the flush to this point, which occurs much less frequently.  Of course,
we still do a local tlb flush when emulating invlpg.

Signed-off-by: Avi Kivity a...@redhat.com
---

Changes from v2:
- dropped remote flushes from guest pagetable write protect paths
- fixed up memory barriers
- use existing local tlb flush in invlpg, no need to add another one

 arch/x86/kvm/mmu.c |3 +--
 arch/x86/kvm/paging_tmpl.h |5 +
 include/linux/kvm_host.h   |2 ++
 virt/kvm/kvm_main.c|   17 +++--
 4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a36f7f..f0ea56c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1184,8 +1184,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
for_each_sp(pages, sp, parents, i)
 			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
 
-		if (protected)

-			kvm_flush_remote_tlbs(vcpu->kvm);
+		kvm_flush_remote_tlbs_cond(vcpu->kvm, protected);
 
 		for_each_sp(pages, sp, parents, i) {

kvm_sync_page(vcpu, sp);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 855eb71..2273b26 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -445,7 +445,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
gpa_t pte_gpa = -1;
int level;
u64 *sptep;
-   int need_flush = 0;
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -465,7 +464,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)

 			rmap_remove(vcpu->kvm, sptep);
 			if (is_large_pte(*sptep))
 				--vcpu->kvm->stat.lpages;
-			need_flush = 1;
+			vcpu->kvm->remote_tlbs_dirty = true;
}
set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
break;
@@ -475,8 +474,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
break;
}
 
-	if (need_flush)

-		kvm_flush_remote_tlbs(vcpu->kvm);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	if (pte_gpa == -1)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 11eb702..b779c57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -125,6 +125,7 @@ struct kvm_kernel_irq_routing_entry {
 struct kvm {
struct mutex lock; /* protects the vcpus array and APIC accesses */
spinlock_t mmu_lock;
+   bool remote_tlbs_dirty;
struct rw_semaphore slots_lock;
struct mm_struct *mm; /* userspace tied to this vm */
int nmemslots;
@@ -235,6 +236,7 @@ void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
+void kvm_flush_remote_tlbs_cond(struct kvm *kvm, bool cond);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 
 long kvm_arch_dev_ioctl(struct file *filp,

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 68b217e..12afa50 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -758,10 +758,18 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)

 {
+	kvm->remote_tlbs_dirty = false;
+	smp_wmb();
 	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.remote_tlb_flush;
 }
 
+void kvm_flush_remote_tlbs_cond(struct kvm *kvm, bool cond)

+{
+	if (cond || kvm->remote_tlbs_dirty)
+   kvm_flush_remote_tlbs(kvm);
+}
+
 void kvm_reload_remote_mmus(struct kvm *kvm)
 {
make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
@@ -841,8 +849,7 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 	spin_unlock(&kvm->mmu_lock);
 
 	/* we've to flush the tlb before the pages can be freed */

-   if (need_tlb_flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_cond(kvm, need_tlb_flush);
 
 }
 
@@ -866,8 +873,7 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,

 	spin_unlock(&kvm->mmu_lock);
 
 	/* we've to flush the tlb before the pages can be freed */

-   if (need_tlb_flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_cond(kvm, need_tlb_flush);
 }
 
 static void 

Re: [PATCH 2/7] kvm mmu: infrastructure changes for multiple huge page support

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch includes most of the necessary changes to the KVM SoftMMU for
supporting more than one huge page size. The changes in this patch
include:

  * introduce 'enum kvm_page_size' which is used to represent the page
size used
  


How about the concept of page_level instead, which returns the level of 
a page in terms of the paging hierarchy?  Would be easier to compare 
against in the various walkers.


I've been thinking for a while about making levels zero-based, so 0 would be 
a 4K page, 1 a 2M page, and 2 a 1GB page.
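
As an illustration of the zero-based idea (a sketch only, not kernel code):

enum kvm_page_level {
	KVM_PAGE_LEVEL_4K = 0,	/* 4KB page */
	KVM_PAGE_LEVEL_2M = 1,	/* 2MB page */
	KVM_PAGE_LEVEL_1G = 2,	/* 1GB page */
};

/* each level adds 9 bits of page offset on x86-64 */
static inline unsigned long kvm_page_level_size(enum kvm_page_level level)
{
	return 4096UL << (9 * level);
}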





--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch adds the necessary data structures to take care of write
protections in place within a second huge page sized page.

 
 struct kvm_vcpu_arch {

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9936b45..7d4162d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -390,6 +390,15 @@ static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
 	return &slot->lpage_info[idx].write_count;
 }
 
+static int *slot_hugepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)

+{
+   unsigned long idx;
+
+	idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
+	      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
+	return &slot->hpage_info[idx].write_count;
+}
  


A page level argument would remove the need for this duplication, as 
well as all the constants.
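
For example, something along these lines (a sketch; KVM_PAGES_PER_LEVEL() and
the per-level page_info array are made-up names, not existing code):

static int *slot_page_idx(gfn_t gfn, struct kvm_memory_slot *slot, int level)
{
	unsigned long idx;

	idx = (gfn / KVM_PAGES_PER_LEVEL(level)) -
	      (slot->base_gfn / KVM_PAGES_PER_LEVEL(level));
	return &slot->page_info[level][idx].write_count;
}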


 
+static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)

+{
+   struct kvm_memory_slot *slot;
+   int *hugepage_idx;
+
+   gfn = unalias_gfn(kvm, gfn);
+   slot = gfn_to_memslot_unaliased(kvm, gfn);
+   if (slot) {
+   hugepage_idx = slot_hugepage_idx(gfn, slot);
  


slot_largepage_idx() here?

I don't think we ever write protect large pages, so why is this needed?


+   return *hugepage_idx;
+   }
+
+   return 1;
+}
  




+
 static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 {
struct vm_area_struct *vma;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 095ebb6..2f05d48 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -103,7 +103,7 @@ struct kvm_memory_slot {
struct {
unsigned long rmap_pde;
int write_count;
-   } *lpage_info;
+   } *lpage_info, *hpage_info;
  


} * lpage_info[KVM_NR_PAGE_LEVELS];


diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8aa3b95..c4842f4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1001,10 +1001,14 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	if (!dont || free->lpage_info != dont->lpage_info)
 		vfree(free->lpage_info);
 
+	if (!dont || free->hpage_info != dont->hpage_info)
+		vfree(free->hpage_info);
  


loop


 void kvm_free_physmem(struct kvm *kvm)
@@ -1170,6 +1174,28 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.lpage_info[largepages-1].write_count = 1;
}
 
+#ifdef KVM_PAGES_PER_LHPAGE

+	if (npages && !new.hpage_info) {
+   int hugepages = npages / KVM_PAGES_PER_LHPAGE;
+   if (npages % KVM_PAGES_PER_LHPAGE)
+   hugepages++;
+   if (base_gfn % KVM_PAGES_PER_LHPAGE)
+   hugepages++;
+
+   new.hpage_info = vmalloc(hugepages * sizeof(*new.hpage_info));
+
+   if (!new.hpage_info)
+   goto out_free;
+
+   memset(new.hpage_info, 0, hugepages * sizeof(*new.hpage_info));
+
+   if (base_gfn % KVM_PAGES_PER_LHPAGE)
+   new.hpage_info[0].write_count = 1;
+   if ((base_gfn+npages) % KVM_PAGES_PER_LHPAGE)
+   new.hpage_info[hugepages-1].write_count = 1;
+   }
+#endif
  


Loop, KVM_NR_PAGE_LEVELS defined per arch.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 5/7] kvm mmu: add support for 1GB pages to direct mapping paths

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch makes the MMU path for TDP aware of 1GB pages.

 
+#define PT64_MID_BASE_ADDR_MASK (PT64_BASE_ADDR_MASK & \
+	~((1ULL << (PAGE_SHIFT + (2 * PT64_LEVEL_BITS))) - 1))
+#define PT64_MID_GFN_DELTA_MASK (PT64_BASE_ADDR_MASK & (((1ULL << \
+	(2 * PT64_LEVEL_BITS)) - 1) << PAGE_SHIFT))
+
 #define PT32_BASE_ADDR_MASK PAGE_MASK
 #define PT32_DIR_BASE_ADDR_MASK \
 	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
@@ -128,6 +133,7 @@ module_param(oos_shadow, bool, 0644);
 #define PFERR_USER_MASK (1U << 2)
 #define PFERR_FETCH_MASK (1U << 4)
 
+#define PT_MIDDLE_LEVEL 3
  


I prefer the architectural names to the Linux names (since we're talking 
about the guest), so PDPT here (even though the Linux names make a bit 
more sense).



 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
@@ -507,16 +513,29 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,

  enum kvm_page_size psize)
 {
struct kvm_memory_slot *slot;
-   unsigned long idx;
+   unsigned long idx, *ret;
 
 	slot = gfn_to_memslot(kvm, gfn);

-	if (psize == KVM_PAGE_SIZE_4k)
-		return &slot->rmap[gfn - slot->base_gfn];
 
-	idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
-	      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
+	switch (psize) {
+	case KVM_PAGE_SIZE_4k:
+		ret = &slot->rmap[gfn - slot->base_gfn];
+		break;
+	case KVM_PAGE_SIZE_2M:
+		idx = (gfn / KVM_PAGES_PER_2M_PAGE) -
+		      (slot->base_gfn / KVM_PAGES_PER_2M_PAGE);
+		ret = &slot->lpage_info[idx].rmap_pde;
+		break;
+	case KVM_PAGE_SIZE_1G:
+		idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
+		      (slot->base_gfn / KVM_PAGES_PER_1G_PAGE);
+		ret = &slot->hpage_info[idx].rmap_pde;
+		break;
+	default:
+		BUG();
+	}
  


Ah, page_level would really make sense here.

 
-	return &slot->lpage_info[idx].rmap_pde;

+   return ret;
 }
 
 /*

@@ -1363,7 +1382,10 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
   pt[i]);
} else {
 				--kvm->stat.lpages;
-				rmap_remove(kvm, pt[i], KVM_PAGE_SIZE_2M);
+				if (sp->role.level == PT_DIRECTORY_LEVEL)
+					rmap_remove(kvm, pt[i], KVM_PAGE_SIZE_2M);
+				else
+					rmap_remove(kvm, pt[i], KVM_PAGE_SIZE_1G);
 			}
  


And here.


}
pt[i] = shadow_trap_nonpresent_pte;
@@ -1769,8 +1791,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	if ((pte_access & ACC_WRITE_MASK)
 	    || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
 
-		if (psize > KVM_PAGE_SIZE_4k &&
-		    has_wrprotected_page(vcpu->kvm, gfn)) {
+		if ((psize == KVM_PAGE_SIZE_2M &&
+		     has_wrprotected_page(vcpu->kvm, gfn)) ||
+		    (psize == KVM_PAGE_SIZE_1G &&
+		     has_wrprotected_largepage(vcpu->kvm, gfn))) {
 			ret = 1;
  


And here.  I'm in complete agreement with myself here.


spte = shadow_trap_nonpresent_pte;
goto set_pte;
@@ -1884,7 +1908,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == PT_PAGE_TABLE_LEVEL
 		    || (psize == KVM_PAGE_SIZE_2M &&
-			iterator.level == PT_DIRECTORY_LEVEL)) {
+			iterator.level == PT_DIRECTORY_LEVEL)
+		    || (psize == KVM_PAGE_SIZE_1G &&
+			iterator.level == PT_MIDDLE_LEVEL)) {
mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 0, write, 1, pt_write,
 psize, 0, gfn, pfn, false);
@@ -1919,8 +1945,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
int write, gfn_t gfn)
unsigned long mmu_seq;
enum kvm_page_size psize = backing_size(vcpu, gfn);
 
-	if (psize == KVM_PAGE_SIZE_2M)
+	if (psize >= KVM_PAGE_SIZE_2M) {
+		/*
+		 * nonpaging mode uses pae page tables - so we
+		 * can't use gbpages here - take care of this
+		 */
 		gfn &= ~(KVM_PAGES_PER_2M_PAGE-1);
+		psize = KVM_PAGE_SIZE_2M;
+	}
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;

smp_rmb();
@@ -2123,6 +2155,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
psize = backing_size(vcpu, gfn);
if (psize == KVM_PAGE_SIZE_2M)
   

Re: [PATCH 6/7] kvm mmu: enabling 1GB pages by extending backing_size funtion

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch enables support for 1GB pages in KVM by implementing
the support in backing_size().

@@ -490,18 +492,30 @@ static enum kvm_page_size host_page_size(struct kvm *kvm, gfn_t gfn)
 static enum kvm_page_size backing_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
struct kvm_memory_slot *slot;
-
-	if (has_wrprotected_page(vcpu->kvm, gfn))
-		return KVM_PAGE_SIZE_4k;
-
-	if (host_page_size(vcpu->kvm, gfn) < KVM_PAGE_SIZE_2M)
-		return KVM_PAGE_SIZE_4k;
+	enum kvm_page_size host_size, ret;
 
 	slot = gfn_to_memslot(vcpu->kvm, gfn);
 
 	if (slot && slot->dirty_bitmap)
 		return KVM_PAGE_SIZE_4k;
 
-	return KVM_PAGE_SIZE_2M;
+	host_size = host_page_size(vcpu->kvm, gfn);
+
+   switch (host_size) {
+   case KVM_PAGE_SIZE_1G:
+	if (!has_wrprotected_largepage(vcpu->kvm, gfn)) {
+   ret = KVM_PAGE_SIZE_1G;
+   break;
+   }
  


What if there's a wrprotected_page in there?


+   case KVM_PAGE_SIZE_2M:
+	if (!has_wrprotected_page(vcpu->kvm, gfn)) {
+   ret = KVM_PAGE_SIZE_2M;
+   break;
+   }
+   default:
+   ret = KVM_PAGE_SIZE_4k;
+   }
+
+   return ret;
 }
 
 /*
  



--
error compiling committee.c: too many arguments to function



Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

If userspace knows that the kernel part supports 1GB pages it can enable
the corresponding cpuid bit so that guests actually use GB pages.
  



diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a1df2a3..6593198 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -542,6 +542,8 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
int (*get_mt_mask_shift)(void);
+
+   bool (*gb_page_enable)(void);
 };
  


Should enable unconditionally.  Of course we need to find the shadow bug 
first, maybe the has_wrprotected thingy.



diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ee755e2..e79eb26 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -413,6 +413,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_DEVICE_MSIX 28
 #endif
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
+#define KVM_CAP_1GB_PAGES 30
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
  


Need KVM_GET_SUPPORTED_CPUID2 support as well.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm mmu: add support for 1GB pages in shadow paging code

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch adds support for 1GB pages in the shadow paging code. The
guest can map 1GB pages in its page tables and KVM will map the page
frame with a 1GB, a 2MB or even a 4kb page size, according to the backing
host page size and the write protections in place.
This is the theory. In practice there are conditions which make the
guest unstable when running with this patch and GB pages enabled. The
failing conditions are:

* KVM is loaded using shadow paging
* The Linux guest uses GB pages for the kernel direct mapping
* The guest memory is backed with 4kb pages on the host side

With the above configuration there are random application or kernel
crashes when the guest runs under load. When GB pages for HugeTLBfs in
the guest are allocated at boot time, the guest kernel crashes or gets
stuck at boot, depending on the amount of RAM in the guest.
The following parameters have no impact:

* The bug also occurs without guest SMP (so likely no race
  condition)
* Using the PV-MMU makes no difference

I have been hunting this bug for quite some time with no real luck. Maybe
other reviewers have more luck than I had so far.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
@@ -729,7 +730,9 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
}
 
 	/* check for huge page mappings */

-   rmapp = gfn_to_rmap(kvm, gfn, KVM_PAGE_SIZE_2M);
+   psize = KVM_PAGE_SIZE_2M;
+again:
+   rmapp = gfn_to_rmap(kvm, gfn, psize);
spte = rmap_next(kvm, rmapp, NULL);
while (spte) {
BUG_ON(!spte);
@@ -737,7 +740,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 		BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) !=
 			(PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
 		pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
 		if (is_writeble_pte(*spte)) {
-			rmap_remove(kvm, spte, KVM_PAGE_SIZE_2M);
+			rmap_remove(kvm, spte, psize);
 			--kvm->stat.lpages;
set_shadow_pte(spte, shadow_trap_nonpresent_pte);
spte = NULL;
@@ -746,6 +749,11 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
spte = rmap_next(kvm, rmapp, spte);
}
 
+	if (psize == KVM_PAGE_SIZE_2M) {

+   psize = KVM_PAGE_SIZE_1G;
+   goto again;
+   }
+
  


Ugh, use a real loop.


return write_protected;
 }
 
@@ -789,11 +797,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,

 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 			unsigned long lidx = gfn_offset / KVM_PAGES_PER_2M_PAGE;
+			unsigned long hidx = gfn_offset / KVM_PAGES_PER_1G_PAGE;
 			retval |= handler(kvm, &memslot->rmap[gfn_offset],
 					  KVM_PAGE_SIZE_4k);
 			retval |= handler(kvm,
 					  &memslot->lpage_info[lidx].rmap_pde,
 					  KVM_PAGE_SIZE_2M);
+			retval |= handler(kvm, &memslot->hpage_info[hidx].rmap_pde,
+					  KVM_PAGE_SIZE_1G);
 		}
}
  


Isn't this needed for tdp as well?





--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/7] Support for GB pages in KVM

2009-03-29 Thread Avi Kivity

Marcelo Tosatti wrote:

On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
  

Hi,

this patchset extends the KVM MMU implementation to support 1GB pages as
supported by AMD family 16 processors. These patches enable support for
1 GB pages with Nested Paging. Support for these pages in the shadow
paging code was also developed but does not run stably yet. The patch
for shadow-paging support is not included in this series and will be
sent out separately.



Looks generally sane. I'm not sure it's even worthwhile to support
GB pages with the softmmu, because the chance of finding an area without
shadowed (write-protected) pages is much smaller than with 2MB pages.
  


If the guest uses 1GB pages in userspace, then these pages would not 
have any write protected subpages.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/7] Support for GB pages in KVM

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

On Sat, Mar 28, 2009 at 06:40:08PM -0300, Marcelo Tosatti wrote:
  

On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:


Hi,

this patchset extends the KVM MMU implementation to support 1GB pages as
supported by AMD family 16 processors. These patches enable support for
1 GB pages with Nested Paging. Support for these pages in the shadow
paging code was also developed but does not run stable yet. The patch
for shadow-paging support is not included in this series and will be
sent out seperatly.
  

Looks generally sane. I'm not sure its even worthwhile to support
GBpages with softmmu, because the chance of finding an area without
shadowed (write protected) pages is much smaller than with 2MB pages.



Thanks for your review.

The idea behind GB pages in the softmmu code was to provide GB pages to the
guest even if the hardware does not support them. This would work better with
live migration (the only case where we wouldn't have gbpages then would be
VMX with EPT enabled).

  

Have any numbers to share?



No numbers I fully trust yet. I measured a 32% improvement in
kernbench using nested pages backed with gb pages. I will do some more
measurements and share some more solid numbers.

  


Compared to 2M pages?  But we're already close to native here.


--
error compiling committee.c: too many arguments to function



mmu_pages_next() question

2009-03-29 Thread Avi Kivity

static int mmu_pages_next(struct kvm_mmu_pages *pvec,
			  struct mmu_page_path *parents,
			  int i)
{
	int n;

	for (n = i+1; n < pvec->nr; n++) {
		struct kvm_mmu_page *sp = pvec->page[n].sp;

		if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
			parents->idx[0] = pvec->page[n].idx;
			return n;
		}

		parents->parent[sp->role.level-2] = sp;
		parents->idx[sp->role.level-1] = pvec->page[n].idx;
	}

	return n;
}


Do we need to break out of the loop if we switch parents during the loop 
(since that will give us a different mmu_page_path)?  Or are callers 
careful to only pass pvecs which belong to the same shadow page?


--
error compiling committee.c: too many arguments to function



Re: Live memory allocation?

2009-03-29 Thread Avi Kivity

Nolan wrote:

Windows does zero all memory at boot, and also runs an idle-priority thread in
the background to zero memory as it is freed.  This way it is far less likely to
need to zero a page to satisfy a memory allocation request.  Whether or not this
is still a win now that people care about power consumption is an open question.

I suspect the difference of behavior between KVM and VMware is related to
VMware's page sharing.  All those zeroed pages can be collapsed into one COW
zero page.  I wouldn't be surprised to learn that VMware has heuristics in the
page sharing code specifically for windows guests.

Perhaps KSM would help you?  Alternately, a heuristic that scanned for (and
collapsed) fully zeroed pages when a page is faulted in for the first time could
catch these.
  


ksm will indeed collapse these pages.  Lighter-weight alternatives exist 
-- ballooning (needs a Windows driver), or, like you mention, a simple 
scanner that looks for zero pages and drops them.  That could be 
implemented within qemu (with some simple kernel support for dropping 
zero pages atomically, say madvise(MADV_DROP_IFZERO)).
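
A userspace sketch of such a scanner (illustration only; assumes anonymous
guest RAM, and note that MADV_DROP_IFZERO is hypothetical, so without the
atomic kernel support the guest would have to be paused while scanning):

#include <sys/mman.h>
#include <unistd.h>

static int page_is_zero(const unsigned char *p, size_t page_size)
{
	size_t i;

	for (i = 0; i < page_size; i++)
		if (p[i])
			return 0;
	return 1;
}

/* drop every all-zero page in [mem, mem + len); the next access reads zeros again */
static void drop_zero_pages(unsigned char *mem, size_t len)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t off;

	for (off = 0; off + page_size <= len; off += page_size)
		if (page_is_zero(mem + off, page_size))
			madvise(mem + off, page_size, MADV_DONTNEED);
}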


--
error compiling committee.c: too many arguments to function



Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 02:54:31PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
 If userspace knows that the kernel part supports 1GB pages it can enable
 the corresponding cpuid bit so that guests actually use GB pages.
   

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index a1df2a3..6593198 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -542,6 +542,8 @@ struct kvm_x86_ops {
  int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
  int (*get_tdp_level)(void);
  int (*get_mt_mask_shift)(void);
 +
 +bool (*gb_page_enable)(void);
  };
   

 Should enable unconditionally.  Of course we need to find the shadow bug  
 first, may be the has_wrprotected thingy.

This was the original plan. But how about VMX with EPT enabled? I am not
sure but I think this configuration will not support gbpages?

Joerg


Re: [PATCH 0/7] Support for GB pages in KVM

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 03:03:28PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
 On Sat, Mar 28, 2009 at 06:40:08PM -0300, Marcelo Tosatti wrote:
   
 On Fri, Mar 27, 2009 at 03:31:52PM +0100, Joerg Roedel wrote:
 
 Hi,

 this patchset extends the KVM MMU implementation to support 1GB pages as
 supported by AMD family 16 processors. These patches enable support for
 1 GB pages with Nested Paging. Support for these pages in the shadow
 paging code was also developed but does not run stable yet. The patch
 for shadow-paging support is not included in this series and will be
 sent out seperatly.
   
 Looks generally sane. I'm not sure its even worthwhile to support
 GBpages with softmmu, because the chance of finding an area without
 shadowed (write protected) pages is much smaller than with 2MB pages.
 

 Thanks for your review.

 The idea behind GB pages in softmmu code was to provide GB pages to the
 guest even if hardware does not support it. This would work better with
 live migration (Only case where we wouldn't have gbpages then would be
 vmx with ept enabled).

   
 Have any numbers to share?
 

 No numbers I fully trust by now. I measured a 32% improvement in
 kernbench using nested pages backed with gb pages. I will do some more
 measurements and share some more solid numbers.

   

 Compared to 2M pages?  But we're already close to native here.

Yes, that's why I don't trust those numbers. I'll find out what went
wrong and provide more solid numbers.

Joerg



Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
int (*get_mt_mask_shift)(void);
+
+   bool (*gb_page_enable)(void);
 };
  
  
Should enable unconditionally.  Of course we need to find the shadow bug  
first, may be the has_wrprotected thingy.



This was the original plan. But how about VMX with EPT enabled? I am not
sure but I think this configuration will not support gbpages?




You're right.  Let's have a ->max_host_page_level() to handle that.  
It's 0.5T-pages ready, too.
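
A rough sketch of what that hook could look like (names assumed, not an
actual patch; PT_MIDDLE_LEVEL is the constant introduced earlier in this
series):

/* svm.c: NPT can map 1GB pages */
static int svm_max_host_page_level(void)
{
	return PT_MIDDLE_LEVEL;
}

/* vmx.c: EPT cannot (yet), shadow paging still could */
static int vmx_max_host_page_level(void)
{
	return enable_ept ? PT_DIRECTORY_LEVEL : PT_MIDDLE_LEVEL;
}

/*
 * The MMU fault path would then clamp its choice, e.g.:
 *	level = min(level, kvm_x86_ops->max_host_page_level());
 */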


--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm mmu: add support for 1GB pages in shadow paging code

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 02:59:26PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
  return write_protected;
  }
  @@ -789,11 +797,14 @@ static int kvm_handle_hva(struct kvm *kvm, 
 unsigned long hva,
  if (hva = start  hva  end) {
  gfn_t gfn_offset = (hva - start)  PAGE_SHIFT;
  unsigned long lidx = gfn_offset / KVM_PAGES_PER_2M_PAGE;
 +unsigned long hidx = gfn_offset / KVM_PAGES_PER_1G_PAGE;
  retval |= handler(kvm, memslot-rmap[gfn_offset],
KVM_PAGE_SIZE_4k);
  retval |= handler(kvm,
memslot-lpage_info[lidx].rmap_pde,
KVM_PAGE_SIZE_2M);
 +retval |= handler(kvm, 
 memslot-hpage_info[hidx].rmap_pde,
 +  KVM_PAGE_SIZE_1G);
  }
  }
   

 Isn't this needed for tdp as well?

Hmm, yes. But it may not be a problem to skip it, because large pages are
never swapped out. Anyway, I will move this to the TDP patch.

Joerg


Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 03:49:11PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
int (*get_mt_mask_shift)(void);
 +
 +  bool (*gb_page_enable)(void);
  };
 
 Should enable unconditionally.  Of course we need to find the shadow 
 bug  first, may be the has_wrprotected thingy.
 

 This was the original plan. But how about VMX with EPT enabled? I am not
 sure but I think this configuration will not support gbpages?

  

 You're right.  Let's have a ->max_host_page_level() to handle that.   
 It's 0.5T-pages ready, too.

OK, I will change that together with the page_size -> page_level
changes. But I doubt that there will ever be 0.5T pages ;)

Joerg


Re: [PATCH 7/7] kvm x86: report 1GB page support to userspace

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

OK, I will change that together with the page_size -> page_level
changes. But I doubt that there will ever be 0.5T pages ;)
  


We're bloating at a rate of 1 bit per 1-2 years, so we have 8-16 years 
to prepare.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 02:45:44PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
 This patch adds the necessary data structures to take care of write
 protections in place within a second huge page sized page.

   struct kvm_vcpu_arch {
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 9936b45..7d4162d 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -390,6 +390,15 @@ static int *slot_largepage_idx(gfn_t gfn, struct 
 kvm_memory_slot *slot)
  return slot-lpage_info[idx].write_count;
  }
  +static int *slot_hugepage_idx(gfn_t gfn, struct kvm_memory_slot 
 *slot)
 +{
 +unsigned long idx;
 +
 +idx = (gfn / KVM_PAGES_PER_1G_PAGE) -
 +  (slot-base_gfn / KVM_PAGES_PER_1G_PAGE);
 +return slot-hpage_info[idx].write_count;
 +}
   

 A page level argument would remove the need for this duplication, as  
 well as all the constants.

  +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
 +{
 +struct kvm_memory_slot *slot;
 +int *hugepage_idx;
 +
 +gfn = unalias_gfn(kvm, gfn);
 +slot = gfn_to_memslot_unaliased(kvm, gfn);
 +if (slot) {
 +hugepage_idx = slot_hugepage_idx(gfn, slot);
   

 slot_largepage_idx() here?

 I don't think we ever write protect large pages, so why is this needed?

For 2MB pages we need to check if there is a write-protected 4k page in it
before we map a 2MB page for writing. If there is any write-protected 4k
page in a 2MB area, that 2MB page is considered write-protected. These
'write-protected' 2MB pages are accounted in the account_shadowed()
function. This information is taken into account when we decide if we
can map a guest 1GB page as a 1GB page on the host too.

Joerg


Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting

2009-03-29 Thread Joerg Roedel
On Sun, Mar 29, 2009 at 04:15:07PM +0300, Avi Kivity wrote:
 Joerg Roedel wrote:
 
  +static int has_wrprotected_largepage(struct kvm *kvm, gfn_t gfn)
 +{
 +  struct kvm_memory_slot *slot;
 +  int *hugepage_idx;
 +
 +  gfn = unalias_gfn(kvm, gfn);
 +  slot = gfn_to_memslot_unaliased(kvm, gfn);
 +  if (slot) {
 +  hugepage_idx = slot_hugepage_idx(gfn, slot);
 
 slot_largepage_idx() here?

 I don't think we ever write protect large pages, so why is this needed?
 

 For 2mb pages we need to check if there is a write-protected 4k page in it
 before we map a 2mb page for writing. If there is any write-protected 4k
 page in a 2mb area this 2mb page is considered write-protected. These
 'write-protected' 2mb pages are accounted in the account_shadow()
 function. This information is taken into account when we decide if we
 can map a guest 1gb page as a 1gb page on the host too.
   

 account_shadowed() actually increments a hugepage write_count by 1 for  
 every 4K page, not 2M page, if I read the code correctly.  The code I  
 commented on is right though.

 The naming is confusing.  I suggest
 has_wrprotected_page_in_{large,huge}page(), although with a level
 parameter we can keep has_wrprotected_page().

Yeah true, the name is a bit confusing. I think a level parameter for
has_wrprotected_page() is the best solution.
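Something like this, as a sketch (the per-level index helper is a placeholder,
not the final patch):

static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn, int level)
{
	struct kvm_memory_slot *slot;

	gfn = unalias_gfn(kvm, gfn);
	slot = gfn_to_memslot_unaliased(kvm, gfn);
	if (slot)	/* per-level write_count lookup */
		return *slot_page_idx(gfn, slot, level);

	return 1;
}
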

Joerg


Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting

2009-03-29 Thread Avi Kivity

Avi Kivity wrote:

Joerg Roedel wrote:

This patch adds the necessary data structures to take care of write
protections in place within a second huge page sized page.


+#ifdef KVM_PAGES_PER_LHPAGE
+if (npages && !new.hpage_info) {
+int hugepages = npages / KVM_PAGES_PER_LHPAGE;
+if (npages % KVM_PAGES_PER_LHPAGE)
+hugepages++;
+if (base_gfn % KVM_PAGES_PER_LHPAGE)
+hugepages++;
  


Consider a slot with base_gfn == 1 and npages == 1.  This will have 
hugepages == 2, which is wrong.


I think the right calculation is

 ((base_gfn + npages - 1) / N) - (base_gfn / N) + 1

i.e. index of last page, plus one so we can store it.

The small huge page calculation is off as well.
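
For example (illustration only), with N = KVM_PAGES_PER_HPAGE = 512:

/*
 * old: npages/N + (npages % N != 0) + (base_gfn % N != 0)
 * new: (base_gfn + npages - 1)/N - base_gfn/N + 1
 *
 * base_gfn = 1, npages = 1   -> old = 2, new = 1  (one large page spanned)
 * base_gfn = 1, npages = 513 -> old = 3, new = 2
 */
static unsigned long largepage_count(gfn_t base_gfn, unsigned long npages,
				     unsigned long n)
{
	return (base_gfn + npages - 1) / n - base_gfn / n + 1;
}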



I fixed the existing case with

commit 1a967084dbe97a2f4be84139d14e2d958d7ffc46
Author: Avi Kivity a...@redhat.com
Date:   Sun Mar 29 16:31:25 2009 +0300

   KVM: MMU: Fix off-by-one calculating large page count
  
   The large page initialization code concludes there are two large pages spanned
   by a slot covering 1 (small) page starting at gfn 1.  This is incorrect, and
   also results in incorrect write_count initialization in some cases (base = 1,
   npages = 513 for example).
  
   Cc: sta...@kernel.org

   Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8aa3b95..3d31557 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1076,6 +1076,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
   int r;
   gfn_t base_gfn;
   unsigned long npages;
+   int largepages;
   unsigned long i;
   struct kvm_memory_slot *memslot;
   struct kvm_memory_slot old, new;
@@ -1151,11 +1152,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
   new.userspace_addr = 0;
   }
 	if (npages && !new.lpage_info) {
-   int largepages = npages / KVM_PAGES_PER_HPAGE;
-   if (npages % KVM_PAGES_PER_HPAGE)
-   largepages++;
-   if (base_gfn % KVM_PAGES_PER_HPAGE)
-   largepages++;
+		largepages = 1 + (base_gfn + npages - 1) / KVM_PAGES_PER_HPAGE;
+		largepages -= base_gfn / KVM_PAGES_PER_HPAGE;
 
 		new.lpage_info = vmalloc(largepages * sizeof(*new.lpage_info));


--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/7] kvm mmu: implement necessary data structures for second huge page accounting

2009-03-29 Thread Avi Kivity

Joerg Roedel wrote:

This patch adds the necessary data structures to take care of write
protections in place within a second huge page sized page.


+#ifdef KVM_PAGES_PER_LHPAGE
+	if (npages && !new.hpage_info) {
+   int hugepages = npages / KVM_PAGES_PER_LHPAGE;
+   if (npages % KVM_PAGES_PER_LHPAGE)
+   hugepages++;
+   if (base_gfn % KVM_PAGES_PER_LHPAGE)
+   hugepages++;
  


Consider a slot with base_gfn == 1 and npages == 1.  This will have 
hugepages == 2, which is wrong.


I think the right calculation is

 ((base_gfn + npages - 1) / N) - (base_gfn / N) + 1

i.e. index of last page, plus one so we can store it.

The small huge page calculation is off as well.


+
+   new.hpage_info = vmalloc(hugepages * sizeof(*new.hpage_info));
+
+   if (!new.hpage_info)
+   goto out_free;
+
+   memset(new.hpage_info, 0, hugepages * sizeof(*new.hpage_info));
+
+   if (base_gfn % KVM_PAGES_PER_LHPAGE)
+   new.hpage_info[0].write_count = 1;
+   if ((base_gfn+npages) % KVM_PAGES_PER_LHPAGE)
+   new.hpage_info[hugepages-1].write_count = 1;
+   }
+#endif
+
  


--
error compiling committee.c: too many arguments to function



Re: [PATCH v4 0/6] PCI: support the ATS capability

2009-03-29 Thread Matthew Wilcox
On Thu, Mar 26, 2009 at 04:15:56PM -0700, Jesse Barnes wrote:
2, avoid using pci_find_ext_capability every time when reading ATS
   Invalidate Queue Depth (Matthew Wilcox)

I asked a question about how that was used, and got back a version which
changed how it was done.  I still don't have an answer to my question.

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.


[PATCH 1/4] Fix handling of a fault during NMI unblocked due to IRET

2009-03-29 Thread Gleb Natapov
Bit 12 is undefined in any of the following cases:
 - If the VM exit sets the valid bit in the IDT-vectoring information field.
 - If the VM exit is due to a double fault.

Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/vmx.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 37ae13d..14e3f48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3259,36 +3259,41 @@ static void update_tpr_threshold(struct kvm_vcpu *vcpu)
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
-   u32 idt_vectoring_info;
+	u32 idt_vectoring_info = vmx->idt_vectoring_info;
bool unblock_nmi;
u8 vector;
int type;
bool idtv_info_valid;
u32 error;
 
+	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 	if (cpu_has_virtual_nmis()) {
 		unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
 		vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
/*
-* SDM 3: 25.7.1.2
+* SDM 3: 27.7.1.2 (September 2008)
 * Re-set bit block by NMI before VM entry if vmexit caused by
 * a guest IRET fault.
+* SDM 3: 23.2.2 (September 2008)
+* Bit 12 is undefined in any of the following cases:
+*  If the VM exit sets the valid bit in the IDT-vectoring
+*   information field.
+*  If the VM exit is due to a double fault.
 */
-		if (unblock_nmi && vector != DF_VECTOR)
+		if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+		    vector != DF_VECTOR && !idtv_info_valid)
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
  GUEST_INTR_STATE_NMI);
 	} else if (unlikely(vmx->soft_vnmi_blocked))
 		vmx->vnmi_blocked_time +=
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
-	idt_vectoring_info = vmx->idt_vectoring_info;
-	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 	vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
 	type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
 	if (vmx->vcpu.arch.nmi_injected) {
/*
-* SDM 3: 25.7.1.2
+* SDM 3: 27.7.1.2 (September 2008)
 * Clear bit block by NMI before VM entry if a NMI delivery
 * faulted.
 */



[PATCH 2/4] Rewrite twisted maze of if() statements with more straightforward switch()

2009-03-29 Thread Gleb Natapov
Signed-off-by: Gleb Natapov g...@redhat.com
---

 arch/x86/kvm/vmx.c |   43 +--
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 14e3f48..1017544 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3264,7 +3264,6 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
u8 vector;
int type;
bool idtv_info_valid;
-   u32 error;
 
 	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
@@ -3289,34 +3288,42 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 		vmx->vnmi_blocked_time +=
 			ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
+	vmx->vcpu.arch.nmi_injected = false;
+	kvm_clear_exception_queue(&vmx->vcpu);
+	kvm_clear_interrupt_queue(&vmx->vcpu);
+
+   if (!idtv_info_valid)
+   return;
+
 	vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
 	type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-	if (vmx->vcpu.arch.nmi_injected) {
+
+	switch (type) {
+	case INTR_TYPE_NMI_INTR:
+		vmx->vcpu.arch.nmi_injected = true;
/*
 * SDM 3: 27.7.1.2 (September 2008)
-* Clear bit block by NMI before VM entry if a NMI delivery
-* faulted.
+* Clear bit block by NMI before VM entry if a NMI
+* delivery faulted.
 */
-		if (idtv_info_valid && type == INTR_TYPE_NMI_INTR)
-   vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
-   GUEST_INTR_STATE_NMI);
-   else
-			vmx->vcpu.arch.nmi_injected = false;
-	}
-	kvm_clear_exception_queue(&vmx->vcpu);
-	if (idtv_info_valid && (type == INTR_TYPE_HARD_EXCEPTION ||
-   type == INTR_TYPE_SOFT_EXCEPTION)) {
+   vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+   GUEST_INTR_STATE_NMI);
+   break;
+   case INTR_TYPE_HARD_EXCEPTION:
+   case INTR_TYPE_SOFT_EXCEPTION:
 		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-			error = vmcs_read32(IDT_VECTORING_ERROR_CODE);
-			kvm_queue_exception_e(&vmx->vcpu, vector, error);
+			u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+			kvm_queue_exception_e(&vmx->vcpu, vector, err);
 		} else
 			kvm_queue_exception(&vmx->vcpu, vector);
 		vmx->idt_vectoring_info = 0;
-   }
-	kvm_clear_interrupt_queue(&vmx->vcpu);
-	if (idtv_info_valid && type == INTR_TYPE_EXT_INTR) {
+		break;
+	case INTR_TYPE_EXT_INTR:
 		kvm_queue_interrupt(&vmx->vcpu, vector);
 		vmx->idt_vectoring_info = 0;
+   break;
+   default:
+   break;
}
 }
 



Re: IO on guest is 20 times slower than host

2009-03-29 Thread Avi Kivity

Kurt Yoder wrote:

slow host cpu information, core 1 of 16:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 4
model name  : Quad-Core AMD Opteron(tm) Processor 8382
stepping: 2
cpu MHz : 2611.998
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall mmxext fxsr_opt
pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor
cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips: 5223.97
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


  


Can you try loading kvm-amd on this host with 'modprobe kvm-amd npt=0'?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2] kvm: qemu: check device assignment command

2009-03-29 Thread Avi Kivity

Han, Weidong wrote:

I suggest replacing the parsing code with pci_parse_devaddr() (needs
to be extended to support functions) so that all the checking and
parsing is done in one place.



If we use pci_parse_devaddr(), we need to add a domain section to the
assignment command, and a function section to the pci_add/pci_del commands.
What's more, pci_parse_devaddr() parses the guest device BDF and makes some
assumptions, such as the function being 0. But here we parse the host BDF.
It's a little complex to combine them.
  


Right, but we end up with overall better code.

--
error compiling committee.c: too many arguments to function



Re: [patch] kvm-userland vl.c compiler warning

2009-03-29 Thread Avi Kivity

Jes Sorensen wrote:

Hi,

A minor cosmetic fix to tell the compiler to shut up by including the
header providing the prototype for dma_helper_init().


I already got this via qemu-svn.

--
error compiling committee.c: too many arguments to function



After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Gerry Reno

Hi,
We had set up a number of VMs using KVM-65 running on F7 some time
ago.  We just completed upgrading a number of hosts to F10, and with that
came KVM-74-10.  We had some problems after that upgrade.

First, none of the VMs would start.  After digging around awhile we finally
figured out that we had to once again 'chcon -t virt_image_t path_to_image'
as we had done back when we first set up the VMs.  But what is the proper
procedure so that we don't lose the selinux attributes on these image
files?  Shouldn't this be set by Fedora's selinux-policy package?  Or by
the kvm upgrade?

Next issue is that with KVM-74 we are having a lot of problems with the
dreaded "TCP/IP VNC connection to hypervisor host has been refused or
disconnected".  This has now happened to one of the VMs after it was shut
down; we then tried to run it again and we haven't found a way to
reconnect to it.  Also, another VM has this weird problem that when you
run it, it looks like it's running because the screen keeps changing size,
but all we see is a blank palette on the console page, no screen at all,
even though the thing is running.  How do we fix these things?


Regards,
Gerry

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-autotest: write relative path in html report

2009-03-29 Thread Uri Lublin

Ryan Harper wrote:

When generating an html report from make_html_report.py, one needs to
supply the full path to the results directory.  This value ends up being
embedded in the output which breaks relocating the results dir to a
different path.  This patch adds a new flag that suppresses the full path
value when generating the report.  I'm looking to generate a summary
html report in the results dir and relocate the results dir to a
different server which can be done with this patch.



Applied.

I've made the following 2 modifications to the commit log:
1. replaced kvm-autotest: with make_html_report: on the first line.
2. added the following note to the commit message:
Uri: Note that this only works when the html report is generated in
 (or copied to) the results directory. In other words
 make_html_report.py -R -r result-dir -f file-in-result-dir


It can also be achieved with the following (which is not as convenient):
  cd results-dir; runtest_2/make_html_report.py -r . -f report.html

Next, I'm going to change the default control to use this switch.

Thanks,
Uri
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: soft lockup - CPU stuck for ...

2009-03-29 Thread Nikola Ciprich
Hi,
I was also experiencing this problem a lot for quite a long time (and for
a wide range of KVM versions..)
I might be completely wrong as I'm not sure if it was really the reason,
but I THINK it disappeared when I started to use a fully preemptible kernel on 
the host..
You might want to try it...
BR
nik

On Sun, Mar 29, 2009 at 07:51:21AM +, Gerrit Slomma wrote:
 Robert Wimmer r.wimmer at tomorrow-focus.de writes:
 
  
  Hi,
  
  does anyone know how to solve the problem
  with BUG: soft lockup - CPU#0 stuck for ...?
  Today I got the messages below during compilation
  of the kernel modules in a guest. Using kvm84 and Kernel 2.6.29
  as host kernel and 2.6.28 as guest kernel during the
  hangup of the guest neither ssh nor ping was possible.
  After about 2 minutes the guest was reachable again
  and I saw the messages below with dmesg.
  
  Maybe it is related to my prev. answered posting:
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/29677
  
  Thanks!
  Robert
  
  BUG: soft lockup - CPU#0 stuck for 61s!
  (...)
 
 Hello
 
 Do you use x86_64 or i686?
 Look at my post here 
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/29833
 And my Bug-report here https://bugzilla.redhat.com/show_bug.cgi?id=492688.
 I do not have the problems while running, only after migrating. Problems with
 stuck CPUs vanish if i686 is used for the host - but I am testing further.
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

-- 
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IO on guest is 20 times slower than host

2009-03-29 Thread Avi Kivity

Avi Kivity wrote:

Kurt Yoder wrote:

slow host cpu information, core 1 of 16:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 4
model name  : Quad-Core AMD Opteron(tm) Processor 8382
stepping: 2
cpu MHz : 2611.998
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
pge mca

cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall mmxext fxsr_opt
pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor
cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips: 5223.97
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


  


Can you try loading kvm_amd on this host with 'modprobe kvm-amd npt=0'?



If it helps, then the guest is messing up the cpu cache.  Try the 
attached patch.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

diff --git a/kernel/x86/kvm/svm.c b/kernel/x86/kvm/svm.c
index 1fcbc17..d9774e9 100644
--- a/kernel/x86/kvm/svm.c
+++ b/kernel/x86/kvm/svm.c
@@ -575,7 +575,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 		INTERCEPT_CR3_MASK);
 		control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
 						 INTERCEPT_CR3_MASK);
-		save->g_pat = 0x0007040600070406ULL;
+		save->g_pat = 0x0606060606060606ULL;
 		/* enable caching because the QEMU Bios doesn't enable it */
 		save->cr0 = X86_CR0_ET;
 		save->cr3 = 0;
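
For reference, each byte of g_pat selects the memory type for one PAT entry
(0x06 = write-back, 0x07 = UC-, 0x04 = write-through, 0x00 = uncacheable); the
old value above is the usual power-on default layout, the new one makes every
entry write-back. A minimal, hypothetical user-space decoder - not part of the
patch - that shows what the two values mean:

#include <stdio.h>
#include <stdint.h>

/* Standard x86 PAT memory-type encodings. */
static const char *pat_type(uint8_t v)
{
	switch (v & 7) {
	case 0: return "UC";	/* uncacheable */
	case 1: return "WC";	/* write combining */
	case 4: return "WT";	/* write through */
	case 5: return "WP";	/* write protected */
	case 6: return "WB";	/* write back */
	case 7: return "UC-";	/* uncached, may be overridden by MTRRs */
	default: return "reserved";
	}
}

static void dump_pat(const char *name, uint64_t pat)
{
	int i;

	printf("%s:", name);
	for (i = 0; i < 8; i++)			/* one byte per PAT entry */
		printf(" PA%d=%s", i, pat_type(pat >> (i * 8)));
	printf("\n");
}

int main(void)
{
	dump_pat("old g_pat", 0x0007040600070406ULL);	/* WB, WT, UC-, UC repeated */
	dump_pat("new g_pat", 0x0606060606060606ULL);	/* everything write-back */
	return 0;
}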


Re: After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Gerry Reno

Gerry Reno wrote:

Hi,
 We had setup a number of VM's using KVM-65 running on F7 some time 
ago.  We just completed upgrading a number of hosts to F10 and with 
that came KVM-74-10.  We had some problems after that upgrade.  First, 
none of the VM's would start.  After digging around awhile we finally 
figured out that we had to once again 'chcon -t virt_image_t  
path_to_image' as we had done back when we first setup the VM's.  But 
what is the proper procedure so that we don't lose the selinux 
attributes on these image files?  Shouldn't this be set by fedora's 
selinux-policy package?  Or by the kvm upgrade?  Next issue is that 
with KVM-74 we are having a lot of problems with the dreaded TCP/IP 
VNC connection to hypervisor host has been refused or disconnected.  
This has now happened to one of the VM's after it was shutdown and 
then we tried to Run it again and we haven't found a way to reconnect 
to it.  Also, another VM has this weird problem that when you run it, 
it looks like its running because the screen keeps changing size but 
all we see is a blank palette on the console page, no screen at all, 
but the thing is running.  How do we fix these things?


And we have a VM that when you start it says it's Connecting to console 
for guest.  But the console message just stays like that and yet the 
machine completely boots up and we can log into it remotely in a 
terminal window.  We just cannot get a console screen.


We've tried putting the vnc on different ports 5901,5902,5903,etc.  But 
it doesn't help.   Again, what can we do to fix these problems?  Thanks.



Regards,
Gerry

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Ross McKay
Gerry Reno wrote:

[...]  After digging around awhile we finally figured 
out that we had to once again 'chcon -t virt_image_t  path_to_image' as 
we had done back when we first setup the VM's.  But what is the proper 
procedure so that we don't lose the selinux attributes on these image 
files?  [...]

You need to tell selinux about paths that you have custom selinux
contexts on if you want those contexts to persist. Something like
(assuming /path/to/image is a folder):

semanage fcontext -a -t virt_image_t /path/to/image(/.*)?

That will add them to the selinux database, and a restorecon will
preserve your changes (and reassert them if something else changes the
contexts of the files, or if you move a file from another place into
/path/to/image)
-- 
Ross McKay, Toronto, NSW Australia
Let the laddie play wi the knife - he'll learn
- The Wee Book of Calvin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Gerry Reno

Ross McKay wrote:

Gerry Reno wrote:

  
[...]  After digging around awhile we finally figured 
out that we had to once again 'chcon -t virt_image_t  path_to_image' as 
we had done back when we first setup the VM's.  But what is the proper 
procedure so that we don't lose the selinux attributes on these image 
files?  [...]



You need to tell selinux about paths that you have custom selinux
contexts on if you want those contexts to persist. Something like
(assuming /path/to/image is a folder):

semanage fcontext -a -t virt_image_t /path/to/image(/.*)?

That will add them to the selinux database, and a restorecon will
preserve your changes (and reassert them if something else changes the
contexts of the files, or if you move a file from another place into
/path/to/image)
  

Ok, thanks.  I'll run that command on all the image file directory trees.

Now if we can just get some guidance on all this console connection 
craziness maybe we can get back to normal running.



Regards,
Gerry

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Gerry Reno

Gerry Reno wrote:

Gerry Reno wrote:

Hi,
 We had setup a number of VM's using KVM-65 running on F7 some time 
ago.  We just completed upgrading a number of hosts to F10 and with 
that came KVM-74-10.  We had some problems after that upgrade.  


we are having a lot of problems with the dreaded TCP/IP VNC 
connection to hypervisor host has been refused or disconnected.  
This has now happened to one of the VM's after it was shutdown and 
then we tried to Run it again and we haven't found a way to reconnect 
to it.  
This is happening on a VM that was just anaconda upgraded to F9 and is 
on its first reboot.  But we cannot make that reboot happen because the 
VM refuses to run.


Also, another VM has this weird problem that when you run it, it 
looks like its running because the screen keeps changing size but all 
we see is a blank palette on the console page, no screen at all, but 
the thing is running.  How do we fix these things?
We kicked everyone off the VM's and rebooted the host and now we have 
access again to this VM. 





And we have a VM that when you start it says it's Connecting to console 
for guest.  But the console message just stays like that and yet the 
machine completely boots up and we can log into it remotely in a 
terminal window.  We just cannot get a console screen.

This also got corrected through a host reboot.


So the only problem I'm very concerned about at the moment is the VM 
that was at the point of first reboot after the successful completion of 
an F9 install.  Any suggestions as to how to get that VM running?



Regards,
Gerry


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: RFC: Add reserved bits check

2009-03-29 Thread Dong, Eddie
 +static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 +{
 +	int ps = 0;
 +
 +	if (level == PT_DIRECTORY_LEVEL)
 +		ps = !!(gpte & PT_PAGE_SIZE_MASK);
 
 
 No need for this.  If you set rsvd_bits_mask[1][0] ==
 rsvd_bits_mask[0][0], then you get the same behaviour.  The first
 index is not the page size, it's just bit 7.

Sure, fixed.

 
 You'll need to fill all the indexes for bit 7 == 1, but it's worth it,
 with the 1GB pages patch.
 
 +	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[ps][level-1]) != 0;
 +}
 +
  #define PTTYPE 64
  #include paging_tmpl.h
  #undef PTTYPE
 
 +int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 +{
 +struct kvm_cpuid_entry2 *best;
 +
 +	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
 +	if (best)
 +		return best->eax & 0xff;
 +	return 32;
 +}
 +
 
 
 Best to return 36 if the cpu doesn't support cpuid 0x80000008 but does
 support pae.

Mmm, I noticed conflicting information in the SDM, but you are right :)

One more modification, after double checking with an internal architect: the RSVD 
bit in the error code is not set if P=0.

Thanks and reposted.
Eddie
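
As a side note for readers following the patch below: rsvd_bits() just builds a
contiguous bit mask, and is_rsvd_bits_set() indexes the per-level masks by bit 7
of the guest PTE, as suggested above. A small stand-alone sketch (hypothetical
user-space test code, not part of the patch):

#include <stdio.h>
#include <stdint.h>

/* Same arithmetic as the kernel helper in the patch: bits s..e inclusive. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

int main(void)
{
	/* Reserved bits 13..20 of a 2MB PDE, as used for rsvd_bits_mask[1][1]. */
	uint64_t mask = rsvd_bits(13, 20);

	/* A PTE with bit 7 (page size) and bit 14 set violates that mask. */
	uint64_t pte = (1ULL << 7) | (1ULL << 14);
	int bit7 = (pte >> 7) & 1;

	printf("rsvd_bits(13, 20) = 0x%llx\n", (unsigned long long)mask);
	printf("reserved-bit violation: %d\n", bit7 && (pte & mask) != 0);
	return 0;
}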




Emulate #PF error code of reserved bits violation.

Signed-off-by: Eddie Dong eddie.d...@intel.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55fd4c5..4fe2742 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -261,6 +261,7 @@ struct kvm_mmu {
union kvm_mmu_page_role base_role;
 
u64 *pae_root;
+   u64 rsvd_bits_mask[2][4];
 };
 
 struct kvm_vcpu_arch {
@@ -791,5 +792,6 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ef060ec..0a6f109 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -126,6 +126,7 @@ module_param(oos_shadow, bool, 0644);
 #define PFERR_PRESENT_MASK (1U << 0)
 #define PFERR_WRITE_MASK (1U << 1)
 #define PFERR_USER_MASK (1U << 2)
+#define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
 
 #define PT_DIRECTORY_LEVEL 2
@@ -179,6 +180,11 @@ static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mt_mask;
 
+static inline u64 rsvd_bits(int s, int e)
+{
+	return ((1ULL << (e - s + 1)) - 1) << s;
+}
+
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
shadow_trap_nonpresent_pte = trap_pte;
@@ -2155,6 +2161,14 @@ static void paging_free(struct kvm_vcpu *vcpu)
nonpaging_free(vcpu);
 }
 
+static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+{
+   int bit7;
+
+	bit7 = (gpte >> 7) & 1;
+	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
+}
+
 #define PTTYPE 64
 #include paging_tmpl.h
 #undef PTTYPE
@@ -2183,6 +2197,25 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 
 static int paging64_init_context(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu *context = &vcpu->arch.mmu;
+   int maxphyaddr = cpuid_maxphyaddr(vcpu);
+   u64 exb_bit_rsvd = 0;
+
+   if (!is_nx(vcpu))
+   exb_bit_rsvd = rsvd_bits(63, 63);
+
+	context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+	context->rsvd_bits_mask[1][2] = context->rsvd_bits_mask[0][2];
+	context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
+	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
 }
 
@@ -2190,6 +2223,16 @@ static int paging32_init_context(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = &vcpu->arch.mmu;
 
+   /* no rsvd bits for 2 level 4K page table entries */
+	context->rsvd_bits_mask[0][1] = 0;
+	context->rsvd_bits_mask[0][0] = 0;
+	if (is_cpuid_PSE36())
+		/* 36bits PSE 4MB page */
+		context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
+	else
+		/* 32 bits PSE 4MB page */
+		context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
+	context->rsvd_bits_mask[1][0] = 0;
 	context->new_cr3 = paging_new_cr3;
 	context->page_fault = paging32_page_fault;
 	context->gva_to_gpa = paging32_gva_to_gpa;
@@ -2205,6 +2248,22 @@ static int paging32_init_context(struct 

Re: [PATCH v4 0/6] PCI: support the ATS capability

2009-03-29 Thread Yu Zhao
On Sun, Mar 29, 2009 at 09:51:31PM +0800, Matthew Wilcox wrote:
 On Thu, Mar 26, 2009 at 04:15:56PM -0700, Jesse Barnes wrote:
 2, avoid using pci_find_ext_capability every time when reading ATS
Invalidate Queue Depth (Matthew Wilcox)
 
 I asked a question about how that was used, and got back a version which
 changed how it was done.  I still don't have an answer to my question.

VT-d hardware is designed such that the Invalidate Queue Depth is used
every time the software prepares an Invalidate Request descriptor.
This happens whenever the device's IOMMU mapping changes (i.e. the device
driver calls DMA map/unmap if the device is used by the host; or a guest
is started/destroyed if the device is assigned to that guest).

Given that DMA map/unmap are used very frequently, I suppose the queue
depth should be cached somewhere. It used to be cached in the VT-d
private data structure (before v3) because I was not sure how the
IOMMU hardware from other vendors uses the queue depth.

After you commented on the code, I checked the AMD/IBM/Sun IOMMUs: the AMD
IOMMU also uses the invalidate queue depth for every Invalidate Request
descriptor; the IBM/Sun IOMMUs don't look like they support ATS. So it's
reasonable to cache the queue depth in the PCI subsystem since all IOMMUs
that support the ATS use the queue depth in the same way (very frequently),
right?
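
In other words the capability walk should happen once and later reads should hit
a cached field. A rough sketch of that caching pattern (hypothetical structure
and function names, not the actual PCI-layer API):

#include <stdint.h>

/* Hypothetical per-device ATS state; mirrors the idea of caching the
 * Invalidate Queue Depth instead of re-walking config space on every
 * invalidation request. */
struct ats_state {
	int qdep;	/* cached Invalidate Queue Depth, -1 = not read yet */
};

/* Stand-in for the slow path: locating the ATS extended capability and
 * reading the queue depth from configuration space. */
static int ats_read_qdep_from_config_space(void)
{
	return 32;	/* pretend the capability reports a depth of 32 */
}

static int ats_queue_depth(struct ats_state *ats)
{
	if (ats->qdep < 0)			/* slow path, taken once */
		ats->qdep = ats_read_qdep_from_config_space();
	return ats->qdep;			/* fast path, every map/unmap */
}

int main(void)
{
	struct ats_state ats = { .qdep = -1 };
	return ats_queue_depth(&ats) > 0 ? 0 : 1;
}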
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2721640 ] perfctr wrmsr warning when booting 64bit RHEl5.3

2009-03-29 Thread SourceForge.net
Bugs item #2721640, was opened at 2009-03-29 19:04
Message generated for change (Tracker Item Submitted) made by jiajun
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2721640&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: perfctr wrmsr warning when booting 64bit RHEl5.3

Initial Comment:
Testing Environment
Kernel Commit: 1b24f5558e2b1dc20de2e974e80785a4e231c133
Userspace Commit: ef44113201a240fae7faf480591a9ecd73d337d1
Host Kernel Version: 2.6.29-rc6

When booting a 64bit RHEL5.3 guest, some perfctr wrmsr warnings are 
printed in dmesg. With 32bit RHEL5.3 or other older 64bit kernels, there is no 
such warning.

Reproduce steps:

qemu-system-x86_64  -m 768   -net nic,macaddr=00:16:3e:0a:52:36,model=rtl8139 
-net tap,script=/etc/kvm/qemu-ifup
-hda /share/xvs/var/32e_rhel5u3.img

dmesg log 

device sw3 entered promiscuous mode
kvm: 20076: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
kvm: 20076: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffcf2a11
kvm: 20076: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079





--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2721640&group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Biweekly KVM Test report, kernel 0c7771... userspace 1223a0...

2009-03-29 Thread Xu, Jiajun
Hi All,

This is our Weekly KVM Testing Report against the latest kvm.git
0c77713470debc666a07dc40080d728272bb58b9 and kvm-userspace.git
1223a029b36b0d9e73af76bcc274bb770f814886.

One New Issue:

1. perfctr wrmsr warning when booting 64bit RHEl5.3
https://sourceforge.net/tracker/?func=detail&aid=2721640&group_id=180599&atid=893831

Five Old Issues:

1. 32bits Rhel5/FC6 guest may fail to reboot after installation
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1991647&group_id=180599

2. failure to migrate guests with more than 4GB of RAM
https://sourceforge.net/tracker/index.php?func=detail&aid=1971512&group_id=180599&atid=893831

3. OpenSuse10.2 can not be installed
http://sourceforge.net/tracker/index.php?func=detail&aid=2088475&group_id=180599&atid=893831

4. Fail to Save Restore Guest
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2175042&group_id=180599

5. hotplug inexistent device will kill guest
https://sourceforge.net/tracker/index.php?func=detail&aid=2432316&group_id=180599&atid=893831


Test environment

Platform  A
Stoakley/Clovertown
CPU 4
Memory size 8G

Report Summary on IA32-pae
   Summary Test Report of Last Session
=
Total   PassFailNoResult   Crash
=
control_panel   8   5   3 00
gtest   16  16  0 00
=
control_panel   8   5   3 00
 :KVM_256M_guest_PAE_gPAE   1   1   0 00
 :KVM_linux_win_PAE_gPAE1   1   0 00
 :KVM_two_winxp_PAE_gPAE1   1   0 00
 :KVM_four_sguest_PAE_gPA   1   0   1 00
 :KVM_1500M_guest_PAE_gPA   1   1   0 00
 :KVM_LM_Continuity_PAE_g   1   0   1 00
 :KVM_LM_SMP_PAE_gPAE   1   1   0 00
 :KVM_SR_Continuity_PAE_g   1   0   1 00
gtest   16  16  0 00
 :ltp_nightly_PAE_gPAE  1   1   0 00
 :boot_up_acpi_PAE_gPAE 1   1   0 00
 :boot_up_acpi_xp_PAE_gPA   1   1   0 00
 :boot_up_vista_PAE_gPAE1   1   0 00
 :reboot_xp_PAE_gPAE1   1   0 00
 :boot_base_kernel_PAE_gP   1   1   0 00
 :boot_up_acpi_win2k3_PAE   1   1   0 00
 :boot_smp_acpi_win2k3_PA   1   1   0 00
 :boot_smp_acpi_win2k_PAE   1   1   0 00
 :boot_up_win2008_PAE_gPA   1   1   0 00
 :boot_up_acpi_win2k_PAE_   1   1   0 00
 :boot_smp_acpi_xp_PAE_gP   1   1   0 00
 :boot_up_noacpi_win2k_PA   1   1   0 00
 :boot_smp_vista_PAE_gPAE   1   1   0 00
 :boot_smp_win2008_PAE_gP   1   1   0 00
 :kb_nightly_PAE_gPAE   1   1   0 00
=
Total   24  21  3 00

Report Summary on IA32e
   Summary Test Report of Last Session
=
Total   PassFailNoResult   Crash
=
control_panel   17  9   8 00
gtest   23  23  0 00
=
control_panel   17  9   8 00
 :KVM_4G_guest_64_g32e  1   1   0 00
 :KVM_four_sguest_64_gPAE   1   1   0 00
 :KVM_LM_SMP_64_g32e1   0   1 00
 :KVM_linux_win_64_gPAE 1   1   0 00
 :KVM_LM_SMP_64_gPAE1   0   1 00
 :KVM_SR_Continuity_64_gP   1   0   1 00
 :KVM_four_sguest_64_g32e   1   1   0 00
 :KVM_four_dguest_64_gPAE   1   1   0 00
 :KVM_SR_SMP_64_gPAE1   0   1 00
 :KVM_LM_Continuity_64_g3   1   0   1 00
 :KVM_1500M_guest_64_gPAE   1   1   0 00
 :KVM_LM_Continuity_64_gP   1   1   0 00
 :KVM_1500M_guest_64_g32e   1   0   1 00
 :KVM_SR_Continuity_64_g3   1   0   

Cleanup to reuse is_long_mode()

2009-03-29 Thread Dong, Eddie


Thanks, Eddie




commit 6688a1fbc37330f2c4e16d1a78050b64e1ce5dcc
Author: root r...@eddie-wb.localdomain
Date:   Mon Mar 30 11:31:10 2009 +0800

cleanup to reuse is_long_mode(vcpu)

Signed-off-by: Eddie Dong eddie.d...@intel.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index db5021b..affc31d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -859,7 +859,7 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
struct vcpu_svm *svm = to_svm(vcpu);
 
 #ifdef CONFIG_X86_64
-	if (vcpu->arch.shadow_efer & EFER_LME) {
+	if (is_long_mode(vcpu)) {
 		if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 			vcpu->arch.shadow_efer |= EFER_LMA;
 			svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9913a1d..b1f1458 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1543,7 +1543,7 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
enter_rmode(vcpu);
 
 #ifdef CONFIG_X86_64
-	if (vcpu->arch.shadow_efer & EFER_LME) {
+	if (is_long_mode(vcpu)) {
 		if (!is_paging(vcpu) && (cr0 & X86_CR0_PG))
 			enter_lmode(vcpu);
 		if (is_paging(vcpu) && !(cr0 & X86_CR0_PG))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf6683a..961bd2b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -289,7 +289,7 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
 	if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 #ifdef CONFIG_X86_64
-		if ((vcpu->arch.shadow_efer & EFER_LME)) {
+   if (is_long_mode(vcpu)) {
int cs_db, cs_l;
 
if (!is_pae(vcpu)) {

cleanup.patch
Description: cleanup.patch


Use rsvd_bits_mask in load_pdptrs for cleanup and considering EXB bit

2009-03-29 Thread Dong, Eddie
This is a follow-up to the rsvd_bits emulation.

thx, eddie




commit 171eb2b2d8282dd913a5d5c6c695fd64e1ddcf4c
Author: root r...@eddie-wb.localdomain
Date:   Mon Mar 30 11:39:50 2009 +0800

    Use rsvd_bits_mask in load_pdptrs for cleanup and considering EXB bit.

Signed-off-by: Eddie Dong eddie.d...@intel.com
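
As a rough illustration of the PDPTE mask the diff below constructs, and of the
check that replaces the old hard-coded constant, here is a hypothetical
stand-alone sketch (maxphyaddr = 36 is an assumed value, and NX is assumed off
so bit 63 is treated as reserved too):

#include <stdio.h>
#include <stdint.h>

#define PT_PRESENT_MASK (1ULL << 0)

static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

int main(void)
{
	int maxphyaddr = 36;	/* assumed for illustration */

	/* Same composition as rsvd_bits_mask[0][2] below: bits 1-2, 7-8,
	 * maxphyaddr..62, plus bit 63 because NX is assumed disabled. */
	uint64_t mask = rsvd_bits(63, 63) | rsvd_bits(maxphyaddr, 62) |
			rsvd_bits(7, 8) | rsvd_bits(1, 2);

	uint64_t good = PT_PRESENT_MASK | (0x1234ULL << 12);	/* plausible PDPTE */
	uint64_t bad  = good | (1ULL << 40);	/* physical bit above maxphyaddr */

	/* Same shape as the new load_pdptrs() check. */
	printf("good rejected? %d\n", (good & PT_PRESENT_MASK) && (good & mask));
	printf("bad rejected?  %d\n", (bad & PT_PRESENT_MASK) && (bad & mask));
	return 0;
}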

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0a6f109..b0bf8b2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2255,6 +2255,9 @@ static int paging32E_init_context(struct kvm_vcpu *vcpu)
if (!is_nx(vcpu))
exb_bit_rsvd = rsvd_bits(63, 63);
 
+	context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 62) |
+		rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
 	context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
 		rsvd_bits(maxphyaddr, 62);  /* PDE */
 	context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
@@ -2270,6 +2273,17 @@ static int paging32E_init_context(struct kvm_vcpu *vcpu)
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *context = &vcpu->arch.mmu;
+   int maxphyaddr = cpuid_maxphyaddr(vcpu);
+   u64 exb_bit_rsvd = 0;
+
+	if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu)) {
+   if (!is_nx(vcpu))
+   exb_bit_rsvd = rsvd_bits(63, 63);
+
+		context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+   rsvd_bits(maxphyaddr, 62) |
+   rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
+   }
 
 	context->new_cr3 = nonpaging_new_cr3;
 	context->page_fault = tdp_page_fault;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 961bd2b..ff178fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -233,7 +233,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
goto out;
}
 	for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
-		if ((pdpte[i] & 1) && (pdpte[i] & 0xfffffff0000001e6ull)) {
+		if ((pdpte[i] & PT_PRESENT_MASK) &&
+		    (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) {
ret = 0;
goto out;
 		}
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After upgrading from F7 kvm-65-15 to F10 kvm-74-10

2009-03-29 Thread Gerry Reno

Gerry Reno wrote:


So the only problem I'm very concerned about at the moment is the VM 
that was at the point of first reboot after the successful completion 
of an F9 install.  Any suggestions as to how to get that VM running?




In the virtmanager log we see this when we try to start this VM:
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (manager:427) VM 
MX_2 started
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (details:1042) Got 
timed retry
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (details:1094) 
Trying console login
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (details:1108) 
Graphics console configured at vnc://127.0.0.1:5901
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (details:1121) 
Starting connect process for 127.0.0.1 5901
[Sun, 29 Mar 2009 22:37:27 virt-manager 5707] DEBUG (details:1020) VNC 
initialized
[Sun, 29 Mar 2009 22:37:30 virt-manager 5707] DEBUG (details:1006) VNC 
disconnected
[Sun, 29 Mar 2009 22:37:30 virt-manager 5707] WARNING (details:1032) 
Retrying connection in 125 ms
[Sun, 29 Mar 2009 22:37:30 virt-manager 5707] ERROR (proxies:400) 
Introspect error on 
:1.5:/org/freedesktop/Hal/devices/net_06_d3_60_e9_70_0f: 
dbus.exceptions.DBusException: org.freedesktop.Hal.NoSuchDevice: No 
device with id /org/freedesktop/Hal/devices/net_06_d3_60_e9_70_0f
[Sun, 29 Mar 2009 22:37:30 virt-manager 5707] DEBUG (proxies:403) 
Executing introspect queue due to error
[Sun, 29 Mar 2009 22:37:30 virt-manager 5707] ERROR (connection:218) 
Exception in handler for D-Bus signal:

Traceback (most recent call last):
 File /usr/lib/python2.5/site-packages/dbus/connection.py, line 214, 
in maybe_handle_message

   self._handler(*args, **kwargs)
 File /usr/share/virt-manager/virtManager/connection.py, line 278, in 
_net_phys_device_removed

   if objif.QueryCapability(net):
 File /usr/lib/python2.5/site-packages/dbus/proxies.py, line 68, in 
__call__

   return self._proxy_method(*args, **keywords)
 File /usr/lib/python2.5/site-packages/dbus/proxies.py, line 140, in 
__call__

   **keywords)
 File /usr/lib/python2.5/site-packages/dbus/connection.py, line 630, 
in call_blocking

   message, timeout)


What does this mean?

Regards,
Gerry


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Add reserved bits check

2009-03-29 Thread Avi Kivity

Dong, Eddie wrote:

@@ -2183,6 +2197,25 @@ static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
 
 static int paging64_init_context(struct kvm_vcpu *vcpu)

 {
+	struct kvm_mmu *context = &vcpu->arch.mmu;
+   int maxphyaddr = cpuid_maxphyaddr(vcpu);
+   u64 exb_bit_rsvd = 0;
+
+   if (!is_nx(vcpu))
+   exb_bit_rsvd = rsvd_bits(63, 63);
+
+	context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
+	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
+	context->rsvd_bits_mask[1][2] = context->rsvd_bits_mask[0][2];
+	context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
+	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
 }
  


Just noticed that walk_addr() can also be called from the tdp context, so 
we need to make sure rsvd_bits_mask is initialized in init_kvm_tdp_mmu() as 
well.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I/O errors after migration - why?

2009-03-29 Thread Takeshi Sone
Hello,

I had a similar problem with block I/O and migration.
It is worked around by the qemu stop command and waiting 1 second
before starting migration (and cont after migration).
See the Ubuntu bug report I posted:
https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/341682

I think Nolan's description here explains why stop and wait works.


Nolan wrote:
 On Sat, 2009-03-28 at 11:21 +0100, Tomasz Chmielewski wrote:
 Nolan schrieb:
 Tomasz Chmielewski mangoo at wpkg.org writes:
 I'm trying to perform live migration by following the instructions on 
 http://www.linux-kvm.org/page/Migration.
 Unfortunately, it doesn't work very well - the guest is migrated, but loses 
 access to its disk.
 The LSI logic scsi device model doesn't implement device state 
 save/restore. 
 Any suspend/resume, snapshot or migration will fail.
 Oh, that sucks - as not everything supports virtio (which doesn't work 
 for me as well for some reason) - like Windows (which should be 
 addressed soon with block virtio drivers), but also older installations, 
 running older kernels.
 
 It is indeed a shame.  I wish I had the time to investigate and resolve
 the problems with my patch that I linked to previously.
 
 LSI in particular is important for interoperability, as that is what
 VMware uses.
 
 Does IDE support migration?
 
 It appears to, but I am not 100% sure that it will always survive
 migration under heavy IO load.  I've gotten mixed messages on whether or
 not the qemu core waits for all in flight IOs to complete or if the
 device models need to checkpoint pending IOs themselves.  Experimental
 evidence suggests that it does not.  Also, from ide.c's checkpoint save
 code:
 /* XXX: if a transfer is pending, we do not save it yet */
 
 I think the ideal here would be to stop the CPUs, but let the device
 models continue to run.  Once all pending IOs have completed (and DMAed
 data and/or descriptors into guest memory, or raised interrupts, or
 whatever) then checkpoint all device state.  When the guest resumes, it
 will see an unusual flurry of IO completions and/or interrupts, but it
 should be able to handle that OK.  Shouldn't look much different from
 SMM taking over for a while during high IO load.
 
 This would save a lot of (unwritten, complex, hard to test)
 checkpointing code in the device models.  Might cause a missed timer
 interrupt or two if there is a lot of slow IO, but that can be
 compensated for if needed.
 
 I sent a patch that partially addresses this (but is buggy in the presence 
 of
 in-flight IO):
 http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00744.html
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html