Re: qemu-kvm: fix infinite recursion in pci

2009-12-17 Thread Avi Kivity

On 12/15/2009 02:58 PM, Michael S. Tsirkin wrote:

Make config reads for assigned devices work
like they used to: both pci_default_read_config
and pci_default_cap_read_config call to pci_read_config,
which does the actual work.

This fixes infinite recursion introduced by
recent merge from stable-0.12.

   


Applied to master and 0.12, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: debugging windows guests

2009-12-17 Thread Avi Kivity

On 12/17/2009 09:05 AM, Raindog wrote:

On 12/16/2009 9:36 PM, Avi Kivity wrote:

On 12/17/2009 12:06 AM, Raindog wrote:


Are there any advantages over stock qemu if using kvm w/out the 
kernel module?


No.  qemu-kvm is not tested without kvm, so there may be disadvantages.



Does that then imply that svm emulation (--enable-nesting) is not well 
tested either?


It's an unrelated feature (but I wouldn't say it is heavily tested).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tuning guide, -cpu host crashes hosted VM

2009-12-17 Thread Avi Kivity

On 12/16/2009 12:33 AM, Steven Ball wrote:

Howdy all,

I am attempting to evaluate a few different virtualization platforms.  I've 
been going about this by installing a really basic XP host, and running some 
benchmarks.  I realize these are synthetic benchmarks, but it at least allows 
me to see how the virtualization platforms differ.

I attempted to do some tuning with KVM as per the tuning guide, using -cpu 
host, and as soon as my benchmarking program under XP uses a sse instruction, I 
get a blue screen crash and reboot.
   


Any idea what the instruction is?  You can set Windows to collect a 
crash dump, then use windbg to examine the dump and it will point out 
the offending instruction.


Alternatively, post a pointer to the application and someone may 
reproduce it some day.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] Nested VMX patch 5 Simplify fpu handling

2009-12-17 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wassermanor...@il.ibm.com

   


What exactly is the simplification?  Is it intended to have a functional 
change?



---
  arch/x86/kvm/vmx.c |   27 +--
  1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8745d44..de1f596 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1244,8 +1244,6 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
u32 eb;

eb = (1u  PF_VECTOR) | (1u  UD_VECTOR) | (1u  MC_VECTOR);
-   if (!vcpu-fpu_active)
-   eb |= 1u  NM_VECTOR;
/*
 * Unconditionally intercept #DB so we can maintain dr6 without
 * reading it every exit.
@@ -1463,10 +1461,6 @@ static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
if (vcpu-fpu_active)
return;
vcpu-fpu_active = 1;
-   vmcs_clear_bits(GUEST_CR0, X86_CR0_TS);
-   if (vcpu-arch.cr0  X86_CR0_TS)
-   vmcs_set_bits(GUEST_CR0, X86_CR0_TS);
-   update_exception_bitmap(vcpu);
  }

  static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu)
@@ -1474,8 +1468,6 @@ static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu)
if (!vcpu-fpu_active)
return;
vcpu-fpu_active = 0;
-   vmcs_set_bits(GUEST_CR0, X86_CR0_TS);
-   update_exception_bitmap(vcpu);
  }

  static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
@@ -2715,8 +2707,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long cr3)

vmx_flush_tlb(vcpu);
vmcs_writel(GUEST_CR3, guest_cr3);
-   if (vcpu-arch.cr0  X86_CR0_PE)
-   vmx_fpu_deactivate(vcpu);
+   if (vcpu-arch.cr0  X86_CR0_PE) {
+   if (guest_cr3 != vmcs_readl(GUEST_CR3))
+   vmx_fpu_deactivate(vcpu);
+   }
   


Why the added cr3 check?  It may make sense, but it isn't a simplification.


  static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
@@ -5208,6 +5202,19 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (vcpu-arch.switch_db_regs)
get_debugreg(vcpu-arch.dr6, 6);

+   if (vcpu-fpu_active) {
+   if (vmcs_readl(CR0_READ_SHADOW)  X86_CR0_TS)
+   vmcs_set_bits(GUEST_CR0, X86_CR0_TS);
+   else
+   vmcs_clear_bits(GUEST_CR0, X86_CR0_TS);
+   vmcs_write32(EXCEPTION_BITMAP,
+vmcs_read32(EXCEPTION_BITMAP)   ~(1u  
NM_VECTOR));
+   } else {
+   vmcs_set_bits(GUEST_CR0, X86_CR0_TS);
+   vmcs_write32(EXCEPTION_BITMAP,
+vmcs_read32(EXCEPTION_BITMAP) |  (1u  
NM_VECTOR));
+   }
   


This is executed unconditionally, so the vmreads/vmwrites take place 
every time.  Need to cache the previous fpu_active state and only take 
action if it changed.


Since this is a large piece of code, move it to a function.

Please post this as the first patch (or better, separately), so I can 
apply it independently of the rest.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] KVM: Extended shared_msr_global to per CPU

2009-12-17 Thread Sheng Yang
shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.

Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.

Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---

How about this?

Move the all initialization to hardware_enable(). And only initialized once
for each cpu.

 arch/x86/kvm/x86.c |   60 ---
 1 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dd15d7a..e3eae50 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -92,16 +92,17 @@ module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO 
| S_IWUSR);
 
 struct kvm_shared_msrs_global {
int nr;
-   struct kvm_shared_msr {
-   u32 msr;
-   u64 value;
-   } msrs[KVM_NR_SHARED_MSRS];
+   u32 msrs[KVM_NR_SHARED_MSRS];
 };
 
 struct kvm_shared_msrs {
struct user_return_notifier urn;
bool registered;
-   u64 current_value[KVM_NR_SHARED_MSRS];
+   struct kvm_shared_msr_values {
+   u64 host;
+   u64 curr;
+   bool initialized;
+   } values[KVM_NR_SHARED_MSRS];
 };
 
 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
@@ -146,53 +147,68 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
unsigned slot;
-   struct kvm_shared_msr *global;
struct kvm_shared_msrs *locals
= container_of(urn, struct kvm_shared_msrs, urn);
+   struct kvm_shared_msr_values *values;
 
for (slot = 0; slot  shared_msrs_global.nr; ++slot) {
-   global = shared_msrs_global.msrs[slot];
-   if (global-value != locals-current_value[slot]) {
-   wrmsrl(global-msr, global-value);
-   locals-current_value[slot] = global-value;
+   values = locals-values[slot];
+   if (values-host != values-curr) {
+   wrmsrl(shared_msrs_global.msrs[slot], values-host);
+   values-curr = values-host;
}
}
locals-registered = false;
user_return_notifier_unregister(urn);
 }
 
-void kvm_define_shared_msr(unsigned slot, u32 msr)
+static void shared_msr_update(unsigned slot, u32 msr)
 {
-   int cpu;
+   struct kvm_shared_msrs *smsr;
u64 value;
 
+   smsr = __get_cpu_var(shared_msrs);
+   /* only read, and nobody should modify it at this time,
+* so don't need lock */
+   if (slot = shared_msrs_global.nr) {
+   printk(KERN_ERR kvm: invalid MSR slot!);
+   return;
+   }
+   if (smsr-values[slot].initialized)
+   return;
+   rdmsrl_safe(msr, value);
+   smsr-values[slot].host = value;
+   smsr-values[slot].curr = value;
+   smsr-values[slot].initialized = true;
+   put_cpu_var(shared_msrs);
+}
+
+void kvm_define_shared_msr(unsigned slot, u32 msr)
+{
if (slot = shared_msrs_global.nr)
shared_msrs_global.nr = slot + 1;
-   shared_msrs_global.msrs[slot].msr = msr;
-   rdmsrl_safe(msr, value);
-   shared_msrs_global.msrs[slot].value = value;
-   for_each_online_cpu(cpu)
-   per_cpu(shared_msrs, cpu).current_value[slot] = value;
+   shared_msrs_global.msrs[slot] = msr;
+   /* we need ensured the shared_msr_global have been updated */
+   smp_wmb();
 }
 EXPORT_SYMBOL_GPL(kvm_define_shared_msr);
 
 static void kvm_shared_msr_cpu_online(void)
 {
unsigned i;
-   struct kvm_shared_msrs *locals = __get_cpu_var(shared_msrs);
 
for (i = 0; i  shared_msrs_global.nr; ++i)
-   locals-current_value[i] = shared_msrs_global.msrs[i].value;
+   shared_msr_update(i, shared_msrs_global.msrs[i]);
 }
 
 void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
 {
struct kvm_shared_msrs *smsr = __get_cpu_var(shared_msrs);
 
-   if (((value ^ smsr-current_value[slot])  mask) == 0)
+   if (((value ^ smsr-values[slot].curr)  mask) == 0)
return;
-   smsr-current_value[slot] = value;
-   wrmsrl(shared_msrs_global.msrs[slot].msr, value);
+   smsr-values[slot].curr = value;
+   wrmsrl(shared_msrs_global.msrs[slot], value);
if (!smsr-registered) {
smsr-urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(smsr-urn);
-- 
1.5.4.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest

2009-12-17 Thread Sheng Yang
Before enabling, execution of rdtscp in guest would result in #UD.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---

Reflect guest CPUID on vmcs fields as well, but it involved some more code
which would only executed once... Do we need a callback there for post-cpuid
setting?

 arch/x86/include/asm/kvm_host.h |3 ++
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/svm.c  |6 
 arch/x86/kvm/vmx.c  |   49 --
 arch/x86/kvm/x86.c  |   15 ++-
 5 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f865e8..c920285 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -374,6 +374,8 @@ struct kvm_vcpu_arch {
/* used for guest single stepping over the given code position */
u16 singlestep_cs;
unsigned long singlestep_rip;
+
+   bool cpuid_rdtscp_enabled;
 };
 
 struct kvm_mem_alias {
@@ -532,6 +534,7 @@ struct kvm_x86_ops {
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
bool (*gb_page_enable)(void);
+   bool (*rdtscp_supported)(void);
 
const struct trace_print_flags *exit_reasons_str;
 };
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 2b49454..8c39320 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -53,6 +53,7 @@
  */
 #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001
 #define SECONDARY_EXEC_ENABLE_EPT   0x0002
+#define SECONDARY_EXEC_RDTSCP  0x0008
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3de0b37..052d91a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2912,6 +2912,11 @@ static bool svm_gb_page_enable(void)
return true;
 }
 
+static bool svm_rdtscp_supported(void)
+{
+   return false;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -2978,6 +2983,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.exit_reasons_str = svm_exit_reasons_str,
.gb_page_enable = svm_gb_page_enable,
+   .rdtscp_supported = svm_rdtscp_supported,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5c464ed..f60e645 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -136,6 +136,8 @@ struct vcpu_vmx {
ktime_t entry_time;
s64 vnmi_blocked_time;
u32 exit_reason;
+
+   bool control_rdtscp_enabled;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -210,7 +212,7 @@ static const u32 vmx_msr_index[] = {
 #ifdef CONFIG_X86_64
MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR,
 #endif
-   MSR_EFER, MSR_K6_STAR,
+   MSR_EFER, MSR_TSC_AUX, MSR_K6_STAR,
 };
 #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index)
 
@@ -347,6 +349,12 @@ static inline int cpu_has_vmx_vpid(void)
SECONDARY_EXEC_ENABLE_VPID;
 }
 
+static inline int cpu_has_vmx_rdtscp(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl 
+   SECONDARY_EXEC_RDTSCP;
+}
+
 static inline int cpu_has_virtual_nmis(void)
 {
return vmcs_config.pin_based_exec_ctrl  PIN_BASED_VIRTUAL_NMIS;
@@ -878,6 +886,11 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, 
unsigned nr,
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
 }
 
+static bool vmx_rdtscp_supported(void)
+{
+   return cpu_has_vmx_rdtscp();
+}
+
 /*
  * Swap MSR entry in host/guest MSR entry array.
  */
@@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
index = __find_msr_index(vmx, MSR_CSTAR);
if (index = 0)
move_msr_up(vmx, index, save_nmsrs++);
+   index = __find_msr_index(vmx, MSR_TSC_AUX);
+   if (index = 0)
+   move_msr_up(vmx, index, save_nmsrs++);
/*
 * MSR_K6_STAR is only needed on long mode guests, and only
 * if efer.sce is enabled.
@@ -1002,6 +1018,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
case MSR_IA32_SYSENTER_ESP:
data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
+   case MSR_TSC_AUX:
+   if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported())
+   return 1;
+   /* Otherwise falls through */
default:
vmx_load_host_state(to_vmx(vcpu));
msr = find_msr_entry(to_vmx(vcpu), msr_index);
@@ -1065,7 +1085,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 data)

Re: SIGTERM to qemu-kvm process destroys qcow2 image?

2009-12-17 Thread Kenni Lund
2009/12/17 Avi Kivity a...@redhat.com:
 On 12/17/2009 02:52 AM, Kenni Lund wrote:
 Yesterday I entered an invalid boot device as an argument to my
 qemu-kvm command for my Windows XP machine, causing an error about a
 missing boot device in the qemu BIOS/POST. As I didn't have any
 filesystems mounted inside the virtual machine (since it was stuck at
 the BIOS asking for a device to boot), I did a kill $pid, fixed the
 boot device in the qemu-kvm command and tried booting again...but with
 no luck, whatever I try now with qemu-kvm gives me the error:
 qemu: could not open disk image /data/virtualization/WindowsXP.img

 And qemu-img (check, convert, etc) gives me:
 qemu-img: Could not open 'WindowsXP.img'


 Can you post the first 4K of the image?  It shouldn't contain private data,
 but go over it (or don't post) if you sensitive information there.

4K dump attached.

Best Regards
Kenni Lund


WindowsXP.img_4k_dump
Description: Binary data


Re: [PATCH 6/7] Nested VMX patch 6 implements vmlaunch and vmresume

2009-12-17 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wassermanor...@il.ibm.com


@@ -287,7 +297,7 @@ static inline int vmcs_field_type(unsigned long field)
  }

  /*
-  Returncs VMCS field size in bits
+  Returns VMCS field size in bits
  */
   


Fold.


  static inline int vmcs_field_size(int field_type, struct kvm_vcpu *vcpu)
  {
@@ -313,6 +323,10 @@ static inline int vmcs_field_size(int field_type, struct 
kvm_vcpu *vcpu)
return 0;
  }

+#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \
+   VM_EXIT_SAVE_IA32_PAT))
+#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \
+


I think a whitelist is better here, so if we add a new feature and 
forget it here, we don't introduce a vulnerability.




+static inline int nested_cpu_has_vmx_tpr_shadow(struct  kvm_vcpu *vcpu)
+{
+   return cpu_has_vmx_tpr_shadow()
+   get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control
+   CPU_BASED_TPR_SHADOW;
+}
   


bools are better.


+
+static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu)
+{
+   return cpu_has_secondary_exec_ctrls()
+   get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+}
+
+static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu
+  *vcpu)
+{
+   return get_shadow_vmcs(vcpu)-secondary_vm_exec_control
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+}
   


Need to check for secondary controls first.


+
+static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu)
+{
+   return get_shadow_vmcs(vcpu)-
+   secondary_vm_exec_control  SECONDARY_EXEC_ENABLE_EPT;
+}
+
+static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu *vcpu)
+{
+   return get_shadow_vmcs(vcpu)-secondary_vm_exec_control
+   SECONDARY_EXEC_ENABLE_VPID;
+}
   


A helper nested_check_2ndary_control() can help reduce duplication.


+
+static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu)
+{
+   return get_shadow_vmcs(vcpu)-vm_entry_controls
+   VM_ENTRY_LOAD_IA32_PAT;
+}
   


Suggest dropping PAT support for now (it's optional in the spec IIRC, 
and doesn't help much).




+
+void prepare_vmcs_12(struct kvm_vcpu *vcpu)
+{
   


Not sure what this does.  From the name, it appears to prepare a vmcs.  
From the contents, it appears to read the vmcs and save it into a 
shadow vmcs.



+   struct shadow_vmcs *l2_shadow_vmcs =
+   get_shadow_vmcs(vcpu);
+
+   l2_shadow_vmcs-guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+   l2_shadow_vmcs-guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+   l2_shadow_vmcs-guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+   l2_shadow_vmcs-guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+   l2_shadow_vmcs-guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+   l2_shadow_vmcs-guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+   l2_shadow_vmcs-guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+   l2_shadow_vmcs-guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+
+   l2_shadow_vmcs-tsc_offset = vmcs_read64(TSC_OFFSET);
+   l2_shadow_vmcs-guest_physical_address =
+   vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+   l2_shadow_vmcs-vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
+   l2_shadow_vmcs-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+   if (vmcs_config.vmentry_ctrl  VM_ENTRY_LOAD_IA32_PAT)
+   l2_shadow_vmcs-guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
+   l2_shadow_vmcs-cr3_target_count = vmcs_read32(CR3_TARGET_COUNT);
+   l2_shadow_vmcs-vm_entry_intr_info_field =
+   vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+   l2_shadow_vmcs-vm_entry_exception_error_code =
+   vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE);
+   l2_shadow_vmcs-vm_entry_instruction_len =
+   vmcs_read32(VM_ENTRY_INSTRUCTION_LEN);
+   l2_shadow_vmcs-vm_instruction_error =
+   vmcs_read32(VM_INSTRUCTION_ERROR);
+   l2_shadow_vmcs-vm_exit_reason  = vmcs_read32(VM_EXIT_REASON);
+   l2_shadow_vmcs-vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+   l2_shadow_vmcs-vm_exit_intr_error_code =
+   vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
+   l2_shadow_vmcs-idt_vectoring_info_field =
+   vmcs_read32(IDT_VECTORING_INFO_FIELD);
+   l2_shadow_vmcs-idt_vectoring_error_code =
+   vmcs_read32(IDT_VECTORING_ERROR_CODE);
+   l2_shadow_vmcs-vm_exit_instruction_len =
+   vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   l2_shadow_vmcs-vmx_instruction_info =
+   vmcs_read32(VMX_INSTRUCTION_INFO);
+   l2_shadow_vmcs-guest_es_limit = vmcs_read32(GUEST_ES_LIMIT);
+   l2_shadow_vmcs-guest_cs_limit = vmcs_read32(GUEST_CS_LIMIT);
+   

Re: SIGTERM to qemu-kvm process destroys qcow2 image?

2009-12-17 Thread Avi Kivity

On 12/17/2009 11:38 AM, Kenni Lund wrote:

2009/12/17 Avi Kivitya...@redhat.com:
   

On 12/17/2009 02:52 AM, Kenni Lund wrote:
 

Yesterday I entered an invalid boot device as an argument to my
qemu-kvm command for my Windows XP machine, causing an error about a
missing boot device in the qemu BIOS/POST. As I didn't have any
filesystems mounted inside the virtual machine (since it was stuck at
the BIOS asking for a device to boot), I did a kill $pid, fixed the
boot device in the qemu-kvm command and tried booting again...but with
no luck, whatever I try now with qemu-kvm gives me the error:
qemu: could not open disk image /data/virtualization/WindowsXP.img

And qemu-img (check, convert, etc) gives me:
qemu-img: Could not open 'WindowsXP.img'

   

Can you post the first 4K of the image?  It shouldn't contain private data,
but go over it (or don't post) if you sensitive information there.
 

4K dump attached.

   


Seems fine.  Kevin can you take a look?

You have a backing file.  Do qemu-img info and qemu-img check like it?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU

2009-12-17 Thread Avi Kivity

On 12/17/2009 11:32 AM, Sheng Yang wrote:

shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.

Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.

Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.

Signed-off-by: Sheng Yangsh...@linux.intel.com
---

How about this?

Move the all initialization to hardware_enable(). And only initialized once
for each cpu.


-void kvm_define_shared_msr(unsigned slot, u32 msr)
+static void shared_msr_update(unsigned slot, u32 msr)
  {
-   int cpu;
+   struct kvm_shared_msrs *smsr;
u64 value;

+   smsr =__get_cpu_var(shared_msrs);
+   /* only read, and nobody should modify it at this time,
+* so don't need lock */
+   if (slot= shared_msrs_global.nr) {
+   printk(KERN_ERR kvm: invalid MSR slot!);
+   return;
+   }
+   if (smsr-values[slot].initialized)
+   return;
   


I don't think .initialized is worthwhile.  shared_msr_update is run very 
rarely.



+   smsr-values[slot].initialized = true;
+   put_cpu_var(shared_msrs);
   


If you use __get_cpu_var(), you need to remove put_cpu_var().


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest

2009-12-17 Thread Avi Kivity

On 12/17/2009 11:33 AM, Sheng Yang wrote:

Before enabling, execution of rdtscp in guest would result in #UD.

Signed-off-by: Sheng Yangsh...@linux.intel.com
---

Reflect guest CPUID on vmcs fields as well, but it involved some more code
which would only executed once... Do we need a callback there for post-cpuid
setting?
   


I guess we do need a callback.


-   /* Otherwise falls through to kvm_set_msr_common */
+   ret = kvm_set_msr_common(vcpu, msr_index, data);
+   break;
+   case MSR_TSC_AUX:
+   if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported())
+   return 1;
+   /* Check reserved bit */
+   if ((data  0x) != 0)
+   return 1;
   


It's 0x...


@@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
index = __find_msr_index(vmx, MSR_CSTAR);
if (index= 0)
move_msr_up(vmx, index, save_nmsrs++);
+   index = __find_msr_index(vmx, MSR_TSC_AUX);
+   if (index= 0)
+   move_msr_up(vmx, index, save_nmsrs++);
   


Only do this if rdtscp is enabled!

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SIGTERM to qemu-kvm process destroys qcow2 image?

2009-12-17 Thread Kevin Wolf
Am 17.12.2009 11:23, schrieb Avi Kivity:
 On 12/17/2009 11:38 AM, Kenni Lund wrote:
 2009/12/17 Avi Kivitya...@redhat.com:

 On 12/17/2009 02:52 AM, Kenni Lund wrote:
  
 Yesterday I entered an invalid boot device as an argument to my
 qemu-kvm command for my Windows XP machine, causing an error about a
 missing boot device in the qemu BIOS/POST. As I didn't have any
 filesystems mounted inside the virtual machine (since it was stuck at
 the BIOS asking for a device to boot), I did a kill $pid, fixed the
 boot device in the qemu-kvm command and tried booting again...but with
 no luck, whatever I try now with qemu-kvm gives me the error:
 qemu: could not open disk image /data/virtualization/WindowsXP.img

 And qemu-img (check, convert, etc) gives me:
 qemu-img: Could not open 'WindowsXP.img'


 Can you post the first 4K of the image?  It shouldn't contain private data,
 but go over it (or don't post) if you sensitive information there.
  
 4K dump attached.


 
 Seems fine.  Kevin can you take a look?

Agreed, the dump looks fine. Even qcow2_open succeeds if I just use this
header as a qcow2 file.

 You have a backing file.  Do qemu-img info and qemu-img check like it?

So I'd definitely have a look at the backing file. It's almost the only
thing that can go wrong after qcow2_open. qemu-img doesn't have known
problems with backing files in general, but it does have problems with
missing (or broken) backing files.

Kenni, could you check if your backing file is still alive? /tmp doesn't
sound like a safe place for images. If it still exists in the right
location (/tmp/WindowsXP.img.backup) and it is qcow2, can you please
attach the first 4k of the backing file?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Mouse and keyboard dead in XP guest...

2009-12-17 Thread duck
Just upgraded from kvm-83 with evdev keyboard codes backoprted (yes, I 
know, it's old) to 0.12.0rc2 plus the 2.6.32 source for kvm-kmod. This 
on 64-bit Linux with an evdev Xorg.

Mouse and keyboard simply won't work in my existing XP VMs. 

Installed a fresh XP3 straight from a slipstreamed ISO. The text-based 
part of the install took my keyboard input; the GUI part was automated, 
thanks to nLite.

Windows installed fine and I ended at the Ctrl-Alt-Del logon screen. No 
mouse, no keyboard.

Tried via VNC. No mouse, no keyboard.

Slackware-64 13.0 -- forget what kernel version. 2.6.29, I think.




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vhost: add missing architectures

2009-12-17 Thread Michael S. Tsirkin
vhost is completely portable, but Kconfig include was missing for all
architectures besides x86, so it did not appear in the menu.  Add the
relevant Kconfig includes to all architectures that support
virtualization.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Hi Rusty,
please apply the following trivial fixup patch for 2.6.33.
Thanks!

 arch/ia64/kvm/Kconfig|1 +
 arch/powerpc/kvm/Kconfig |1 +
 arch/s390/kvm/Kconfig|1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index ef3e7be..01c7579 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -47,6 +47,7 @@ config KVM_INTEL
  Provides support for KVM on Itanium 2 processors equipped with the VT
  extensions.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..a1b4c5d 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -58,6 +58,7 @@ config KVM_E500
 
  If unsure, say N.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index bf164fc..6be6aea 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
-- 
1.6.6.rc1.43.gf55cc
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH-cc-fixed] vhost: add missing architectures

2009-12-17 Thread Michael S. Tsirkin
vhost is completely portable, but Kconfig include was missing for all
architectures besides x86, so it did not appear in the menu.  Add the
relevant Kconfig includes to all architectures that support
virtualization.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Some IBM addresses seem to be bouncing. Reposting with these removed,
please send replies to this shorter list to avoid bounces.  Sorry about
the noise.


Hi Rusty,
please apply the following trivial fixup patch for 2.6.33.
Thanks!

 arch/ia64/kvm/Kconfig|1 +
 arch/powerpc/kvm/Kconfig |1 +
 arch/s390/kvm/Kconfig|1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index ef3e7be..01c7579 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -47,6 +47,7 @@ config KVM_INTEL
  Provides support for KVM on Itanium 2 processors equipped with the VT
  extensions.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..a1b4c5d 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -58,6 +58,7 @@ config KVM_E500
 
  If unsure, say N.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index bf164fc..6be6aea 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
-- 
1.6.6.rc1.43.gf55cc
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] Nested VMX patch 7 handling of nested guest exits

2009-12-17 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wassermanor...@il.ibm.com

   


(changelog)


@@ -1525,6 +1539,22 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
new_offset = vmcs_read64(TSC_OFFSET) + delta;
vmcs_write64(TSC_OFFSET, new_offset);
}
+
+   if (l1_shadow_vmcs != NULL) {
+   l1_shadow_vmcs-host_tr_base =
+   vmcs_readl(HOST_TR_BASE);
+   l1_shadow_vmcs-host_gdtr_base =
+   vmcs_readl(HOST_GDTR_BASE);
+   l1_shadow_vmcs-host_ia32_sysenter_esp =
+   vmcs_readl(HOST_IA32_SYSENTER_ESP);
+
+   if (tsc_this  vcpu-arch.host_tsc)
+   l1_shadow_vmcs-tsc_offset =
+   vmcs_read64(TSC_OFFSET);
+
+   if (vmx-nested.nested_mode)
+   load_vmcs_host_state(l1_shadow_vmcs);
+   }
   


Please share this code with non-nested vmcs setup.


@@ -3794,6 +3824,11 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
  {
u32 cpu_based_vm_exec_control;

+   if (to_vmx(vcpu)-nested.nested_mode) {
+   nested_vmx_intr(vcpu);
+   return;
+   }
   


I think this happens too late?  enable_irq_window() is called after 
we've given up on injecting the interrupt because interrupts are 
disabled.  But if we're running a guest, we can vmexit and inject the 
interrupt.  This code will only vmexit.


Hm, I see the vmexit code has an in_interrupt case, but I'd like this to 
be more regular: adjust vmx_interrupt_allowed() to allow interrupts if 
in a guest, and vmx_inject_irq() to force the vmexit.  This way 
interrupts have a single code path.




  static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
  {
+   if (to_vmx(vcpu)-nested.nested_mode) {
+   if (!nested_vmx_intr(vcpu))
+   return 0;
+   }
   


... and you do that... so I wonder why the changes to 
enable_irq_window() are needed?



+
return (vmcs_readl(GUEST_RFLAGS)  X86_EFLAGS_IF)
!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO)
(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
@@ -4042,6 +4082,10 @@ static int handle_exception(struct kvm_vcpu *vcpu)
   not interested (exception bitmap 12 does not include 
NM_VECTOR)
   enable fpu and resume l2 (avoid switching to l1)
*/
+
+   if (vmx-nested.nested_mode)
+   vmx-nested.nested_run_pending = 1; /* removing this 
line cause hung on boot of l2*/
+
   


This indicates a hack?


vmx_fpu_activate(vcpu);

return 1;
@@ -4169,7 +4213,33 @@ static int handle_cr(struct kvm_vcpu *vcpu)
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
-   kvm_set_cr0(vcpu, val);
+   if (to_vmx(vcpu)-nested.nested_mode) {
+   /* assume only X86_CR0_TS is handled by l0 */
+   long new_cr0 = vmcs_readl(GUEST_CR0);
+   long new_cr0_read_shadow = 
vmcs_readl(CR0_READ_SHADOW);
+
+   vmx_fpu_deactivate(vcpu);
   
+

+   if (val  X86_CR0_TS) {
+   new_cr0 |= X86_CR0_TS;
+   new_cr0_read_shadow |= X86_CR0_TS;
+   vcpu-arch.cr0 |= X86_CR0_TS;
+   } else {
+   new_cr0= ~X86_CR0_TS;
+   new_cr0_read_shadow= ~X86_CR0_TS;
+   vcpu-arch.cr0= X86_CR0_TS;
+   }
+
+   vmcs_writel(GUEST_CR0, new_cr0);
+   vmcs_writel(CR0_READ_SHADOW, 
new_cr0_read_shadow);
   


Don't you need to #vmexit if the new cr0 violates the cr0_bits_always_on 
constraint, or if it changes bits in cr0 that the guest intercepts?



+
+   if (!(val  X86_CR0_TS) || !(val  X86_CR0_PE))
+   vmx_fpu_activate(vcpu);
+
+   to_vmx(vcpu)-nested.nested_run_pending = 1;
   


Please split into a function.


+   } else
+   kvm_set_cr0(vcpu, val);
+
skip_emulated_instruction(vcpu);
return 1;
case 3:
@@ -4196,8 +4266,15 @@ static int handle_cr(struct kvm_vcpu *vcpu)
break;
case 2: /* clts */
vmx_fpu_deactivate(vcpu);
-   vcpu-arch.cr0= ~X86_CR0_TS;
-   vmcs_writel(CR0_READ_SHADOW, 

Re: Nested VMX support v4

2009-12-17 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

Avi,
We have addressed all of the comments, please apply.

   


I'm afraid there is still a lot of work remaining.


This work was inspired by the nested SVM support by Alexander Graf and Joerg
Roedel.

   


Please try to make this as readable as the svm work.  That means smaller 
patches, detailed changelogs, reduced forward declarations.  I would 
appreciate documentation of the data structures.  I am still confused 
about the triplication in kvm_vcpu, l1_state, and l2_state - there 
shouldn't be three, just two.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest

2009-12-17 Thread Sheng Yang
On Thursday 17 December 2009 18:39:22 Avi Kivity wrote:
 On 12/17/2009 11:33 AM, Sheng Yang wrote:
  Before enabling, execution of rdtscp in guest would result in #UD.
 
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
 
  Reflect guest CPUID on vmcs fields as well, but it involved some more
  code which would only executed once... Do we need a callback there for
  post-cpuid setting?
 
 I guess we do need a callback.
 
  -   /* Otherwise falls through to kvm_set_msr_common */
  +   ret = kvm_set_msr_common(vcpu, msr_index, data);
  +   break;
  +   case MSR_TSC_AUX:
  +   if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported())
  +   return 1;
  +   /* Check reserved bit */
  +   if ((data  0x) != 0)
  +   return 1;
 
 It's 0x...

... /me blame himself...
 
  @@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
  index = __find_msr_index(vmx, MSR_CSTAR);
  if (index= 0)
  move_msr_up(vmx, index, save_nmsrs++);
  +   index = __find_msr_index(vmx, MSR_TSC_AUX);
  +   if (index= 0)
  +   move_msr_up(vmx, index, save_nmsrs++);
 
 Only do this if rdtscp is enabled!

/me blame himself again...

-- 
regards
Yang, Sheng

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU

2009-12-17 Thread Sheng Yang
On Thursday 17 December 2009 18:32:08 Avi Kivity wrote:
 On 12/17/2009 11:32 AM, Sheng Yang wrote:
  shared_msr_global saved host value of relevant MSRs, but it have an
  assumption that all MSRs it tracked shared the value across the different
  CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.
 
  Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
  alike MSRs.
 
  Notice now the shared_msr_global still have one assumption: it can only
  deal with the MSRs that won't change in host after KVM module loaded.
 
  Signed-off-by: Sheng Yangsh...@linux.intel.com
  ---
 
  How about this?
 
  Move the all initialization to hardware_enable(). And only initialized
  once for each cpu.
 
 
  -void kvm_define_shared_msr(unsigned slot, u32 msr)
  +static void shared_msr_update(unsigned slot, u32 msr)
{
  -   int cpu;
  +   struct kvm_shared_msrs *smsr;
  u64 value;
 
  +   smsr =__get_cpu_var(shared_msrs);
  +   /* only read, and nobody should modify it at this time,
  +* so don't need lock */
  +   if (slot= shared_msrs_global.nr) {
  +   printk(KERN_ERR kvm: invalid MSR slot!);
  +   return;
  +   }
  +   if (smsr-values[slot].initialized)
  +   return;
 
 I don't think .initialized is worthwhile.  shared_msr_update is run very
 rarely.

The reason is, after cpu hotplug, the MSR_TSC_AUX would be rewritten by 
vsyscall_init again. But in the hotplug notifier chain, KVM has higher 
priority(20 vs 0 for vsyscall_init), so maybe the rdmsr() here would get a 
bogus value... Then I think prevent it from initializing again should be 
safer.

But I just think of another issue: if we hot plug in a cpu(without hot plug 
off), it would have a bogus value as well in the same path? Sound 
troublesome...

 
  +   smsr-values[slot].initialized = true;
  +   put_cpu_var(shared_msrs);
 
 If you use __get_cpu_var(), you need to remove put_cpu_var().
 

Sure...

-- 
regards
Yang, Sheng

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Device Assignment fixes

2009-12-17 Thread Alexander Graf
While trying out I stumbled across several issues in the device assignment
code.

This set addresses the most major ones. Namely allowing passthrough of non page
aligned BAR region (like on lpfc) and telling users what to do when their
device is in use.

Alexander Graf (3):
  Enable non page boundary BAR device assignment
  Split off sysfs id retrieval
  Inform users about busy device assignment attempt

 hw/device-assignment.c |  244 ++--
 1 files changed, 215 insertions(+), 29 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Split off sysfs id retrieval

2009-12-17 Thread Alexander Graf
To retreive device and vendor ID from a PCI device, we need to read a
sysfs file. That code is currently hand written at least two times,
the later patch introducing two more calls.

So let's move that out to a function.

Signed-off-by: Alexander Graf ag...@suse.de
---
 hw/device-assignment.c |   66 
 1 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 000fa61..566671c 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -562,14 +562,46 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 return 0;
 }
 
+static int get_real_id(const char *devpath, const char *idname, uint16_t *val)
+{
+FILE *f;
+char name[128];
+long id;
+
+snprintf(name, sizeof(name), %s%s, devpath, idname);
+f = fopen(name, r);
+if (f == NULL) {
+fprintf(stderr, %s: %s: %m\n, __func__, name);
+return -1;
+}
+if (fscanf(f, %li\n, id) == 1) {
+*val = id;
+} else {
+return -1;
+}
+fclose(f);
+
+return 0;
+}
+
+static int get_real_vendor_id(const char *devpath, uint16_t *val)
+{
+return get_real_id(devpath, vendor, val);
+}
+
+static int get_real_device_id(const char *devpath, uint16_t *val)
+{
+return get_real_id(devpath, device, val);
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
uint8_t r_dev, uint8_t r_func)
 {
 char dir[128], name[128];
-int fd, r = 0;
+int fd, r = 0, v;
 FILE *f;
 unsigned long long start, end, size, flags;
-unsigned long id;
+uint16_t id;
 struct stat statbuf;
 PCIRegion *rp;
 PCIDevRegions *dev = pci_dev-real_device;
@@ -635,31 +667,21 @@ again:
 
 fclose(f);
 
-/* read and fill device ID */
-snprintf(name, sizeof(name), %svendor, dir);
-f = fopen(name, r);
-if (f == NULL) {
-fprintf(stderr, %s: %s: %m\n, __func__, name);
+/* read and fill vendor ID */
+v = get_real_vendor_id(dir, id);
+if (v) {
 return 1;
 }
-if (fscanf(f, %li\n, id) == 1) {
-   pci_dev-dev.config[0] = id  0xff;
-   pci_dev-dev.config[1] = (id  0xff00)  8;
-}
-fclose(f);
+pci_dev-dev.config[0] = id  0xff;
+pci_dev-dev.config[1] = (id  0xff00)  8;
 
-/* read and fill vendor ID */
-snprintf(name, sizeof(name), %sdevice, dir);
-f = fopen(name, r);
-if (f == NULL) {
-fprintf(stderr, %s: %s: %m\n, __func__, name);
+/* read and fill device ID */
+v = get_real_device_id(dir, id);
+if (v) {
 return 1;
 }
-if (fscanf(f, %li\n, id) == 1) {
-   pci_dev-dev.config[2] = id  0xff;
-   pci_dev-dev.config[3] = (id  0xff00)  8;
-}
-fclose(f);
+pci_dev-dev.config[2] = id  0xff;
+pci_dev-dev.config[3] = (id  0xff00)  8;
 
 /* dealing with virtual function device */
 snprintf(name, sizeof(name), %sphysfn/, dir);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Enable non page boundary BAR device assignment

2009-12-17 Thread Alexander Graf
While trying to get device passthrough working with an emulex hba, kvm
refused to pass it through because it has a BAR of 256 bytes:

Region 0: Memory at d210 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at d2101000 (64-bit, non-prefetchable) [size=256]
Region 4: I/O ports at b100 [size=256]

Since the page boundary is an arbitrary optimization to allow 1:1 mapping of
physical to virtual addresses, we can still take the old MMIO callback route.

So let's add a second code path that allows for size  0xFFF != 0 sized regions
by looping it through userspace.

I verified that it works by passing through an e1000 with this additional patch
applied and the card acted the same way it did without this patch:

 map_func = assigned_dev_iomem_map;
-if (cur_region-size  0xFFF) {
+if (i != PCI_ROM_SLOT){
 fprintf(stderr, PCI region %d at address 0x%llx 

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - don't use map_func function pointer
  - use the same code for mmap on fast and slow path

v2 - v3:

  - address mst's comments
---
 hw/device-assignment.c |  117 +--
 1 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 13a86bb..000fa61 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -148,6 +148,105 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, 
uint32_t addr)
 return value;
 }
 
+static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr)
+{
+AssignedDevRegion *d = opaque;
+uint8_t *in = d-u.r_virtbase + addr;
+uint32_t r;
+
+r = *in;
+DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx  val=0x%08x\n, addr, r);
+
+return r;
+}
+
+static uint32_t slow_bar_readw(void *opaque, target_phys_addr_t addr)
+{
+AssignedDevRegion *d = opaque;
+uint16_t *in = d-u.r_virtbase + addr;
+uint32_t r;
+
+r = *in;
+DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx  val=0x%08x\n, addr, r);
+
+return r;
+}
+
+static uint32_t slow_bar_readl(void *opaque, target_phys_addr_t addr)
+{
+AssignedDevRegion *d = opaque;
+uint32_t *in = d-u.r_virtbase + addr;
+uint32_t r;
+
+r = *in;
+DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx  val=0x%08x\n, addr, r);
+
+return r;
+}
+
+static void slow_bar_writeb(void *opaque, target_phys_addr_t addr, uint32_t 
val)
+{
+AssignedDevRegion *d = opaque;
+uint8_t *out = d-u.r_virtbase + addr;
+
+DEBUG(slow_bar_writeb addr=0x TARGET_FMT_plx  val=0x%02x\n, addr, val);
+*out = val;
+}
+
+static void slow_bar_writew(void *opaque, target_phys_addr_t addr, uint32_t 
val)
+{
+AssignedDevRegion *d = opaque;
+uint16_t *out = d-u.r_virtbase + addr;
+
+DEBUG(slow_bar_writew addr=0x TARGET_FMT_plx  val=0x%04x\n, addr, val);
+*out = val;
+}
+
+static void slow_bar_writel(void *opaque, target_phys_addr_t addr, uint32_t 
val)
+{
+AssignedDevRegion *d = opaque;
+uint32_t *out = d-u.r_virtbase + addr;
+
+DEBUG(slow_bar_writel addr=0x TARGET_FMT_plx  val=0x%08x\n, addr, val);
+*out = val;
+}
+
+static CPUWriteMemoryFunc * const slow_bar_write[] = {
+slow_bar_writeb,
+slow_bar_writew,
+slow_bar_writel
+};
+
+static CPUReadMemoryFunc * const slow_bar_read[] = {
+slow_bar_readb,
+slow_bar_readw,
+slow_bar_readl
+};
+
+static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num,
+pcibus_t e_phys, pcibus_t e_size,
+int type)
+{
+AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
+AssignedDevRegion *region = r_dev-v_addrs[region_num];
+PCIRegion *real_region = r_dev-real_device.regions[region_num];
+int m;
+
+DEBUG(slow map\n);
+m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region);
+cpu_register_physical_memory(e_phys, e_size, m);
+
+/* MSI-X MMIO page */
+if ((e_size  0) 
+real_region-base_addr = r_dev-msix_table_addr 
+real_region-base_addr + real_region-size = r_dev-msix_table_addr) {
+int offset = r_dev-msix_table_addr - real_region-base_addr;
+
+cpu_register_physical_memory(e_phys + offset,
+TARGET_PAGE_SIZE, r_dev-mmio_index);
+}
+}
+
 static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
pcibus_t e_phys, pcibus_t e_size, int type)
 {
@@ -381,15 +480,22 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 
 /* handle memory io regions */
 if (cur_region-type  IORESOURCE_MEM) {
+int slow_map = 0;
 int t = cur_region-type  IORESOURCE_PREFETCH
 ? PCI_BASE_ADDRESS_MEM_PREFETCH
 : PCI_BASE_ADDRESS_SPACE_MEMORY;
+
 if (cur_region-size  0xFFF) {
-fprintf(stderr, Unable to assign 

Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU

2009-12-17 Thread Avi Kivity

On 12/17/2009 05:01 PM, Sheng Yang wrote:



-void kvm_define_shared_msr(unsigned slot, u32 msr)
+static void shared_msr_update(unsigned slot, u32 msr)
   {
-   int cpu;
+   struct kvm_shared_msrs *smsr;
u64 value;

+   smsr =__get_cpu_var(shared_msrs);
+   /* only read, and nobody should modify it at this time,
+* so don't need lock */
+   if (slot= shared_msrs_global.nr) {
+   printk(KERN_ERR kvm: invalid MSR slot!);
+   return;
+   }
+   if (smsr-values[slot].initialized)
+   return;
   

I don't think .initialized is worthwhile.  shared_msr_update is run very
rarely.
 

The reason is, after cpu hotplug, the MSR_TSC_AUX would be rewritten by
vsyscall_init again. But in the hotplug notifier chain, KVM has higher
priority(20 vs 0 for vsyscall_init), so maybe the rdmsr() here would get a
bogus value... Then I think prevent it from initializing again should be
safer.

   


So let's raise vsyscall_init's priority?


But I just think of another issue: if we hot plug in a cpu(without hot plug
off), it would have a bogus value as well in the same path? Sound
troublesome...
   


Removing .initialized would take care of this issue.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Inform users about busy device assignment attempt

2009-12-17 Thread Alexander Graf
When using -pcidevice on a device that is already in use by a kernel driver
all the user gets is the following (very useful) information:

  Failed to assign device 04:00.0 : Device or resource busy
  Failed to deassign device 04:00.0 : Invalid argument
  Error initializing device pci-assign

Since I usually prefer to have my computer do the thinking for me, I figured
it might be a good idea to check and see if a device is actually used by a
driver. If so, tell the user.

So with this patch applied you get the following output:

  Failed to assign device 04:00.0 : Device or resource busy
  *** The driver 'igb' is occupying your device 04:00.0.
  ***
  *** You can try the following commands to free it:
  ***
  *** $ echo 8086 150a  /sys/bus/pci/drivers/pci-stub/new_id
  *** $ echo :04:00.0  /sys/bus/pci/drivers/igb/unbind
  *** $ echo :04:00.0  /sys/bus/pci/drivers/pci-stub/bind
  *** $ echo 8086 150a  /sys/bus/pci/drivers/pci-stub/remove_id
  ***
  Failed to deassign device 04:00.0 : Invalid argument
  Error initializing device pci-assign

That should keep people like me from doing the most obvious misuses :-).

CC: Daniel P. Berrange berra...@redhat.com
Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - add more helpful guidance thanks to Daniel Berrange

v2 - v3:

  - clear name variable before using it, thus 0-terminating the string
  - fix region numbers
  - use correct unbind/bind names

v3 - v4:

  - split id retrieval part
  - mst comments

v4 - v5:

  - address chris' comments
---
 hw/device-assignment.c |   61 ++-
 1 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 566671c..d6d44eb 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -756,6 +756,54 @@ static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t 
devfn)
 return (uint32_t)bus  8 | (uint32_t)devfn;
 }
 
+static void assign_failed_examine(AssignedDevice *dev)
+{
+char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
+uint16_t vendor_id, device_id;
+int r;
+
+/* XXX implement multidomain */
+sprintf(dir, /sys/bus/pci/devices/:%02x:%02x.%01x/,
+ dev-host.bus, dev-host.dev, dev-host.func);
+
+sprintf(name, %sdriver, dir);
+
+r = readlink(name, driver, sizeof(driver));
+if ((r = 0) || r = sizeof(driver) || !(ns = strrchr(driver, '/'))) {
+goto fail;
+}
+
+ns++;
+
+if (get_real_vendor_id(dir, vendor_id) ||
+get_real_device_id(dir, device_id)) {
+goto fail;
+}
+
+fprintf(stderr, *** The driver '%s' is occupying your device 
+%02x:%02x.%x.\n,
+ns, dev-host.bus, dev-host.dev, dev-host.func);
+fprintf(stderr, ***\n);
+fprintf(stderr, *** You can try the following commands to free it:\n);
+fprintf(stderr, ***\n);
+fprintf(stderr, *** $ echo \%04x %04x\  /sys/bus/pci/drivers/pci-stub/
+new_id\n, vendor_id, device_id);
+fprintf(stderr, *** $ echo \:%02x:%02x.%x\  /sys/bus/pci/drivers/
+%s/unbind\n,
+dev-host.bus, dev-host.dev, dev-host.func, ns);
+fprintf(stderr, *** $ echo \:%02x:%02x.%x\  /sys/bus/pci/drivers/
+pci-stub/bind\n,
+dev-host.bus, dev-host.dev, dev-host.func);
+fprintf(stderr, *** $ echo \%04x %04x\  /sys/bus/pci/drivers/pci-stub
+/remove_id\n, vendor_id, device_id);
+fprintf(stderr, ***\n);
+
+return;
+
+fail:
+fprintf(stderr, Couldn't find out why.\n);
+}
+
 static int assign_device(AssignedDevice *dev)
 {
 struct kvm_assigned_pci_dev assigned_dev_data;
@@ -781,9 +829,18 @@ static int assign_device(AssignedDevice *dev)
 #endif
 
 r = kvm_assign_pci_device(kvm_context, assigned_dev_data);
-if (r  0)
-   fprintf(stderr, Failed to assign device \%s\ : %s\n,
+if (r  0) {
+fprintf(stderr, Failed to assign device \%s\ : %s\n,
 dev-dev.qdev.id, strerror(-r));
+
+switch (r) {
+case -EBUSY:
+assign_failed_examine(dev);
+break;
+default:
+break;
+}
+}
 return r;
 }
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


wifi pci card issue in windows host.

2009-12-17 Thread Dmitri Seletski
Hello folks.
My goal is to make my pci based network card work under windows
guest.(obviously using windows drivers)

this is device i am interested in:
01:07.0 Network controller: Atheros Communications Inc. AR5416
802.11abgn Wireless PCI Adapter (rev 01)
Subsystem: D-Link System Inc Device 3a6b
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B+ DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 168, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f9ff (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] #00 []
Kernel driver in use: ath9k
Kernel modules: ath9k

uname -a: Linux dimko 2.6.32 #4 SMP PREEMPT Thu Dec 17 02:45:45 GMT 2009
x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ AuthenticAMD GNU/Linux




below commands I have executed:
#echo 1186 3a6b  /sys/bus/pci/drivers/pci-stub/new_id
#echo :01:07.0  /sys/bus/pci/devices/\:01\:07.0/driver/unbind
#echo :01:07.0  /sys/bus/pci/drivers/pci-stub/bind

last command gives:  echo: write error: No such device
I have pci_stub module loaded.

and part of dmesg:
[ 8471.257299] wlan0: deauthenticating from 00:1e:58:b4:f6:83 by local
choice (reason=3)
[ 8471.537053] ath9k :01:07.0: PCI INT A disabled
Which i guess is ok.

than i load my guest with smth like: qemu xp32.img -pcidevice
host=01:07.0 -m 2048

I can see device in linux, i can even install driver for it, and there
are no other problems with that.
But device is not usable. Basically wireless device sees no networks,
even though my network has ESSID opened.
I was wondering if that previous echo :01:07.0 
/sys/bus/pci/drivers/pci-stub/bind failure can affect me?

Windows driver is freshly downloaded off dlink site. Access Point is
*/definitely/*
http://www.google.ie/search?hl=ensafe=offclient=firefox-arls=org.gentoo:en-US:officialhs=h3dei=NVcqS5TnA8be4gbw3siBCQsa=Xoi=spellresnum=0ct=resultcd=1ved=0CAYQBSgAq=definitelyspell=1
 
working, since i am connecting to it using native linux drivers.

Thanks in advance.
Dmitri

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel memory allocation bug in 2.6.27.32-2.6.27.41 kvm section

2009-12-17 Thread Oscon
Hello!

I can't register new account in bugzilla.kernel.org. / my ISP's spamfilter 
problem (?) maybe./

--

I sent this mail to Greg KH (2.6.27.y maintainer), he sent me: 

Can you get the kvm maintainers to agree that this is correct?  

thanks,

greg k-h

---
So the bug :

I found a memory allocation bug in kvm/mmu.c  kvm_main.c. /in 
kvm_destroy_vm()/

Affected kernel: 2.6.27.32-2.6.27.41

Mainline kernel (2.6.32) is not affected. (Modified kvm subsystem.)

Cause: 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.27.y.git;a=commitdiff_plain;h=d2127c8300fb1ec54af56faee17170e7a525326d

Solution: Revert this patch.

This bug can cause local DoS in the host system.

---
Description:

After closing of kvm virtual machine kmv_destroy_vm() doens't free mmu_pages.
The allocated memory from host's system is LOST.

A script for demonstration is here:

www.freeweb.hu/oscon/kvm-memory-eater.sh.gz

WARNING: this script can cause local DoS.

description of script:

script starts a kvm with boot image file. After start this script wait 
10sec. -wait to boot guest processor (586+kernel image or a windows guest / 
486 kernel_image doesn't activate the kvm_amd for example and it doesn't 
allocate...it seems... /) to activates kvm_svm.

you can see the used memory of the host system
with top or kinfocenter or other..

After 10 sec this script killes the kvm userspace process - kvm_destroy_vm() 
is activated.

kvm_destroy_vm doesn't free mmu_pages so you lose ~500 mbyte memory from host 
system. /kvm -m 512/

this script starts a kvm again...and usw... :-) you lose all memory from host. 
from swap (partition or file) too.

--
comment:

oom_killer doesn't protect against this bug. 

Not kvm userspace process 
allocates the memory but the kernel module (by me: kvm_amd) so there isn't 
process to kill.
Not user initiated process allocates the memory so ulimit or limits.conf 
restrictions doesn't protect.

Remove kvm_amd and kvm modules from system doesn't help because: :-):

After killing the script and killing kvm process I tried rmmod kvm_amd :

Effect is: 

Dec 14 16:12:45 osconsfortress kernel: slab error in kmem_cache_destroy(): 
cache `kvm_pte_chain': Can't free all objects
Dec 14 16:12:45 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P        
  
2.6.27.41 #1
Dec 14 16:12:45 osconsfortress kernel:  [c02d7a27] ? printk+0x18/0x21
Dec 14 16:12:45 osconsfortress kernel:  [c017ed48] 
kmem_cache_destroy+0xb8/0xf0
Dec 14 16:12:45 osconsfortress kernel:  [f9885481] 
mmu_destroy_caches+0x11/0x30 [kvm]
Dec 14 16:12:45 osconsfortress kernel:  [f9885558] 
kvm_mmu_module_exit+0x8/0x20 [kvm]
Dec 14 16:12:45 osconsfortress kernel:  [f987f4c2] kvm_arch_exit+0x12/0x20 
[kvm]
Dec 14 16:12:45 osconsfortress kernel:  [f987b31a] kvm_exit+0x5a/0x80 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f98ce4d4] svm_exit+0x8/0xa 
[kvm_amd]
Dec 14 16:12:46 osconsfortress kernel:  [c014dafa] 
sys_delete_module+0x15a/0x200
Dec 14 16:12:46 osconsfortress kernel:  [c0171bc2] ? do_munmap+0x1d2/0x230
Dec 14 16:12:46 osconsfortress kernel:  [c0103271] 
sysenter_do_call+0x12/0x25
Dec 14 16:12:46 osconsfortress kernel:  ===
Dec 14 16:12:46 osconsfortress kernel: slab error in kmem_cache_destroy(): 
cache `kvm_rmap_desc': Can't free all objects
Dec 14 16:12:46 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P        
  
2.6.27.41 #1
Dec 14 16:12:46 osconsfortress kernel:  [c02d7a27] ? printk+0x18/0x21
Dec 14 16:12:46 osconsfortress kernel:  [c017ed48] 
kmem_cache_destroy+0xb8/0xf0
Dec 14 16:12:46 osconsfortress kernel:  [f988548f] 
mmu_destroy_caches+0x1f/0x30 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f9885558] 
kvm_mmu_module_exit+0x8/0x20 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f987f4c2] kvm_arch_exit+0x12/0x20 
[kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f987b31a] kvm_exit+0x5a/0x80 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f98ce4d4] svm_exit+0x8/0xa 
[kvm_amd]
Dec 14 16:12:46 osconsfortress kernel:  [c014dafa] 
sys_delete_module+0x15a/0x200
Dec 14 16:12:46 osconsfortress kernel:  [c0171bc2] ? do_munmap+0x1d2/0x230
Dec 14 16:12:46 osconsfortress kernel:  [c0103271] 
sysenter_do_call+0x12/0x25
Dec 14 16:12:46 osconsfortress kernel:  ===
Dec 14 16:12:46 osconsfortress kernel: slab error in kmem_cache_destroy(): 
cache `kvm_mmu_page_header': Can't free all objects
Dec 14 16:12:46 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P        
  
2.6.27.41 #1
Dec 14 16:12:46 osconsfortress kernel:  [c02d7a27] ? printk+0x18/0x21
Dec 14 16:12:46 osconsfortress kernel:  [c017ed48] 
kmem_cache_destroy+0xb8/0xf0
Dec 14 16:12:46 osconsfortress kernel:  [f988549d] 
mmu_destroy_caches+0x2d/0x30 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f9885558] 
kvm_mmu_module_exit+0x8/0x20 [kvm]
Dec 14 16:12:46 osconsfortress kernel:  [f987f4c2] kvm_arch_exit+0x12/0x20 
[kvm]
Dec 14 16:12:46 osconsfortress kernel:  

Re: network shutdown under heavy load

2009-12-17 Thread rek2



What's the exact guest kernel version? When the network is down,
please get onto the guest console to determine which direction
(if not both) of the network is not functioning.

You can run tcpdump in the guest/host and execute pings on both
sides to see which direction is blocked.

Cheers,
   

on the hosts:
uname -a
Linux   2.6.31-16-server #53-Ubuntu SMP Tue Dec 8 05:08:02 UTC 2009 
x86_64 GNU/Linux


I been told that today the network when down again and one of the guys 
here had to log using the console and restart it for that particular 
guests..


on the guest:
 uname -a
Linux  2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT 
2009 x86_64 x86_64 x86_64 GNU/Linux


Next time it goes down I will try to run a sniffer and try both sides.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix vhost ioctl handling for 32-bit

2009-12-17 Thread David Stevens
VHOST_GET_FEATURES returns high-order garbage on 32-bit
machines. This patch fixes it to use 64 bits throughout.

+-DLS

[in-line for viewing, attached to avoid whitespace mangling]

Signed-off-by: David L Stevens dlstev...@us.ibm.com

--- a/drivers/vhost/net.c   2009-11-17 22:51:56.0 -0800
+++ b/drivers/vhost/net.c   2009-12-17 11:31:51.0 -0800
@@ -563,7 +563,7 @@
 {
struct vhost_net *n = f-private_data;
void __user *argp = (void __user *)arg;
-   u32 __user *featurep = argp;
+   u64 __user *featurep = (u64 __user *)argp;
struct vhost_vring_file backend;
u64 features;
int r;
@@ -577,7 +577,7 @@
features = VHOST_FEATURES;
return put_user(features, featurep);
case VHOST_SET_FEATURES:
-   r = get_user(features, featurep);
+   r = copy_from_user(features, featurep, sizeof features);
/* No features for now */
if (r  0)
return r;



VH1.patch
Description: Binary data


Re: network shutdown under heavy load

2009-12-17 Thread Herbert Xu
On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:

 I been told that today the network when down again and one of the guys  
 here had to log using the console and restart it for that particular  
 guests..

 on the guest:
  uname -a
 Linux  2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT  
 2009 x86_64 x86_64 x86_64 GNU/Linux

 Next time it goes down I will try to run a sniffer and try both sides.

OK I'm fairly sure this version has a buggy virtio-net.  Does
this patch (if it applies :) help?

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9eec5a5..74b3854 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
vi-svq-vq_ops-kick(vi-svq);
vi-last_xmit_skb = NULL;
}
-   if (vi-free_in_tasklet)
+   if (vi-free_in_tasklet) {
free_old_xmit_skbs(vi);
+   netif_wake_queue(vi-dev);
+   }
netif_tx_unlock_bh(vi-dev);
 }
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Autotest PATCH] KVM test: Add a subtest vnc via which interacts with guest

2009-12-17 Thread Yolkfull Chow
Signed-off-by: Yolkfull Chow yz...@redhat.com
---
 client/tests/kvm/tests/vnc.py  |   24 
 client/tests/kvm/tests_base.cfg.sample |3 +++
 2 files changed, 27 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/vnc.py

diff --git a/client/tests/kvm/tests/vnc.py b/client/tests/kvm/tests/vnc.py
new file mode 100644
index 000..0f00379
--- /dev/null
+++ b/client/tests/kvm/tests/vnc.py
@@ -0,0 +1,24 @@
+import logging, pexpect
+from autotest_lib.client.common_lib import  error
+import kvm_test_utils, kvm_subprocess
+
+def run_vnc(test, params, env):
+
+Test whether guest could be interacted with vnc.
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+
+vm = kvm_test_utils.get_living_vm(env, params.get(main_vm))
+session = kvm_test_utils.wait_for_login(vm)
+
+# Start vnc connection test
+vnc_port = str(vm.vnc_port - 5900)
+vnc_cmd = vncviewer +  localhost: + vnc_port
+logging.debug(Using command to vnc connect: %s % vnc_cmd)
+
+p = kvm_subprocess.run_bg(vnc_cmd, None, logging.debug, (vnc) )
+if not p.is_alive():
+raise error.TestFail(Vnc connect to guest failed)
+p.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index a403399..0eaccae 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -270,6 +270,9 @@ variants:
 type = physical_resources_check
 catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
 
+- vnc: install setup unattended_install
+type = vnc
+
 # NICs
 variants:
 - @rtl8139:
-- 
1.6.5.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 0/2] Defer skb allocation in virtio_net recv

2009-12-17 Thread Shirley Ma
This patch-set has deferred virtio_net skb allocation in receiving path
for both big packets and mergeable buffers. It reduces skb
pre-allocations and skb frees. This patch-set also add a new API
detach_unused_bufs in virtio. Recv skb queue has been removed in
virtio_net. It is based on previous Rusty and Michaels' review, patch
has split into two:

[PATCH 1/2] virtio: Add detach unused buffer from vring
[PATCH 2/2] virtio_net: Defer skb allocation in receive path

I copied Rusty's comment as [PATCH 1/2] commit message.

This patch is built against Dave's net-next tree.

Thanks
Shirley

 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] virtio: Add detach unused buffer from vring

2009-12-17 Thread Shirley Ma
There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information.  So
add a new hook to do this: virtio_net will be the first user.

Signed-off-by: Shirley Ma x...@us.ibm.com
--- 
 drivers/virtio/virtio_ring.c |   25 +
 include/linux/virtio.h   |4 
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fbd2ecd..71929ee 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -334,6 +334,30 @@ static bool vring_enable_cb(struct virtqueue *_vq)
return true;
 }
 
+static void *vring_detach_unused_buf(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   unsigned int i;
+   void *buf;
+
+   START_USE(vq);
+
+   for (i = 0; i  vq-vring.num; i++) {
+   if (!vq-data[i])
+   continue;
+   /* detach_buf clears data, so grab it now. */
+   buf = vq-data[i];
+   detach_buf(vq, i);
+   END_USE(vq);
+   return buf;
+   }
+   /* That should have freed everything. */
+   BUG_ON(vq-num_free != vq-vring.num);
+
+   END_USE(vq);
+   return NULL;
+}
+
 irqreturn_t vring_interrupt(int irq, void *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -360,6 +384,7 @@ static struct virtqueue_ops vring_vq_ops = {
.kick = vring_kick,
.disable_cb = vring_disable_cb,
.enable_cb = vring_enable_cb,
+   .detach_unused_buf = vring_detach_unused_buf,
 };
 
 struct virtqueue *vring_new_virtqueue(unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 057a2e0..f508c65 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -51,6 +51,9 @@ struct virtqueue {
  * This re-enables callbacks; it returns false if there are pending
  * buffers in the queue, to detect a possible race between the driver
  * checking for more work, and enabling callbacks.
+ * @detach_unused_buf: detach first unused buffer
+ * vq: the struct virtqueue we're talking about.
+ * Returns NULL or the data token handed to add_buf
  *
  * Locking rules are straightforward: the driver is responsible for
  * locking.  No two operations may be invoked simultaneously, with the 
exception
@@ -71,6 +74,7 @@ struct virtqueue_ops {
 
void (*disable_cb)(struct virtqueue *vq);
bool (*enable_cb)(struct virtqueue *vq);
+   void *(*detach_unused_buf)(struct virtqueue *vq);
 };
 
 /**


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] virtio_net: Defer skb allocation in receive path

2009-12-17 Thread Shirley Ma
virtio_net receives packets from its pre-allocated vring buffers, then it 
delivers these packets to upper layer protocols as skb buffs. So it's not
necessary to pre-allocate skb for each mergable buffer, then frees extra 
skbs when buffers are merged into a large packet. This patch has deferred 
skb allocation in receiving packets for both big packets and mergeable buffers
to reduce skb pre-allocations and skb frees. It frees unused buffers by calling 
detach_unused_buf in vring, so recv skb queue is not needed.

Signed-off-by: Shirley Ma x...@us.ibm.com
---
 drivers/net/virtio_net.c |  426 +++---
 1 files changed, 247 insertions(+), 179 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c708ecc..bb57b96 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -56,8 +56,7 @@ struct virtnet_info
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
 
-   /* Receive  send queues. */
-   struct sk_buff_head recv;
+   /* Send queue. */
struct sk_buff_head send;
 
/* Work struct for refilling if we run low on memory. */
@@ -75,34 +74,44 @@ struct skb_vnet_hdr {
unsigned int num_sg;
 };
 
+struct padded_vnet_hdr {
+   struct virtio_net_hdr hdr;
+   /*
+* virtio_net_hdr should be in a separated sg buffer because of a
+* QEMU bug, and data sg buffer shares same page with this header sg.
+* This padding makes next sg 16 byte aligned after virtio_net_hdr.
+*/
+   char padding[6];
+};
+
 static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
 {
return (struct skb_vnet_hdr *)skb-cb;
 }
 
-static void give_a_page(struct virtnet_info *vi, struct page *page)
-{
-   page-private = (unsigned long)vi-pages;
-   vi-pages = page;
-}
-
-static void trim_pages(struct virtnet_info *vi, struct sk_buff *skb)
+/*
+ * private is used to chain pages for big packets, put the whole
+ * most recent used list in the beginning for reuse
+ */
+static void give_pages(struct virtnet_info *vi, struct page *page)
 {
-   unsigned int i;
+   struct page *end;
 
-   for (i = 0; i  skb_shinfo(skb)-nr_frags; i++)
-   give_a_page(vi, skb_shinfo(skb)-frags[i].page);
-   skb_shinfo(skb)-nr_frags = 0;
-   skb-data_len = 0;
+   /* Find end of list, sew whole thing into vi-pages. */
+   for (end = page; end-private; end = (struct page *)end-private);
+   end-private = (unsigned long)vi-pages;
+   vi-pages = page;
 }
 
 static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 {
struct page *p = vi-pages;
 
-   if (p)
+   if (p) {
vi-pages = (struct page *)p-private;
-   else
+   /* clear private here, it is used to chain pages */
+   p-private = 0;
+   } else
p = alloc_page(gfp_mask);
return p;
 }
@@ -118,99 +127,142 @@ static void skb_xmit_done(struct virtqueue *svq)
netif_wake_queue(vi-dev);
 }
 
-static void receive_skb(struct net_device *dev, struct sk_buff *skb,
-   unsigned len)
+static void set_skb_frag(struct sk_buff *skb, struct page *page,
+unsigned int offset, unsigned int *len)
 {
-   struct virtnet_info *vi = netdev_priv(dev);
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
-   int err;
-   int i;
+   int i = skb_shinfo(skb)-nr_frags;
+   skb_frag_t *f;
+
+   f = skb_shinfo(skb)-frags[i];
+   f-size = min((unsigned)PAGE_SIZE - offset, *len);
+   f-page_offset = offset;
+   f-page = page;
+
+   skb-data_len += f-size;
+   skb-len += f-size;
+   skb_shinfo(skb)-nr_frags++;
+   *len -= f-size;
+}
 
-   if (unlikely(len  sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
-   pr_debug(%s: short packet %i\n, dev-name, len);
-   dev-stats.rx_length_errors++;
-   goto drop;
-   }
+static struct sk_buff *page_to_skb(struct virtnet_info *vi,
+  struct page *page, unsigned int len)
+{
+   struct sk_buff *skb;
+   struct skb_vnet_hdr *hdr;
+   unsigned int copy, hdr_len, offset;
+   char *p;
 
-   if (vi-mergeable_rx_bufs) {
-   unsigned int copy;
-   char *p = page_address(skb_shinfo(skb)-frags[0].page);
+   p = page_address(page);
 
-   if (len  PAGE_SIZE)
-   len = PAGE_SIZE;
-   len -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   /* copy small packet so we can reuse these pages for small data */
+   skb = netdev_alloc_skb_ip_align(vi-dev, GOOD_COPY_LEN);
+   if (unlikely(!skb))
+   return NULL;
 
-   memcpy(hdr-mhdr, p, sizeof(hdr-mhdr));
-   p += sizeof(hdr-mhdr);
+   hdr = skb_vnet_hdr(skb);
 
-   copy = len;
-   

Re: [PATCH net-next 0/2] Defer skb allocation in virtio_net recv

2009-12-17 Thread Shirley Ma
Send skb queue can also be reduced with detach_unused_buf API by it is
not a part of this patch.

Shirley

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vhost: add missing architectures

2009-12-17 Thread Michael S. Tsirkin
vhost is completely portable, but Kconfig include was missing for all
architectures besides x86, so it did not appear in the menu.  Add the
relevant Kconfig includes to all architectures that support
virtualization.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Hi Rusty,
please apply the following trivial fixup patch for 2.6.33.
Thanks!

 arch/ia64/kvm/Kconfig|1 +
 arch/powerpc/kvm/Kconfig |1 +
 arch/s390/kvm/Kconfig|1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index ef3e7be..01c7579 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -47,6 +47,7 @@ config KVM_INTEL
  Provides support for KVM on Itanium 2 processors equipped with the VT
  extensions.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..a1b4c5d 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -58,6 +58,7 @@ config KVM_E500
 
  If unsure, say N.
 
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index bf164fc..6be6aea 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
-- 
1.6.6.rc1.43.gf55cc
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html