[PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args

2009-01-07 Thread Avi Kivity
From: Mark McLoughlin mar...@redhat.com

Fixes:

  qemu-kvm.h:110: warning: function declaration isn’t a prototype
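As background on this warning: GCC emits it under `-Wstrict-prototypes` when a declaration has an empty parameter list. Before C23, `void f();` leaves the parameters unspecified, so calls are not checked against it. By contrast, `void f(void);` declares that the function takes no arguments. Below is a minimal sketch of the two forms; the function names and the counter are hypothetical.

```c
#include <assert.h>

/* Hypothetical counter so the example has an observable effect. */
static int setup_calls;

/* Non-prototype declaration: before C23 the empty "()" leaves the
 * parameter list unspecified, which is what -Wstrict-prototypes flags.
 * A call with stray arguments is not required to be diagnosed. */
void old_style_setup();

/* Prototype: "(void)" declares that there are no parameters, so a call
 * such as new_style_setup(42) is a compile-time error. */
void new_style_setup(void);

void old_style_setup() { setup_calls++; }
void new_style_setup(void) { setup_calls++; }
```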

Signed-off-by: Mark McLoughlin mar...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu/kvm-tpr-opt.c b/qemu/kvm-tpr-opt.c
index f2a3a1e..b3d26aa 100644
--- a/qemu/kvm-tpr-opt.c
+++ b/qemu/kvm-tpr-opt.c
@@ -370,7 +370,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 enable_vapic(env);
 }
 
-void kvm_tpr_opt_setup(CPUState *env)
+void kvm_tpr_opt_setup(void)
 {
 register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
 register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 896df4e..12bd5a0 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -107,7 +107,7 @@ void qemu_kvm_aio_wait_end(void);
 
 void qemu_kvm_notify_work(void);
 
-void kvm_tpr_opt_setup();
+void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 int handle_tpr_access(void *opaque, int vcpu,
 uint64_t rip, int is_write);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: mmu_notifiers release method

2009-01-07 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

The destructor for huge pages uses the backing inode for adjusting
hugetlbfs accounting.

Hugepage mappings are destroyed by exit_mmap, after
mmu_notifier_release, so there are no notifications through
unmap_hugepage_range at this point.

The hugetlbfs inode can be freed with pages backed by it referenced
by the shadow. When the shadow releases its reference, the huge page
destructor will access a now freed inode.

Implement the release operation for kvm mmu notifiers to release page
refs before the hugetlbfs inode is gone.
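A rough user-space model of the ordering problem follows. All names are hypothetical, and this is not kernel code. A notifier is embedded in a VM object. Its `release` callback recovers the VM with a `container_of`-style macro (the role `mmu_notifier_to_kvm()` plays in the patch) and drops the shadow's page references while the backing inode is still alive. Teardown runs the release callback first and only then lets the inode go.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct notifier;

struct notifier_ops {
    /* Called once when the address space is torn down. */
    void (*release)(struct notifier *n);
};

struct notifier {
    const struct notifier_ops *ops;
};

/* Stand-in for the hugetlbfs inode: counts outstanding page references. */
struct backing_inode {
    int page_refs;
    int freed;
};

struct vm {
    int id;                      /* keeps the notifier away from offset 0 */
    struct notifier mn;
    struct backing_inode *inode;
    int shadow_pages;            /* page references held by the shadow */
};

/* Analogue of kvm_arch_flush_shadow(): drop every shadow reference. */
static void vm_flush_shadow(struct vm *vm)
{
    assert(!vm->inode->freed);   /* must run while the inode is alive */
    vm->inode->page_refs -= vm->shadow_pages;
    vm->shadow_pages = 0;
}

static void vm_notifier_release(struct notifier *n)
{
    struct vm *vm = container_of(n, struct vm, mn);

    vm_flush_shadow(vm);
}

static const struct notifier_ops vm_notifier_ops = {
    .release = vm_notifier_release,
};

/* Rough model of process exit: notifiers are released first, and the
 * inode is freed afterwards once nothing references it any more. */
static void mm_teardown(struct notifier *n, struct backing_inode *inode)
{
    if (n->ops && n->ops->release)
        n->ops->release(n);
    if (inode->page_refs == 0)
        inode->freed = 1;
}
```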

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 785c1e3..103bc08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -814,11 +814,19 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
return young;
 }
 
+static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
+struct mm_struct *mm)
+{
+   struct kvm *kvm = mmu_notifier_to_kvm(mn);
+   kvm_arch_flush_shadow(kvm);
+}
+
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
.invalidate_page= kvm_mmu_notifier_invalidate_page,
.invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
.invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
.clear_flush_young  = kvm_mmu_notifier_clear_flush_young,
+   .release= kvm_mmu_notifier_release,
 };
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 


KVM host kernel hang

2009-01-07 Thread Alexander Graf
Hi,

While trying to run a current openSUSE in VMware ESX in KVM (using NPT),
some KVM code seems to be stuck in an endless loop. The qemu process
hangs, I can't attach gdb to it and the kernel module seems to be
hanging in a place where I don't see any looping code. One CPU is
definitely stuck in sys at 100% though.

This is running git as of yesterday with some minor ESX modifications
that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a
snippet of a t-sysrq trace with all qemu relevant parts below. The
registers (incl. IP) of these don't change over time.

Alex

qemu-system-x D 810001025280 0 27900   9501
 8101000e5c58 0082  8101000e5c1c
 81011446e728 807e6280 807e6280 8100388ca680
 80601890 8100388ca9c0 00200200 8100388ca9c0
Call Trace:
 [804485ec] __mutex_lock_slowpath+0x72/0xa9
 [8044847a] mutex_lock+0x1e/0x22
 [88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
 [88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802acdde] sys_ioctl+0x55/0x77
 [8020bffa] system_call_after_swapgs+0x8a/0x8f
 [7f2f3b15eb67]

qemu-system-x R  running task0 27908   9501
  88d7d3ad 0390 810100120040
 810116491000 fee00390  
 81011b361d08 88d7f1fb  0001
Call Trace:
Inexact backtrace:

 [88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
 [88d7f1fb] :kvm:emulate_instruction+0x199/0x266
 [88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
 [88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
 [88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
 [88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
 [88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802a13a3] fget_light+0x1/0x83
 [802acdde] sys_ioctl+0x55/0x77
 [802a0b48] sys_writev+0x60/0x94
 [8020bffa] system_call_after_swapgs+0x8a/0x8f






[PATCH] CPUID Masking MSRs

2009-01-07 Thread Alexander Graf
Current AMD CPUs support masking of CPUID bits. Using this functionality,
a VMM can limit what features are exposed to the guest, even if it's not
using SVM/VMX.

While I'm not aware of any open source hypervisor that uses these MSRs
atm, VMware ESX does and patches exist for Xen, where trapping CPUID is
non-trivial.

This patch implements emulation for this masking, which is pretty trivial
because we're intercepting CPUID anyways.

Because it's so simple and can be pretty effective, I put it into the
generic code paths, so VMX benefits from it as well.
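The core of the emulation is a bitwise AND of the CPUID output with the halves of the 64-bit mask. Below is a minimal sketch. It follows the split this patch uses: the low 32 bits mask ECX and the high 32 bits mask EDX. It also assumes that an all-ones mask means no masking and that the extended mask applies to CPUID leaf 0x80000001. `apply_cpuid_mask()` is a hypothetical helper, not a KVM function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: AND the CPUID output registers with the mask for
 * the requested leaf. Low half masks ECX, high half masks EDX, as in the
 * patch. Leaf 0x80000001 for the extended mask is an assumption. */
static void apply_cpuid_mask(uint32_t function, uint32_t *ecx, uint32_t *edx,
                             uint64_t mask, uint64_t mask_ext)
{
    uint64_t m;

    if (function == 0x00000001u)
        m = mask;
    else if (function == 0x80000001u)
        m = mask_ext;
    else
        return;                  /* other leaves are passed through */

    *ecx &= (uint32_t)m;
    *edx &= (uint32_t)(m >> 32);
}
```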

Signed-off-by: Alexander Graf ag...@suse.de

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 863ea73..e2f0dde 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -370,6 +370,9 @@ struct kvm_vcpu_arch {
unsigned long dr6;
unsigned long dr7;
unsigned long eff_db[KVM_NR_DB_REGS];
+
+   u64 cpuid_mask;
+   u64 cpuid_mask_ext;
 };
 
 struct kvm_mem_alias {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1890032..03b53ba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -337,5 +337,7 @@
 
 #define MSR_VM_CR   0xc0010114
 #define MSR_VM_HSAVE_PA 0xc0010117
+#define MSR_VM_MASK_CPUID   0xc0011004
+#define MSR_VM_MASK_CPUID_EXT   0xc0011005
 
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 18bba94..83b4877 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -782,6 +784,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
kvm_write_guest_time(vcpu);
break;
}
+   case MSR_VM_MASK_CPUID:
+   vcpu->arch.cpuid_mask = data;
+   break;
+   case MSR_VM_MASK_CPUID_EXT:
+   vcpu->arch.cpuid_mask_ext = data;
+   break;
default:
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
return 1;
@@ -896,6 +904,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_KVM_SYSTEM_TIME:
data = vcpu->arch.time;
break;
+   case MSR_VM_MASK_CPUID:
+   data = vcpu->arch.cpuid_mask;
+   break;
+   case MSR_VM_MASK_CPUID_EXT:
+   data = vcpu->arch.cpuid_mask_ext;
+   break;
default:
pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
return 1;
@@ -2901,10 +2915,19 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
kvm_register_write(vcpu, VCPU_REGS_RDX, 0);
best = kvm_find_cpuid_entry(vcpu, function, index);
if (best) {
+   u32 ecx = best->ecx;
+   u32 edx = best->edx;
kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax);
kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx);
-   kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx);
-   kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
+   if ( function == 1 ) {
+   ecx &= (u32)vcpu->arch.cpuid_mask;
+   edx &= (u32)(vcpu->arch.cpuid_mask >> 32);
+   } else if ( function == 0x8001 ) {
+   ecx &= (u32)vcpu->arch.cpuid_mask_ext;
+   edx &= (u32)(vcpu->arch.cpuid_mask_ext >> 32);
+   }
+   kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
}
kvm_x86_ops->skip_emulated_instruction(vcpu);
KVMTRACE_5D(CPUID, vcpu, function,
@@ -4089,6 +4112,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
vcpu->arch.dr6 = DR6_FIXED_1;
vcpu->arch.dr7 = DR7_FIXED_1;
+   vcpu->arch.cpuid_mask = 0x;
+   vcpu->arch.cpuid_mask_ext = 0x;
 
return kvm_x86_ops->vcpu_reset(vcpu);
 }
-- 
1.5.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: compat: define marker_synchronize_unregister on older kernels

2009-01-07 Thread Avi Kivity

Eduardo Habkost wrote:

marker_synchronize_unregister() is only available since 2.6.28. However,
its definition is very simple, so we can define it if it is missing.

This fixes compilation of kvm_trace.c against older kernels.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y

2009-01-07 Thread Avi Kivity

Marcelo Tosatti wrote:

Ok, the bug seems to be gone now. Avi, can you apply the kernel patch
please?


Done.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] CPUID Masking MSRs

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:

Current AMD CPUs support masking of CPUID bits. Using this functionality,
a VMM can limit what features are exposed to the guest, even if it's not
using SVM/VMX.

While I'm not aware of any open source hypervisor that uses these MSRs
atm, VMware ESX does and patches exist for Xen, where trapping CPUID is
non-trivial.

This patch implements emulation for this masking, which is pretty trivial
because we're intercepting CPUID anyways.

Because it's so simple and can be pretty effective, I put it into the
generic code paths, so VMX benefits from it as well.

  


Missing save/restore support.

Note that Intel has similar functionality, called FlexMigration IIRC, 
likely using different MSRs.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] CPUID Masking MSRs

2009-01-07 Thread Alexander Graf


On 07.01.2009, at 11:07, Avi Kivity wrote:


Alexander Graf wrote:
Current AMD CPUs support masking of CPUID bits. Using this functionality,
a VMM can limit what features are exposed to the guest, even if it's not
using SVM/VMX.

While I'm not aware of any open source hypervisor that uses these MSRs
atm, VMware ESX does and patches exist for Xen, where trapping CPUID is
non-trivial.

This patch implements emulation for this masking, which is pretty trivial
because we're intercepting CPUID anyways.

Because it's so simple and can be pretty effective, I put it into the
generic code paths, so VMX benefits from it as well.




Missing save/restore support.


Right. I keep forgetting about that one ;-).

Note that Intel has similar functionality, called FlexMigration IIRC,
likely using different MSRs.


Hum. I'll take a look at it to see if that's as easy to implement then.

Alex



Re: [PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args

2009-01-07 Thread Avi Kivity

Mark McLoughlin wrote:

Fixes:

  qemu-kvm.h:110: warning: function declaration isn’t a prototype

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] CPUID Masking MSRs

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:
Note that Intel has similar functionality, called FlexMigration IIRC, 
likely using different MSRs.


Hum. I'll take a look at it to see if that's as easy to implement then.


It's probably easy (well, supporting both might be tricky), but if you
don't have a real test case then it's best to wait with it.


--
error compiling committee.c: too many arguments to function



[PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args

2009-01-07 Thread Mark McLoughlin
Fixes:

  qemu-kvm.h:110: warning: function declaration isn’t a prototype

Signed-off-by: Mark McLoughlin mar...@redhat.com
---
 qemu/kvm-tpr-opt.c |2 +-
 qemu/qemu-kvm.h|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu/kvm-tpr-opt.c b/qemu/kvm-tpr-opt.c
index f2a3a1e..b3d26aa 100644
--- a/qemu/kvm-tpr-opt.c
+++ b/qemu/kvm-tpr-opt.c
@@ -370,7 +370,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 enable_vapic(env);
 }
 
-void kvm_tpr_opt_setup(CPUState *env)
+void kvm_tpr_opt_setup(void)
 {
 register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
 register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 896df4e..12bd5a0 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -107,7 +107,7 @@ void qemu_kvm_aio_wait_end(void);
 
 void qemu_kvm_notify_work(void);
 
-void kvm_tpr_opt_setup();
+void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 int handle_tpr_access(void *opaque, int vcpu,
 uint64_t rip, int is_write);
-- 
1.6.0.5



Re: KVM host kernel hang

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:

Hi,

While trying to run a current openSUSE in VMware ESX in KVM (using NPT),
some KVM code seems to be stuck in an endless loop. The qemu process
hangs, I can't attach gdb to it and the kernel module seems to be
hanging in a place where I don't see any looping code. One CPU is
definitely stuck in sys at 100% though.

This is running git as of yesterday with some minor ESX modifications
that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a
snippet of a t-sysrq trace with all qemu relevant parts below. The
registers (incl. IP) of these don't change over time.

Alex

qemu-system-x D 810001025280 0 27900   9501
 8101000e5c58 0082  8101000e5c1c
 81011446e728 807e6280 807e6280 8100388ca680
 80601890 8100388ca9c0 00200200 8100388ca9c0
Call Trace:
 [804485ec] __mutex_lock_slowpath+0x72/0xa9
 [8044847a] mutex_lock+0x1e/0x22
 [88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
 [88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802acdde] sys_ioctl+0x55/0x77
 [8020bffa] system_call_after_swapgs+0x8a/0x8f
 [7f2f3b15eb67]

  


Waiting for kvm->lock, so can't kill or strace.


qemu-system-x R  running task0 27908   9501
  88d7d3ad 0390 810100120040
 810116491000 fee00390  
 81011b361d08 88d7f1fb  0001
Call Trace:
Inexact backtrace:

 [88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
 [88d7f1fb] :kvm:emulate_instruction+0x199/0x266
 [88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
 [88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
 [88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
 [88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
 [88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802a13a3] fget_light+0x1/0x83
 [802acdde] sys_ioctl+0x55/0x77
 [802a0b48] sys_writev+0x60/0x94
 [8020bffa] system_call_after_swapgs+0x8a/0x8f
  


But the mutex is not taken here.  Looks like we lost it, maybe 
CONFIG_LOCKDEP can find out where.
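One common way a mutex gets "lost" is an early return on an error path that skips the unlock. The usual kernel idiom is a single exit label that drops the lock. The sketch below models that bug class in user space. It uses a hypothetical flag-based lock instead of a real mutex, so the leak is easy to observe. It illustrates the pattern only and makes no claim about where this particular lock was lost.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a mutex: just records whether it is held. */
struct fake_lock {
    int held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
    assert(!l->held);            /* a real mutex would block forever here */
    l->held = 1;
}

static void fake_lock_release(struct fake_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Buggy variant: the error path returns with the lock still held. */
static int update_buggy(struct fake_lock *l, int fail)
{
    fake_lock_acquire(l);
    if (fail)
        return -ENOSPC;          /* lock leaked here */
    fake_lock_release(l);
    return 0;
}

/* Fixed variant: every path leaves through the "out" label. */
static int update_fixed(struct fake_lock *l, int fail)
{
    int r = 0;

    fake_lock_acquire(l);
    if (fail) {
        r = -ENOSPC;
        goto out;
    }
    /* ... normal work would go here ... */
out:
    fake_lock_release(l);
    return r;
}
```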


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:

Using this patch it works. But if I read it correctly, that doesn't
actually fix anything but only treats NPT/EPT special, which it
shouldn't, should it? 


The patch doesn't fix the bug but is nevertheless correct.  cr4.pge only 
matters to the mmu if using the shadow mmu; with tdp it only wastes 
memory (and exposes the bug which you encountered).


So, wrt to the bug you saw, it's a workaround, but it's also a correct 
fix for another bug.
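The reasoning can be shown with a simplified page-role model. With the shadow MMU, the guest's cr4.pge value is part of the role, so pages created under different settings are kept separate. With tdp (NPT/EPT), the bit is ignored, so no duplicate pages are created. The struct and functions below are hypothetical simplifications, not KVM's `union kvm_mmu_page_role`.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_CR4_PGE (1ul << 7)   /* page global enable bit in CR4 */

/* Hypothetical, trimmed-down page role. */
struct page_role {
    unsigned level;
    bool pge;
};

static struct page_role compute_role(unsigned level, unsigned long cr4,
                                     bool tdp_enabled)
{
    struct page_role role = { .level = level, .pge = false };

    /* cr4.pge only changes how the shadow MMU interprets guest page
     * tables; under tdp the guest MMU is virtualized by hardware. */
    if (!tdp_enabled)
        role.pge = (cr4 & X86_CR4_PGE) != 0;
    return role;
}

static bool role_equal(struct page_role a, struct page_role b)
{
    return a.level == b.level && a.pge == b.pge;
}
```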



Maybe this actually even breaks EPT?
  


It shouldn't.


I remember having seen a lot of CR4 hacks in svm.c when npt is enabled.
Maybe that is related?
  


No.  cr4 controls the guest mmu, but with npt the guest mmu is 
completely virtualized, so we need to ignore those bits.


--
error compiling committee.c: too many arguments to function



[PATCH 07/10] KVM: Unified the delivery of IOAPIC and MSI

2009-01-07 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |3 ++
 virt/kvm/ioapic.c|   84 
 virt/kvm/irq_comm.c  |   86 --
 3 files changed, 86 insertions(+), 87 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bfdaab9..2736dbf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -351,6 +351,9 @@ struct kvm_gsi_route_entry {
struct hlist_node link;
 };
 
+void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
+  union kvm_ioapic_redirect_entry *entry,
+  u32 *deliver_bitmask);
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index b6530e9..951df12 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -200,75 +200,53 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
-   u8 dest = ioapic->redirtbl[irq].fields.dest_id;
-   u8 dest_mode = ioapic->redirtbl[irq].fields.dest_mode;
-   u8 delivery_mode = ioapic->redirtbl[irq].fields.delivery_mode;
-   u8 vector = ioapic->redirtbl[irq].fields.vector;
-   u8 trig_mode = ioapic->redirtbl[irq].fields.trig_mode;
+   union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
u32 deliver_bitmask;
struct kvm_vcpu *vcpu;
int vcpu_id, r = 0;
 
ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 "vector=%x trig_mode=%x\n",
-dest, dest_mode, delivery_mode, vector, trig_mode);
+entry.fields.dest, entry.fields.dest_mode,
+entry.fields.delivery_mode, entry.fields.vector,
+entry.fields.trig_mode);
 
-   deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, dest,
- dest_mode);
+   kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
if (!deliver_bitmask) {
ioapic_debug("no target on destination\n");
return 0;
}
 
-   switch (delivery_mode) {
-   case IOAPIC_LOWEST_PRIORITY:
-   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-   deliver_bitmask);
+   /* Always deliver PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
-   if (irq == 0)
-   vcpu = ioapic->kvm->vcpus[0];
+   if (irq == 0)
+   deliver_bitmask = 1 << 0;
 #endif
-   if (vcpu != NULL)
-   r = ioapic_inj_irq(ioapic, vcpu, vector,
-  trig_mode, delivery_mode);
-   else
-   ioapic_debug("null lowest prio vcpu: "
-"mask=%x vector=%x delivery_mode=%x\n",
-deliver_bitmask, vector, IOAPIC_LOWEST_PRIORITY);
-   break;
-   case IOAPIC_FIXED:
-#ifdef CONFIG_X86
-   if (irq == 0)
-   deliver_bitmask = 1;
-#endif
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
-   vcpu = ioapic->kvm->vcpus[vcpu_id];
-   if (vcpu) {
-   r = ioapic_inj_irq(ioapic, vcpu, vector,
-  trig_mode, delivery_mode);
-   }
-   }
-   break;
-   case IOAPIC_NMI:
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
-   vcpu = ioapic->kvm->vcpus[vcpu_id];
-   if (vcpu)
+
+   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+   if (!(deliver_bitmask & (1 << vcpu_id)))
+   continue;
+   deliver_bitmask &= ~(1 << vcpu_id);
+   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   if (vcpu) {
+   if (entry.fields.delivery_mode ==
+   IOAPIC_LOWEST_PRIORITY ||
+   entry.fields.delivery_mode == IOAPIC_FIXED)
+   r = ioapic_inj_irq(ioapic, vcpu,
+  entry.fields.vector,
+  entry.fields.trig_mode,
+  

[PATCH 01/10] KVM: Add a route layer to convert MSI message to GSI

2009-01-07 Thread Sheng Yang
As Avi intended, use a single kvm_set_irq() to deal with all interrupts,
including MSI. So here it is.

struct gsi_route_entry is a mapping from a special gsi (with KVM_GSI_MSG_MASK)
to an MSI/MSI-X message address/data. The struct can also be extended for
other purposes.

We now support up to 256 gsi_route_entry mappings. The gsi is allocated by
the kernel, and two ioctls are provided to userspace, which is more flexible.
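The allocation scheme works as follows: find the first free slot in a 256-entry bitmap, mark it used, and tag the number so it cannot collide with an ordinary GSI. It can be sketched in user space as shown below. The tag value and helper names are hypothetical. The patch itself uses its own `KVM_GSI_ROUTE_MASK` together with the kernel's `find_first_zero_bit()` and `__set_bit()`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

#define NR_ROUTE_ENTRIES 256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define ROUTE_WORDS ((NR_ROUTE_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)
/* Hypothetical tag bit marking a routed (e.g. MSI) gsi. */
#define HYPOTHETICAL_ROUTE_FLAG (1u << 24)

struct route_table {
    unsigned long bitmap[ROUTE_WORDS];
};

/* Claim the first free slot; return the tagged gsi or -ENOSPC. */
static int64_t route_alloc(struct route_table *t)
{
    for (unsigned i = 0; i < NR_ROUTE_ENTRIES; i++) {
        unsigned long bit = 1ul << (i % BITS_PER_WORD);
        unsigned long *word = &t->bitmap[i / BITS_PER_WORD];

        if (!(*word & bit)) {
            *word |= bit;
            return (int64_t)(i | HYPOTHETICAL_ROUTE_FLAG);
        }
    }
    return -ENOSPC;
}

/* Release the slot belonging to a previously allocated, tagged gsi. */
static void route_free(struct route_table *t, uint32_t gsi)
{
    unsigned i = gsi & ~HYPOTHETICAL_ROUTE_FLAG;

    assert(i < NR_ROUTE_ENTRIES);
    t->bitmap[i / BITS_PER_WORD] &= ~(1ul << (i % BITS_PER_WORD));
}
```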

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm.h  |   26 +++
 include/linux/kvm_host.h |   20 +
 virt/kvm/irq_comm.c  |   70 ++
 virt/kvm/kvm_main.c  |  106 ++
 4 files changed, 222 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71c150f..bbefce6 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -399,6 +399,9 @@ struct kvm_trace_rec {
 #if defined(CONFIG_X86)
 #define KVM_CAP_REINJECT_CONTROL 24
 #endif
+#if defined(CONFIG_X86)
+#define KVM_CAP_GSI_ROUTE 25
+#endif
 
 /*
  * ioctls for VM fds
@@ -433,6 +436,8 @@ struct kvm_trace_rec {
 #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
struct kvm_assigned_irq)
 #define KVM_REINJECT_CONTROL  _IO(KVMIO, 0x71)
+#define KVM_REQUEST_GSI_ROUTE _IOWR(KVMIO, 0x72, void *)
+#define KVM_FREE_GSI_ROUTE   _IOR(KVMIO, 0x73, void *)
 
 /*
  * ioctls for vcpu fds
@@ -553,4 +558,25 @@ struct kvm_assigned_irq {
 #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION  KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
 #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI  (1 << 0)
 
+struct kvm_gsi_route_guest {
+   __u32 entries_nr;
+   struct kvm_gsi_route_entry_guest *entries;
+};
+
+#define KVM_GSI_ROUTE_MSI  (1 << 0)
+struct kvm_gsi_route_entry_guest {
+   __u32 gsi;
+   __u32 type;
+   __u32 flags;
+   __u32 reserved;
+   union {
+   struct {
+   __u32 addr_lo;
+   __u32 addr_hi;
+   __u32 data;
+   } msi;
+   __u32 padding[8];
+   };
+};
+
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a8bcad0..6a00201 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm {
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
 #endif
+   struct hlist_head gsi_route_list;
+#define KVM_NR_GSI_ROUTE_ENTRIES 256
+   DECLARE_BITMAP(gsi_route_bitmap, KVM_NR_GSI_ROUTE_ENTRIES);
 };
 
 /* The guest did something we don't support. */
@@ -336,6 +339,19 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
  struct kvm_irq_mask_notifier *kimn);
 void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
 
+#define KVM_GSI_ROUTE_MASK 0x100ull
+struct kvm_gsi_route_entry {
+   u32 gsi;
+   u32 type;
+   u32 flags;
+   u32 reserved;
+   union {
+   struct msi_msg msi;
+   u32 reserved[8];
+   };
+   struct hlist_node link;
+};
+
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
@@ -343,6 +359,10 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
 void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
+struct kvm_gsi_route_entry *kvm_find_gsi_route_entry(struct kvm *kvm, u32 gsi);
+void kvm_free_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
+void kvm_free_gsi_route_list(struct kvm *kvm);
 
 #ifdef CONFIG_DMAR
 int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 5162a41..7460e7f 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -123,3 +123,73 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
 kimn->func(kimn, mask);
 }
 
+int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry)
+{
+   struct kvm_gsi_route_entry *found_entry, *new_entry;
+   int r, gsi;
+
+   mutex_lock(&kvm->lock);
+   /* Find whether we need an update or a new entry */
+   found_entry = kvm_find_gsi_route_entry(kvm, entry->gsi);
+   if (found_entry)
+   *found_entry = *entry;
+   else {
+   gsi = find_first_zero_bit(kvm->gsi_route_bitmap,
+ KVM_NR_GSI_ROUTE_ENTRIES);
+   if (gsi >= KVM_NR_GSI_ROUTE_ENTRIES) {
+   r = -ENOSPC;
+   goto out;
+   }
+   __set_bit(gsi, kvm->gsi_route_bitmap);
+   entry->gsi = gsi | KVM_GSI_ROUTE_MASK;
+   

Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings

2009-01-07 Thread Marcelo Tosatti
On Wed, Jan 07, 2009 at 12:19:26PM +0200, Avi Kivity wrote:
 Alexander Graf wrote:
 Using this patch it works. But if I read it correctly, that doesn't
 actually fix anything but only treats NPT/EPT special, which it
 shouldn't, should it? 

 The patch doesn't fix the bug but is nevertheless correct.  cr4.pge only  
 matters to the mmu if using the shadow mmu; with tdp it only wastes  
 memory (and exposes the bug which you encountered).

 So, wrt to the bug you saw, it's a workaround, but it's also a correct  
 fix for another bug.

 Maybe this actually even breaks EPT?
   

 It shouldn't.

 I remember having seen a lot of CR4 hacks in svm.c when npt is enabled.
 Maybe that is related?
   

 No.  cr4 controls the guest mmu, but with npt the guest mmu is  
 completely virtualized, so we need to ignore those bits.

Let me shoot at one direction: a shadow page with PGE bit in either
state is created. Later that shadow page is nuked (via mmu notifiers,
for example). Then set_cr4 changes base_role.pge to a different value,
and a fault creates a new shadow page and instantiates that in the tree.

Perhaps an svm_flush_tlb is required in such a case, when updating a
previously valid pagetable entry? Joerg?



[PATCH 0/6][v3] Userspace support for MSI

2009-01-07 Thread Sheng Yang
Update from v2:
Change API to gsi_route.


[PATCH 1/6] kvm: Replace force type convert with container_of()

2009-01-07 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/hw/device-assignment.c |   20 
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index d5eb7b2..f357d17 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -144,7 +144,7 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
 static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
uint32_t e_phys, uint32_t e_size, int type)
 {
-AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
 AssignedDevRegion *region = &r_dev->v_addrs[region_num];
 uint32_t old_ephys = region->e_physbase;
 uint32_t old_esize = region->e_size;
@@ -178,7 +178,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
 uint32_t addr, uint32_t size, int type)
 {
-AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
 AssignedDevRegion *region = &r_dev->v_addrs[region_num];
 int first_map = (region->e_size == 0);
 CPUState *env;
@@ -227,6 +227,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
 {
 int fd;
 ssize_t ret;
+AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
@@ -248,7 +249,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
   ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
   (uint16_t) address, val, len);
 
-fd = ((AssignedDevice *)d)->real_device.config_fd;
+fd = pci_dev->real_device.config_fd;
 
 again:
 ret = pwrite(fd, &val, len, address);
@@ -269,6 +270,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
 uint32_t val = 0;
 int fd;
 ssize_t ret;
+AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
 if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
 address == 0x3c || address == 0x3d) {
@@ -282,7 +284,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
 if (address == 0xFC)
 goto do_log;
 
-fd = ((AssignedDevice *)d)->real_device.config_fd;
+fd = pci_dev->real_device.config_fd;
 
 again:
 ret = pread(fd, &val, len, address);
@@ -539,16 +541,18 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
 {
 int r;
 AssignedDevice *dev;
+PCIDevice *pci_dev;
 uint8_t e_device, e_intx;
 struct kvm_assigned_pci_dev assigned_dev_data;
 
 DEBUG("Registering real physical device %s (bus=%x dev=%x func=%x)\n",
   adev->name, adev->bus, adev->dev, adev->func);
 
-dev = (AssignedDevice *)
-pci_register_device(bus, adev->name, sizeof(AssignedDevice),
--1, assigned_dev_pci_read_config,
-assigned_dev_pci_write_config);
+pci_dev = pci_register_device(bus, adev->name,
+  sizeof(AssignedDevice), -1, assigned_dev_pci_read_config,
+  assigned_dev_pci_write_config);
+dev = container_of(pci_dev, AssignedDevice, dev);
+
 if (NULL == dev) {
 fprintf(stderr, "%s: Error: Couldn't register real device %s\n",
 __func__, adev->name);
-- 
1.5.4.5



[PATCH 5/6] Support for device capability

2009-01-07 Thread Sheng Yang
This framework can be easily extended to support device capabilities, such as
MSI/MSI-X.
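The capability support amounts to a small byte array that shadows a window of config space. Below is a minimal sketch of that idea. It assumes a hypothetical fixed window (a start offset and a length of at most 64 bytes here) and uses PCI's little-endian config-space byte order. The struct and function names are illustrative, not the patch's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability window shadowing part of config space;
 * length must not exceed sizeof(config). */
struct cap_window {
    uint32_t start;
    uint32_t length;
    uint8_t config[64];
};

/* True if the half-open range [address, address + len) lies inside. */
static int cap_contains(const struct cap_window *w, uint32_t address, int len)
{
    return address >= w->start &&
           address + (uint32_t)len <= w->start + w->length;
}

/* Store val least-significant byte first (config space is little-endian). */
static void cap_write(struct cap_window *w, uint32_t address, uint32_t val,
                      int len)
{
    for (int i = 0; i < len; i++) {
        w->config[address - w->start + i] = (uint8_t)val;
        val >>= 8;
    }
}

/* Reassemble a little-endian value of len bytes (1, 2 or 4). */
static uint32_t cap_read(const struct cap_window *w, uint32_t address, int len)
{
    uint32_t val = 0;

    for (int i = len - 1; i >= 0; i--)
        val = (val << 8) | w->config[address - w->start + i];
    return val;
}
```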

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/hw/pci.c |   85 +
 qemu/hw/pci.h |   30 
 2 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 8589dfa..d755516 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -351,11 +351,65 @@ static void pci_update_mappings(PCIDevice *d)
 }
 }
 
+int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
+{
+if (pci_dev->cap.supported && address >= pci_dev->cap.start &&
+(address + len) < pci_dev->cap.start + pci_dev->cap.length)
+return 1;
+return 0;
+}
+
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+ uint32_t address, int len)
+{
+uint32_t val = 0;
+
+if (pci_access_cap_config(pci_dev, address, len)) {
+switch(len) {
+default:
+case 4:
+if (address < pci_dev->cap.start + pci_dev->cap.length - 4) {
+val = le32_to_cpu(*(uint32_t *)(pci_dev->cap.config
++ address - pci_dev->cap.start));
+break;
+}
+/* fall through */
+case 2:
+if (address < pci_dev->cap.start + pci_dev->cap.length - 2) {
+val = le16_to_cpu(*(uint16_t *)(pci_dev->cap.config
++ address - pci_dev->cap.start));
+break;
+}
+/* fall through */
+case 1:
+val = pci_dev->cap.config[address - pci_dev->cap.start];
+break;
+}
+}
+return val;
+}
+
+void pci_default_cap_write_config(PCIDevice *pci_dev,
+  uint32_t address, uint32_t val, int len)
+{
+if (pci_access_cap_config(pci_dev, address, len)) {
+int i;
+for (i = 0; i < len; i++) {
+pci_dev->cap.config[address + i - pci_dev->cap.start] = val;
+val >>= 8;
+}
+return;
+}
+}
+
 uint32_t pci_default_read_config(PCIDevice *d,
  uint32_t address, int len)
 {
 uint32_t val;
 
+if (pci_access_cap_config(d, address, len))
+return d->cap.config_read(d, address, len);
+
 switch(len) {
 default:
 case 4:
@@ -409,6 +463,11 @@ void pci_default_write_config(PCIDevice *d,
 return;
 }
  default_config:
+if (pci_access_cap_config(d, address, len)) {
+d->cap.config_write(d, address, val, len);
+return;
+}
+
 /* not efficient, but simple */
 addr = address;
 for(i = 0; i < len; i++) {
@@ -828,3 +887,29 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint32_t id,
 s->bus = pci_register_secondary_bus(&s->dev, map_irq);
 return s->bus;
 }
+
+void pci_enable_capability_support(PCIDevice *pci_dev,
+   uint32_t config_start,
+   PCICapConfigReadFunc *config_read,
+   PCICapConfigWriteFunc *config_write,
+   PCICapConfigInitFunc *config_init)
+{
+if (!pci_dev)
+return;
+
+if (config_start >= 0x40 && config_start < 0xff)
+pci_dev->cap.start = config_start;
+else
+pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
+if (config_read)
+pci_dev->cap.config_read = config_read;
+else
+pci_dev->cap.config_read = pci_default_cap_read_config;
+if (config_write)
+pci_dev->cap.config_write = config_write;
+else
+pci_dev->cap.config_write = pci_default_cap_write_config;
+pci_dev->cap.supported = 1;
+pci_dev->config[0x34] = pci_dev->cap.start;
+config_init(pci_dev);
+}
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 1f33819..f2a622c 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -28,6 +28,12 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
 uint32_t addr, uint32_t size, int type);
 typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
 
+typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
+   uint32_t address, uint32_t val, int len);
+typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
+  uint32_t address, int len);
+typedef void PCICapConfigInitFunc(PCIDevice *pci_dev);
+
 #define PCI_ADDRESS_SPACE_MEM  0x00
 #define PCI_ADDRESS_SPACE_IO   0x01
 #define PCI_ADDRESS_SPACE_MEM_PREFETCH 0x08
@@ -78,6 +84,10 @@ typedef struct PCIIORegion {
 
 #define PCI_COMMAND_RESERVED_MASK_HI (PCI_COMMAND_RESERVED >> 8)
 
+#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60
+#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40
+#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
+
 struct PCIDevice {
 /* PCI config space */
 uint8_t config[256];
@@ -100,6 +110,15 @@ struct PCIDevice {
 
 /* 

[PATCH 6/6] kvm: expose MSI capability to guest

2009-01-07 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/hw/device-assignment.c |  111 ---
 qemu/hw/device-assignment.h |7 +++
 2 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 169357f..4c08b00 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -268,7 +268,8 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
 }
 
 if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
-address == 0x3c || address == 0x3d) {
+address == 0x3c || address == 0x3d ||
+pci_access_cap_config(d, address, len)) {
 /* used for update-mappings (BAR emulation) */
 pci_default_write_config(d, address, val, len);
 return;
@@ -302,7 +303,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
 AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
 if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
-address == 0x3c || address == 0x3d) {
+address == 0x3c || address == 0x3d ||
+pci_access_cap_config(d, address, len)) {
 val = pci_default_read_config(d, address, len);
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
@@ -331,11 +333,13 @@ do_log:
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
 
-/* kill the special capabilities */
-if (address == 4 && len == 4)
-val &= ~0x10;
-else if (address == 6)
-val &= ~0x10;
+if (!pci_dev->cap.available) {
+/* kill the special capabilities */
+if (address == 4 && len == 4)
+val &= ~0x10;
+else if (address == 6)
+val &= ~0x10;
+}
 
 return val;
 }
@@ -566,6 +570,95 @@ void assigned_dev_update_irq(PCIDevice *d)
 }
 }
 
+#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_GSI_ROUTE)
+static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
+{
+struct kvm_assigned_irq assigned_irq_data;
+struct kvm_gsi_route_guest gsi_route;
+struct kvm_gsi_route_entry_guest gsi_entry[1];
+AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+uint8_t ctrl_byte = pci_dev->cap.config[ctrl_pos];
+
+memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+assigned_irq_data.assigned_dev_id  =
+calc_assigned_dev_id(assigned_dev->h_busnr,
+(uint8_t)assigned_dev->h_devfn);
+
+if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
+   gsi_route.entries_nr = 1;
+gsi_entry[0].msi.addr_lo = *(uint32_t *)(pci_dev->cap.config +
+PCI_MSI_ADDRESS_LO);
+gsi_entry[0].msi.data = *(uint16_t *)(pci_dev->cap.config +
+ PCI_MSI_DATA_32);
+gsi_entry[0].type = KVM_GSI_ROUTE_MSI;
+gsi_route.entries = gsi_entry;
+if (kvm_request_gsi_route(kvm_context, &gsi_route) < 0) {
+perror("assigned_dev_enable_msi: kvm_request_gsi_route");
+assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
+return;
+}
+assigned_irq_data.guest_irq = gsi_entry[0].gsi;
+assigned_irq_data.flags = KVM_DEV_IRQ_ASSIGN_ENABLE_MSI;
+} else
+   assigned_irq_data.guest_irq = assigned_dev->girq;
+
+if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0)
+perror("assigned_dev_enable_msi");
+if (assigned_irq_data.flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSI) {
+assigned_dev->cap.state |= ASSIGNED_DEVICE_MSI_ENABLED;
+pci_dev->cap.config[ctrl_pos] |= PCI_MSI_FLAGS_ENABLE;
+} else {
+assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
+pci_dev->cap.config[ctrl_pos] &= ~PCI_MSI_FLAGS_ENABLE;
+}
+}
+#endif
+
+void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
+  uint32_t val, int len)
+{
+AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+unsigned int pos = pci_dev->cap.start, ctrl_pos;
+
+pci_default_cap_write_config(pci_dev, address, val, len);
+#if defined(KVM_CAP_DEVICE_MSI) && defined (KVM_CAP_GSI_ROUTE)
+if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+ctrl_pos = pos + PCI_MSI_FLAGS;
+if (address <= ctrl_pos && address + len > ctrl_pos)
+assigned_dev_update_msi(pci_dev, ctrl_pos - pci_dev->cap.start);
+pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+}
+#endif
+return;
+}
+
+static void assigned_device_pci_cap_init(PCIDevice *pci_dev)
+{
+AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
+int next_cap_pt;
+struct pci_access *pacc;
+int h_bus, h_dev, h_func;
+
+pci_dev-cap.length = 0;
+h_bus = dev->h_busnr;
+h_dev = 

[PATCH 2/6] Make device assignment depend on libpci

2009-01-07 Thread Sheng Yang
libpci is used later for capability detection.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/Makefile.target |1 +
 qemu/configure   |   20 
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index f58015b..a58f31d 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -696,6 +696,7 @@ OBJS += device-hotplug.o
 
 ifeq ($(USE_KVM_DEVICE_ASSIGNMENT), 1)
 OBJS+= device-assignment.o
+LIBS+=-lpci
 endif
 
 ifeq ($(TARGET_BASE_ARCH), i386)
diff --git a/qemu/configure b/qemu/configure
index 6eb12ae..f5d3f89 100755
--- a/qemu/configure
+++ b/qemu/configure
@@ -780,6 +780,26 @@ EOF
 fi
 fi
 
+# libpci probe for kvm_cap_device_assignment
+if test $kvm_cap_device_assignment = yes ; then
+cat > $TMPC << EOF
+#include <pci/pci.h>
+#ifndef PCI_VENDOR_ID
+#error NO LIBPCI
+#endif
+int main(void) { return 0; }
+EOF
+if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+:
+else
+echo
+echo Error: libpci check failed
+echo Disable KVM Device Assignment capability.
+echo
+kvm_cap_device_assignment=no
+fi
+fi
+
 ##
 # zlib check
 
-- 
1.5.4.5



[PATCH 3/6] kvm: ioctl for gsi_route

2009-01-07 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 libkvm/libkvm.c |   27 +++
 libkvm/libkvm.h |8 
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 0408fdb..6d53f38 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1164,3 +1164,30 @@ int kvm_reinject_control(kvm_context_t kvm, int pit_reinject)
 #endif
return -ENOSYS;
 }
+
+#ifdef KVM_CAP_GSI_ROUTE
+int kvm_request_gsi_route(kvm_context_t kvm,
+ struct kvm_gsi_route_guest *route)
+{
+int ret;
+
+ret = ioctl(kvm->vm_fd, KVM_REQUEST_GSI_ROUTE, route);
+if (ret < 0)
+return -errno;
+
+return ret;
+}
+
+int kvm_free_gsi_route(kvm_context_t kvm,
+  struct kvm_gsi_route_guest *route)
+{
+int ret;
+
+ret = ioctl(kvm->vm_fd, KVM_FREE_GSI_ROUTE, route);
+if (ret < 0)
+return -errno;
+
+return ret;
+}
+
+#endif
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index ee1ba68..2bfcfe3 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -720,4 +720,12 @@ int kvm_assign_irq(kvm_context_t kvm,
  */
 int kvm_destroy_memory_region_works(kvm_context_t kvm);
 #endif
+
+#ifdef KVM_CAP_GSI_ROUTE
+int kvm_request_gsi_route(kvm_context_t kvm,
+ struct kvm_gsi_route_guest *route);
+int kvm_free_gsi_route(kvm_context_t kvm,
+  struct kvm_gsi_route_guest *route);
+#endif
+
 #endif
-- 
1.5.4.5



[PATCH 4/6] Figure out device capability

2009-01-07 Thread Sheng Yang
Try to figure out the device capabilities in update_dev_cap(). For now we only
care about the MSI capability.

The function pci_find_cap_offset() was originally written by Allen for Xen.
Note that the function needs root privileges to work, and it depends on libpci.
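You can illustrate the list walk that pci_find_cap_offset() performs through libpci without libpci or root privileges. The idea is to run the same loop over an in-memory copy of the 256-byte config space, and the sketch below works under that assumption. `find_cap_offset()` and the `CFG_*` names are hypothetical. The offsets themselves are the standard PCI ones: status at 0x06, capability pointer at 0x34, and MSI capability ID 0x05. The walk is capped at 48 entries, like the patch's `max_cap`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config-space offsets and IDs. */
#define CFG_STATUS          0x06   /* low byte of the 16-bit status register */
#define CFG_STATUS_CAP_LIST 0x10   /* status bit 4: capability list present */
#define CFG_CAP_PTR         0x34   /* pointer to the first capability */
#define CAP_ID_MSI          0x05

/* Return the config-space offset of capability `cap_id`, or 0 if absent. */
static uint8_t find_cap_offset(const uint8_t *cfg, uint8_t cap_id)
{
    int ttl = 48;          /* bound the walk in case the list is circular */
    uint8_t pos;

    assert(cfg != NULL);   /* cfg must point at a full 256-byte image */
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;

    pos = cfg[CFG_CAP_PTR];
    while (ttl-- > 0) {
        if (pos < 0x40)    /* capabilities live after the standard header */
            break;
        pos &= 0xfc;       /* capability pointers are dword aligned */
        if (cfg[pos] == 0xff)
            break;
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1];  /* follow the next-capability pointer */
    }
    return 0;
}
```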

Signed-off-by: Allen Kay allen.m@intel.com
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/hw/device-assignment.c |   29 +
 qemu/hw/device-assignment.h |1 +
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index f357d17..169357f 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -222,6 +222,35 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
   (r_dev->v_addrs + region_num));
 }
 
+static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+{
+int id;
+int max_cap = 48;
+int pos = PCI_CAPABILITY_LIST;
+int status;
+
+status = pci_read_byte(pci_dev, PCI_STATUS);
+if ((status & PCI_STATUS_CAP_LIST) == 0)
+return 0;
+
+while (max_cap--) {
+pos = pci_read_byte(pci_dev, pos);
+if (pos < 0x40)
+break;
+
+pos &= ~3;
+id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+
+if (id == 0xff)
+break;
+if (id == cap)
+return pos;
+
+pos += PCI_CAP_LIST_NEXT;
+}
+return 0;
+}
+
 static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
   uint32_t val, int len)
 {
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index a565948..2d83566 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -29,6 +29,7 @@
 #define __DEVICE_ASSIGNMENT_H__
 
 #include <sys/mman.h>
+#include <pci/pci.h>
 #include "qemu-common.h"
 #include "sys-queue.h"
 #include "pci.h"
-- 
1.5.4.5



Re: [PATCH] CPUID Masking MSRs

2009-01-07 Thread Andre Przywara

Alexander Graf wrote:

> Well if I could take the FlexMigration design into account when putting
> variables in the vcpu context, that'd be great. But I can't seem to find
> it in the Intel documentation, so I'll leave it for now.

Not real documentation (tell me if you find some!), but this code shows
almost everything you probably need:

http://xenbits.xensource.com/xen-unstable.hg?rev/be20b11656bb

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München
Geschäftsführer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis München
Registergericht München, HRB Nr. 43632



[PATCH 04/10] KVM: Using ioapic_irqchip() macro for kvm_set_irq

2009-01-07 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/irq_comm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 7460e7f..f5e2d2c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -39,7 +39,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 * IOAPIC.  So set the bit in both. The guest will ignore
 * writes to the unused one.
 */
-   kvm_ioapic_set_irq(kvm->arch.vioapic, irq, !!(*irq_state));
+   kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state));
 #ifdef CONFIG_X86
kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
 #endif
-- 
1.5.4.5



[PATCH 02/10] KVM: Using gsi route for MSI device assignment

2009-01-07 Thread Sheng Yang
Convert the MSI userspace interface to support the gsi_msg mapping (nobody
should still be using the old interface...).
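Once the guest's MSI message lives in a routing entry, dispatching it means decoding the x86 MSI address/data pair. Here is a sketch of that decoding. The `ADDR_*`/`DATA_*` names are local to the sketch; the kernel uses its own definitions from asm/msidef.h. The bit positions assume the standard x86 layout:
- destination ID in address bits 19:12
- destination mode in address bit 2
- vector in data bits 7:0
- delivery mode in data bits 10:8
- trigger mode in data bit 15

The patch's `test_bit(MSI_DATA_DELIVERY_MODE_SHIFT, ...)` reads only bit 8. That is enough to tell fixed (0) from lowest-priority (1) delivery, whereas this sketch extracts the whole 3-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Local names for the x86 MSI layout (the kernel's live in asm/msidef.h). */
#define ADDR_DEST_MODE_BIT   2             /* 0 = physical, 1 = logical */
#define ADDR_DEST_ID_SHIFT   12
#define ADDR_DEST_ID_MASK    0x000ff000u   /* address bits 19:12 */
#define DATA_VECTOR_MASK     0x000000ffu   /* data bits 7:0 */
#define DATA_DELIVERY_SHIFT  8
#define DATA_DELIVERY_MASK   0x00000700u   /* data bits 10:8 */
#define DATA_TRIGGER_BIT     15            /* 0 = edge, 1 = level */

struct msi_fields {
    unsigned dest_id, dest_mode, vector, delivery_mode, trig_mode;
};

/* Decode a guest-programmed MSI address/data pair into delivery parameters. */
static struct msi_fields decode_msi(uint32_t address_lo, uint32_t data)
{
    struct msi_fields f;

    /* x86 MSI addresses always fall in the 0xFEExxxxx window. */
    assert((address_lo & 0xfff00000u) == 0xfee00000u);
    f.dest_id       = (address_lo & ADDR_DEST_ID_MASK) >> ADDR_DEST_ID_SHIFT;
    f.dest_mode     = (address_lo >> ADDR_DEST_MODE_BIT) & 1u;
    f.vector        = data & DATA_VECTOR_MASK;
    f.delivery_mode = (data & DATA_DELIVERY_MASK) >> DATA_DELIVERY_SHIFT;
    f.trig_mode     = (data >> DATA_TRIGGER_BIT) & 1u;
    return f;
}
```

A decoded delivery mode of 0 or 1 corresponds to the IOAPIC_FIXED and IOAPIC_LOWEST_PRIORITY cases that the dispatch code switches on.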

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |1 -
 virt/kvm/kvm_main.c  |   79 ++
 2 files changed, 45 insertions(+), 35 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6a00201..eab9588 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -316,7 +316,6 @@ struct kvm_assigned_dev_kernel {
int host_irq;
bool host_irq_disabled;
int guest_irq;
-   struct msi_msg guest_msi;
 #define KVM_ASSIGNED_DEV_GUEST_INTX (1 << 0)
 #define KVM_ASSIGNED_DEV_GUEST_MSI (1 << 1)
 #define KVM_ASSIGNED_DEV_HOST_INTX (1 << 8)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bc1a27b..0a59245 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -92,44 +92,56 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
int vcpu_id;
struct kvm_vcpu *vcpu;
struct kvm_ioapic *ioapic = ioapic_irqchip(dev->kvm);
-   int dest_id = (dev->guest_msi.address_lo & MSI_ADDR_DEST_ID_MASK)
-   >> MSI_ADDR_DEST_ID_SHIFT;
-   int vector = (dev->guest_msi.data & MSI_DATA_VECTOR_MASK)
-   >> MSI_DATA_VECTOR_SHIFT;
-   int dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
-   (unsigned long *)&dev->guest_msi.address_lo);
-   int trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
-   (unsigned long *)&dev->guest_msi.data);
-   int delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
-   (unsigned long *)&dev->guest_msi.data);
+   struct kvm_gsi_route_entry *gsi_entry;
+   int dest_id, vector, dest_mode, trig_mode, delivery_mode;
u32 deliver_bitmask;
 
BUG_ON(!ioapic);
 
-   deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
+   gsi_entry = kvm_find_gsi_route_entry(dev->kvm, dev->guest_irq);
+   if (!gsi_entry) {
+   printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
+   return;
+   }
+
+   if (gsi_entry->type & KVM_GSI_ROUTE_MSI) {
+   dest_id = (gsi_entry->msi.address_lo & MSI_ADDR_DEST_ID_MASK)
+   >> MSI_ADDR_DEST_ID_SHIFT;
+   vector = (gsi_entry->msi.data & MSI_DATA_VECTOR_MASK)
+   >> MSI_DATA_VECTOR_SHIFT;
+   dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
+   (unsigned long *)&gsi_entry->msi.address_lo);
+   trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
+   (unsigned long *)&gsi_entry->msi.data);
+   delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
+   (unsigned long *)&gsi_entry->msi.data);
+   deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
dest_id, dest_mode);
-   /* IOAPIC delivery mode value is the same as MSI here */
-   switch (delivery_mode) {
-   case IOAPIC_LOWEST_PRIORITY:
-   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
-   deliver_bitmask);
-   if (vcpu != NULL)
-   kvm_apic_set_irq(vcpu, vector, trig_mode);
-   else
-   printk(KERN_INFO "kvm: null lowest priority vcpu!\n");
-   break;
-   case IOAPIC_FIXED:
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
-   vcpu = ioapic->kvm->vcpus[vcpu_id];
-   if (vcpu)
+   /* IOAPIC delivery mode value is the same as MSI here */
+   switch (delivery_mode) {
+   case IOAPIC_LOWEST_PRIORITY:
+   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
+   deliver_bitmask);
+   if (vcpu != NULL)
kvm_apic_set_irq(vcpu, vector, trig_mode);
+   else
+   printk(KERN_INFO
+  "kvm: null lowest priority vcpu!\n");
+   break;
+   case IOAPIC_FIXED:
+   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+   if (!(deliver_bitmask & (1 << vcpu_id)))
+   continue;
+   deliver_bitmask &= ~(1 << vcpu_id);
+   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   if (vcpu)
+   kvm_apic_set_irq(vcpu, vector,
+ 

[PATCH 09/10] KVM: Update intr delivery func to accept unsigned long* bitmap

2009-01-07 Thread Sheng Yang
This lets the bitmap be used with the standard bit operations, and makes it easy
to extend if KVM_MAX_VCPUS is increased.
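Passing a bare `unsigned long` caps the number of addressable vcpus at the width of one word. The kernel's bitmap idiom avoids that limit: it declares an array of words sized from the bit count and passes a pointer. Here is a userspace sketch of that idiom. `MAX_VCPUS`, `BITS_TO_WORDS()`, `bmp_set()` and `bmp_test()` are hypothetical stand-ins for `KVM_MAX_VCPUS`, `DECLARE_BITMAP()`, `set_bit()` and `test_bit()`. The value 100 is chosen only so the map spans more than one 64-bit word.

```c
#include <assert.h>
#include <limits.h>

#define MAX_VCPUS        100   /* hypothetical; larger than one 64-bit word */
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_WORDS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Set bit `bit` in a multi-word bitmap (stand-in for set_bit()). */
static void bmp_set(unsigned long *map, unsigned bit)
{
    assert(bit < MAX_VCPUS);
    map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Test bit `bit` (stand-in for test_bit()). */
static int bmp_test(const unsigned long *map, unsigned bit)
{
    assert(bit < MAX_VCPUS);
    return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}
```

Callers declare storage as `unsigned long map[BITS_TO_WORDS(MAX_VCPUS)]`, the counterpart of `DECLARE_BITMAP(map, MAX_VCPUS)`, and pass `map` directly. Because the helpers take a pointer, raising `MAX_VCPUS` changes only the array length.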

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/lapic.c |8 
 include/linux/kvm_host.h |2 +-
 virt/kvm/ioapic.c|4 ++--
 virt/kvm/ioapic.h|4 ++--
 virt/kvm/irq_comm.c  |6 +++---
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afac68c..c1e4935 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -403,7 +403,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 }
 
 static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
-  unsigned long bitmap)
+  unsigned long *bitmap)
 {
int last;
int next;
@@ -415,7 +415,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
do {
if (++next == KVM_MAX_VCPUS)
next = 0;
-   if (kvm->vcpus[next] == NULL || !test_bit(next, &bitmap))
+   if (kvm->vcpus[next] == NULL || !test_bit(next, bitmap))
continue;
apic = kvm->vcpus[next]->arch.apic;
if (apic && apic_enabled(apic))
@@ -431,7 +431,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector,
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-   unsigned long bitmap)
+   unsigned long *bitmap)
 {
struct kvm_lapic *apic;
 
@@ -502,7 +502,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
}
 
if (delivery_mode == APIC_DM_LOWEST) {
-   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
+   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
if (target != NULL)
__apic_accept_irq(target->arch.apic, delivery_mode,
  vector, level, trig_mode);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2736dbf..ed1c6bb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -353,7 +353,7 @@ struct kvm_gsi_route_entry {
 
 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
   union kvm_ioapic_redirect_entry *entry,
-  u32 *deliver_bitmask);
+  unsigned long *deliver_bitmask);
 void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index aa4e8d8..0dcb0da 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -159,7 +159,7 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
 }
 
 void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-u8 dest_mode, u32 *mask)
+u8 dest_mode, unsigned long *mask)
 {
int i;
struct kvm *kvm = ioapic->kvm;
@@ -200,7 +200,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
-   u32 deliver_bitmask;
+   unsigned long deliver_bitmask;
struct kvm_vcpu *vcpu;
int vcpu_id, r = 0;
 
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index e107dbb..c418a7f 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -65,12 +65,12 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
 }
 
 struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector,
-  unsigned long bitmap);
+  unsigned long *bitmap);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
 void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
 void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
-u8 dest_mode, u32 *mask);
+u8 dest_mode, unsigned long *mask);
 
 #endif
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index d97cdd6..baee4b7 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -30,7 +30,7 @@
 
 void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
   union kvm_ioapic_redirect_entry *entry,
-  u32 *deliver_bitmask)
+  unsigned long *deliver_bitmask)
 {
struct kvm_vcpu *vcpu;
 
@@ -40,7 +40,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
switch 

[PATCH 03/10] KVM: Improve MSI dispatch function

2009-01-07 Thread Sheng Yang
Prepare to merge with kvm_set_irq().

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/kvm_main.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0a59245..717e1b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -87,7 +87,7 @@ static bool kvm_rebooting;
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 
 #ifdef CONFIG_X86
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi)
 {
int vcpu_id;
struct kvm_vcpu *vcpu;
@@ -98,7 +98,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 
BUG_ON(!ioapic);
 
-   gsi_entry = kvm_find_gsi_route_entry(dev->kvm, dev->guest_irq);
+   gsi_entry = kvm_find_gsi_route_entry(dev->kvm, gsi);
if (!gsi_entry) {
printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
return;
@@ -145,7 +145,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
}
 }
 #else
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev) {}
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi) {}
 #endif
 
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
@@ -180,7 +180,7 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
assigned_dev->guest_irq, 1);
else if (assigned_dev->irq_requested_type &
KVM_ASSIGNED_DEV_GUEST_MSI) {
-   assigned_device_msi_dispatch(assigned_dev);
+   assigned_device_msi_dispatch(assigned_dev, assigned_dev->guest_irq);
enable_irq(assigned_dev->host_irq);
assigned_dev->host_irq_disabled = false;
}
-- 
1.5.4.5



[PATCH 10/10] KVM: bit ops for deliver_bitmap

2009-01-07 Thread Sheng Yang
This also makes it easier to raise the number of vcpus KVM supports in the future.
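The reworked ioapic_deliver() loop repeats three steps until `find_first_bit()` returns the bitmap size:
1. Take the lowest set bit.
2. Clear it.
3. Deliver to that vcpu.

The sketch below reproduces that drain pattern in userspace. `first_set()`, `clear_one()` and `drain_targets()` are hypothetical helpers standing in for the kernel's `find_first_bit()`, `clear_bit()` and the delivery call. `MAX_VCPUS` is an illustrative value.

```c
#include <assert.h>
#include <limits.h>

#define MAX_VCPUS        100   /* hypothetical limit for the sketch */
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_WORDS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Index of the first set bit below `size`, or `size` if none
 * (the same contract as the kernel's find_first_bit()). */
static unsigned first_set(const unsigned long *map, unsigned size)
{
    unsigned i;

    for (i = 0; i < size; i++)
        if ((map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL)
            return i;
    return size;
}

/* Clear one bit (stand-in for clear_bit()). */
static void clear_one(unsigned long *map, unsigned bit)
{
    map[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

/* Drain the bitmap in ascending vcpu order, recording each target in `out`
 * where the kernel would deliver the interrupt. Returns the target count. */
static unsigned drain_targets(unsigned long *map, unsigned *out)
{
    unsigned n = 0, id;

    while ((id = first_set(map, MAX_VCPUS)) < MAX_VCPUS) {
        clear_one(map, id);
        assert(n < MAX_VCPUS);
        out[n++] = id;
    }
    return n;
}
```

Because each bit is cleared before delivery, the loop terminates even when a set bit refers to an empty vcpu slot.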

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/lapic.c |7 ---
 virt/kvm/ioapic.c|   24 +---
 virt/kvm/irq_comm.c  |   17 +
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c1e4935..359e02c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -477,9 +477,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
struct kvm_vcpu *target;
struct kvm_vcpu *vcpu;
-   unsigned long lpr_map = 0;
+   DECLARE_BITMAP(lpr_map, KVM_MAX_VCPUS);
int i;
 
+   bitmap_zero(lpr_map, KVM_MAX_VCPUS);
apic_debug("icr_high 0x%x, icr_low 0x%x, "
   "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
   "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
@@ -494,7 +495,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
if (vcpu->arch.apic &&
apic_match_dest(vcpu, apic, short_hand, dest, dest_mode)) {
if (delivery_mode == APIC_DM_LOWEST)
-   set_bit(vcpu->vcpu_id, &lpr_map);
+   set_bit(vcpu->vcpu_id, lpr_map);
else
__apic_accept_irq(vcpu->arch.apic, delivery_mode,
  vector, level, trig_mode);
@@ -502,7 +503,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
}
 
if (delivery_mode == APIC_DM_LOWEST) {
-   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, &lpr_map);
+   target = kvm_get_lowest_prio_vcpu(vcpu->kvm, vector, lpr_map);
if (target != NULL)
__apic_accept_irq(target->arch.apic, delivery_mode,
  vector, level, trig_mode);
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0dcb0da..162cbdd 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -200,7 +200,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry entry = ioapic->redirtbl[irq];
-   unsigned long deliver_bitmask;
+   DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
struct kvm_vcpu *vcpu;
int vcpu_id, r = 0;
 
@@ -210,22 +210,24 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 entry.fields.delivery_mode, entry.fields.vector,
 entry.fields.trig_mode);
 
-   kvm_get_intr_delivery_bitmask(ioapic, &entry, &deliver_bitmask);
-   if (!deliver_bitmask) {
-   ioapic_debug(no target on destination\n);
-   return 0;
-   }
+   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
 
/* Always delivery PIT interrupt to vcpu 0 */
 #ifdef CONFIG_X86
if (irq == 0)
-   deliver_bitmask = 1 << 0;
+   set_bit(0, deliver_bitmask);
+   else
 #endif
+   kvm_get_intr_delivery_bitmask(ioapic, &entry, deliver_bitmask);
+
+   if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) >= KVM_MAX_VCPUS) {
+   ioapic_debug(no target on destination\n);
+   return 0;
+   }
 
-   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
-   if (!(deliver_bitmask & (1 << vcpu_id)))
-   continue;
-   deliver_bitmask &= ~(1 << vcpu_id);
+   while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS))
+   < KVM_MAX_VCPUS) {
+   clear_bit(vcpu_id, deliver_bitmask);
vcpu = ioapic->kvm->vcpus[vcpu_id];
if (vcpu) {
if (entry.fields.delivery_mode ==
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index baee4b7..bce3cd5 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -41,7 +41,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
case IOAPIC_LOWEST_PRIORITY:
vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm,
entry->fields.vector, deliver_bitmask);
-   *deliver_bitmask = 1 << vcpu->vcpu_id;
+   set_bit(vcpu->vcpu_id, deliver_bitmask);
break;
case IOAPIC_FIXED:
case IOAPIC_NMI:
@@ -62,7 +62,7 @@ static void gsi_dispatch(struct kvm *kvm, u32 gsi)
struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
struct kvm_gsi_route_entry *gsi_entry;
union kvm_ioapic_redirect_entry entry;
-   unsigned long deliver_bitmask;
+   DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS);
 
BUG_ON(!ioapic);
 
@@ -72,6 +72,7 @@ static void gsi_dispatch(struct kvm *kvm, u32 gsi)
return;
}
 
+   bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS);
 #ifdef CONFIG_X86
if 

[PATCH 0/10][v4]GSI route layer for MSI/MSI-X

2009-01-07 Thread Sheng Yang
Changes since v3:
Addressed Avi's comments: improved struct gsi_route_entry and used a pair of
ioctls to handle all routing (including some specific interrupt routing). Only
MSI/MSI-X is supported for now.
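The routing layer pairs a request ioctl with a free ioctl around a table of entries looked up by GSI. The model below is a hypothetical userspace sketch of that shape, not the ioctl ABI from this series. The following are all illustrative:
- the table size
- the `ROUTE_FLAG` bit used to tag routed GSIs
- the `route_request()`/`route_find()`/`route_free()` names

`route_find()` plays the role that kvm_find_gsi_route_entry() plays in the kernel patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUTE_FLAG 0x80000000u   /* hypothetical tag marking a routed gsi */
#define MAX_ROUTES 8             /* hypothetical table size */

enum route_type { ROUTE_FREE = 0, ROUTE_MSI = 1 };

struct route_entry {
    enum route_type type;
    uint32_t msi_addr_lo;
    uint32_t msi_data;
};

static struct route_entry table[MAX_ROUTES];

/* "Request": store an MSI message in a free slot and return a gsi for it. */
static int route_request(uint32_t addr_lo, uint32_t data, uint32_t *gsi)
{
    uint32_t i;

    for (i = 0; i < MAX_ROUTES; i++) {
        if (table[i].type == ROUTE_FREE) {
            table[i].type = ROUTE_MSI;
            table[i].msi_addr_lo = addr_lo;
            table[i].msi_data = data;
            *gsi = ROUTE_FLAG | i;
            return 0;
        }
    }
    return -1;   /* table full */
}

/* Lookup: NULL for unrouted gsis, out-of-range slots and freed entries. */
static struct route_entry *route_find(uint32_t gsi)
{
    uint32_t i = gsi & ~ROUTE_FLAG;

    if (!(gsi & ROUTE_FLAG) || i >= MAX_ROUTES || table[i].type == ROUTE_FREE)
        return NULL;
    return &table[i];
}

/* "Free": release the slot so the gsi no longer resolves. */
static void route_free(uint32_t gsi)
{
    struct route_entry *e = route_find(gsi);

    assert(e != NULL);
    e->type = ROUTE_FREE;
}
```

Encoding the slot index in the returned GSI keeps lookup O(1). A freed slot is reused by the next request.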


[PATCH 05/10] KVM: Merge MSI handling to kvm_set_irq

2009-01-07 Thread Sheng Yang
Use kvm_set_irq to handle all interrupt injection.
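With routing in place, kvm_set_irq() becomes the single entry point for injection. A GSI carrying the routing mask is handed to the routing layer. Any other value is treated as a legacy pin whose level is the logical OR of all sources asserting it. Here is a userspace sketch of that split. The mask value `0x80000000` and the pin count are assumptions, since the series' KVM_GSI_ROUTE_MASK value is not shown here. `route_dispatch()` is a hypothetical stub that only counts calls.

```c
#include <assert.h>
#include <stdint.h>

#define ROUTE_MASK 0x80000000u   /* assumed routing flag, illustrative only */
#define NUM_PINS   24            /* assumed number of IOAPIC pins */

static unsigned long pin_state[NUM_PINS];  /* one bit per interrupt source */
static unsigned routed_hits;               /* counts calls into the stub */

/* Hypothetical stub for the routing layer (gsi_dispatch() in the patch). */
static void route_dispatch(uint32_t gsi)
{
    (void)gsi;
    routed_hits++;
}

/* Single entry point for interrupt injection. */
static void set_irq(int source_id, uint32_t gsi, int level)
{
    if (gsi & ROUTE_MASK) {
        route_dispatch(gsi);
        return;
    }
    assert(gsi < NUM_PINS);
    assert(source_id >= 0 && source_id < (int)(sizeof(unsigned long) * 8));
    /* Logical OR across sources: the pin stays asserted while any source holds it. */
    if (level)
        pin_state[gsi] |= 1UL << source_id;
    else
        pin_state[gsi] &= ~(1UL << source_id);
    /* The kernel then forwards !!pin_state[gsi] to the IOAPIC (and PIC on x86). */
}
```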

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |2 +-
 virt/kvm/irq_comm.c  |   79 +++--
 virt/kvm/kvm_main.c  |   79 +++---
 3 files changed, 81 insertions(+), 79 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eab9588..bfdaab9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -351,7 +351,7 @@ struct kvm_gsi_route_entry {
struct hlist_node link;
 };
 
-void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
+void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index f5e2d2c..e9fcd23 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -24,10 +24,81 @@
 
 #include "ioapic.h"
 
+#ifdef CONFIG_X86
+#include <asm/msidef.h>
+#endif
+
+static void gsi_dispatch(struct kvm *kvm, u32 gsi)
+{
+   int vcpu_id;
+   struct kvm_vcpu *vcpu;
+   struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
+   struct kvm_gsi_route_entry *gsi_entry;
+   int dest_id, vector, dest_mode, trig_mode, delivery_mode;
+   u32 deliver_bitmask;
+
+   BUG_ON(!ioapic);
+
+   gsi_entry = kvm_find_gsi_route_entry(kvm, gsi);
+   if (!gsi_entry) {
+   printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
+   return;
+   }
+
+#ifdef CONFIG_X86
+   if (gsi_entry->type & KVM_GSI_ROUTE_MSI) {
+   dest_id = (gsi_entry->msi.address_lo & MSI_ADDR_DEST_ID_MASK)
+   >> MSI_ADDR_DEST_ID_SHIFT;
+   vector = (gsi_entry->msi.data & MSI_DATA_VECTOR_MASK)
+   >> MSI_DATA_VECTOR_SHIFT;
+   dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
+   (unsigned long *)&gsi_entry->msi.address_lo);
+   trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
+   (unsigned long *)&gsi_entry->msi.data);
+   delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
+   (unsigned long *)&gsi_entry->msi.data);
+   deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
+   dest_id, dest_mode);
+   /* IOAPIC delivery mode value is the same as MSI here */
+   switch (delivery_mode) {
+   case IOAPIC_LOWEST_PRIORITY:
+   vcpu = kvm_get_lowest_prio_vcpu(ioapic->kvm, vector,
+   deliver_bitmask);
+   if (vcpu != NULL)
+   kvm_apic_set_irq(vcpu, vector, trig_mode);
+   else
+   printk(KERN_INFO
+  "kvm: null lowest priority vcpu!\n");
+   break;
+   case IOAPIC_FIXED:
+   for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) {
+   if (!(deliver_bitmask & (1 << vcpu_id)))
+   continue;
+   deliver_bitmask &= ~(1 << vcpu_id);
+   vcpu = ioapic->kvm->vcpus[vcpu_id];
+   if (vcpu)
+   kvm_apic_set_irq(vcpu, vector,
+   trig_mode);
+   }
+   break;
+   default:
+   break;
+   }
+   }
+#endif /* CONFIG_X86 */
+}
+
 /* This should be called with the kvm->lock mutex held */
-void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
+void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level)
 {
-   unsigned long *irq_state = (unsigned long *)&kvm->arch.irq_states[irq];
+   unsigned long *irq_state;
+
+   if (gsi & KVM_GSI_ROUTE_MASK) {
+   gsi_dispatch(kvm, gsi);
+   return;
+   }
+
+   irq_state = (unsigned long *)&kvm->arch.irq_states[gsi];
 
/* Logical OR for level trig interrupt */
if (level)
@@ -39,9 +110,9 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 * IOAPIC.  So set the bit in both. The guest will ignore
 * writes to the unused one.
 */
-   kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state));
+   kvm_ioapic_set_irq(ioapic_irqchip(kvm), gsi, !!(*irq_state));
 #ifdef CONFIG_X86
-   kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
+   kvm_pic_set_irq(pic_irqchip(kvm), gsi, !!(*irq_state));
 #endif
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 
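A minimal user-space sketch of the two pieces of arithmetic gsi_dispatch() relies on: pulling the destination ID, vector and mode bits out of an MSI address/data pair, and walking a fixed-delivery bitmask one vCPU at a time. The field positions are assumed from the x86 MSI layout in the Intel SDM rather than taken from <asm/msidef.h>, and struct msi_fields, decode_msi() and count_fixed_targets() are hypothetical names. The nr_vcpus bound is an extra safety check that the patch itself does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions assumed from the x86 MSI layout (Intel SDM); the names
 * mirror <asm/msidef.h>, which is not included here. */
#define MSI_ADDR_DEST_ID_SHIFT        12
#define MSI_ADDR_DEST_ID_MASK         0x000ff000u
#define MSI_ADDR_DEST_MODE_SHIFT      2
#define MSI_DATA_VECTOR_SHIFT         0
#define MSI_DATA_VECTOR_MASK          0x000000ffu
#define MSI_DATA_DELIVERY_MODE_SHIFT  8
#define MSI_DATA_TRIGGER_SHIFT        15

/* Hypothetical container for the values gsi_dispatch() extracts. */
struct msi_fields {
    unsigned int dest_id, vector, dest_mode, trig_mode, delivery_mode;
};

static struct msi_fields decode_msi(uint32_t address_lo, uint32_t data)
{
    struct msi_fields f;

    f.dest_id = (address_lo & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
    f.vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
    /* Single bits, as the patch reads them with test_bit(). */
    f.dest_mode = (address_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 1u;
    f.trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 1u;
    /* Only bit 8 is tested: 0 = fixed, 1 = lowest priority. */
    f.delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 1u;
    return f;
}

/* Fixed delivery: visit each vCPU whose bit is set, clearing bits as they
 * are handled so the loop ends once the bitmask is empty. Returns how many
 * vCPUs would receive the interrupt. */
static int count_fixed_targets(uint32_t deliver_bitmask, int nr_vcpus)
{
    int vcpu_id, delivered = 0;

    assert(nr_vcpus <= 32);     /* keeps 1u << vcpu_id well defined */
    for (vcpu_id = 0; deliver_bitmask != 0 && vcpu_id < nr_vcpus; vcpu_id++) {
        if (!(deliver_bitmask & (1u << vcpu_id)))
            continue;
        deliver_bitmask &= ~(1u << vcpu_id);
        delivered++;            /* the patch calls kvm_apic_set_irq() here */
    }
    return delivered;
}
```

For example, address 0xFEE01004 with data 0x8131 decodes to destination 1, logical destination mode, vector 0x31, level trigger and lowest-priority delivery.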

Re: [PATCH] CPUID Masking MSRs

2009-01-07 Thread Alexander Graf


On 07.01.2009, at 12:16, Andre Przywara wrote:


Alexander Graf wrote:
Well if I could take the FlexMigration design into account when putting
variables in the vcpu context, that'd be great. But I can't seem to find
it in the Intel documentation, so I'll leave it for now.
Not real documentation (tell me if you find some!), but this code  
shows almost everything you probably need:

http://xenbits.xensource.com/xen-unstable.hg?rev/be20b11656bb


It only shows two of the four feature values, but it's definitely a  
start :-). Thanks a lot! Looks like the Intel way is about the same.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings

2009-01-07 Thread Avi Kivity

Marcelo Tosatti wrote:

Let me shoot at one direction: a shadow page with PGE bit in either
state is created. Later that shadow page is nuked (via mmu notifiers,
for example). 


I doubt that mmu notifiers were invoked in this case (the bug would be 
very rare); in any case we flush the tlb.



--
error compiling committee.c: too many arguments to function



Re: KVM host kernel hang

2009-01-07 Thread Alexander Graf


On 07.01.2009, at 11:15, Avi Kivity wrote:


Alexander Graf wrote:

Hi,

while trying to run a current openSUSE in VMware ESX in KVM (using NPT),
some KVM code seems to be stuck in an endless loop. The qemu process
hangs, I can't attach gdb to it and the kernel module seems to be
hanging in a place where I don't see any looping code. One CPU is
definitely stuck in sys at 100% though.

This is running git as of yesterday with some minor ESX modifications
that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a
snippet of a t-sysrq trace with all qemu relevant parts below. The
registers (incl. IP) of these don't change over time.

Alex

qemu-system-x D 810001025280 0 27900   9501
8101000e5c58 0082  8101000e5c1c
81011446e728 807e6280 807e6280 8100388ca680
80601890 8100388ca9c0 00200200 8100388ca9c0
Call Trace:
[804485ec] __mutex_lock_slowpath+0x72/0xa9
[8044847a] mutex_lock+0x1e/0x22
[88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
[88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
[802acada] vfs_ioctl+0x2a/0x78
[802acd6f] do_vfs_ioctl+0x247/0x261
[802acdde] sys_ioctl+0x55/0x77
[8020bffa] system_call_after_swapgs+0x8a/0x8f
[7f2f3b15eb67]




Waiting for kvm->lock, so can't kill or strace.


qemu-system-x R  running task0 27908   9501
 88d7d3ad 0390 810100120040
810116491000 fee00390  
81011b361d08 88d7f1fb  0001
Call Trace:
Inexact backtrace:

[88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
[88d7f1fb] :kvm:emulate_instruction+0x199/0x266
[88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
[88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
[88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
[88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
[88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
[802acada] vfs_ioctl+0x2a/0x78
[802acd6f] do_vfs_ioctl+0x247/0x261
[802a13a3] fget_light+0x1/0x83
[802acdde] sys_ioctl+0x55/0x77
[802a0b48] sys_writev+0x60/0x94
[8020bffa] system_call_after_swapgs+0x8a/0x8f



But the mutex is not taken here.  Looks like we lost it, maybe  
CONFIG_LOCKDEP can find out where.


I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's  
actually locking itself up?

Btw: The issue seems to be easily reproducible :-)

Alex



Re: KVM host kernel hang

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:


I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's 
actually locking itself up?

Btw: The issue seems to be easily reproducible :-)


Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just 
indicates the arch can do it if you want, IIUC.




Re: KVM host kernel hang

2009-01-07 Thread Alexander Graf
Avi Kivity wrote:
 Alexander Graf wrote:

 I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
 actually locking itself up?
 Btw: The issue seems to be easily reproducible :-)

 Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
 indicates the arch can do it if you want, IIUC.

I just added some debug #define's to show me where exactly things break.


Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {

   2145 mmio:
   2146 /*
   2147  * Is this MMIO handled locally?
   2148  */
   2149 mutex_lock(&vcpu->kvm->lock);
   2150 mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
   2151 if (mmio_dev) {
   2152 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
   2153 mutex_unlock(&vcpu->kvm->lock);
   2154 return X86EMUL_CONTINUE;
   2155 }
   2156 mutex_unlock(&vcpu->kvm->lock);

   1901 case KVM_IRQ_LINE: {
   1902 struct kvm_irq_level irq_event;
   1903
   1904 r = -EFAULT;
   1905 if (copy_from_user(&irq_event, argp, sizeof irq_event))
   1906 goto out;
   1907 if (irqchip_in_kernel(kvm)) {
   1908 mutex_lock(&kvm->lock);
   1909 kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
   1910 irq_event.irq, irq_event.level);
   1911 mutex_unlock(&kvm->lock);
   1912 r = 0;
   1913 }
   1914 break;
   1915 }

Any ideas?

Alex
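The "Grabbing lock" printks above can be generalized. Below is a hypothetical, single-threaded model (not a real mutex, and not the kernel's lockdep) of debug lock wrappers that remember which source line took the lock, so an acquisition that would block reports where the unreleased holder came from. dbg_lock(), dbg_unlock(), the LOCK/UNLOCK macros and leaky_lookup() are illustrative names; leaky_lookup() shows the bug class suspected here, an early return that skips the unlock.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of a mutex that records who took it. */
struct dbg_mutex {
    int locked;
    int holder_line;          /* line of the last successful acquire */
};

/* Returns 0 on success, or -1 if the lock is already held; in the kernel
 * that case is a task sleeping forever in mutex_lock(). */
static int dbg_lock(struct dbg_mutex *m, int line)
{
    fprintf(stderr, "%d: Grabbing lock {\n", line);
    if (m->locked) {
        fprintf(stderr, "%d: would block, held since line %d\n",
                line, m->holder_line);
        return -1;
    }
    m->locked = 1;
    m->holder_line = line;
    return 0;
}

static void dbg_unlock(struct dbg_mutex *m, int line)
{
    assert(m->locked);        /* unlocking a free mutex is also a bug */
    m->locked = 0;
    fprintf(stderr, "%d: } released (taken at %d)\n", line, m->holder_line);
}

#define LOCK(m)   dbg_lock((m), __LINE__)
#define UNLOCK(m) dbg_unlock((m), __LINE__)

/* The bug class being hunted: an early return that skips the unlock. */
static int leaky_lookup(struct dbg_mutex *m, int found)
{
    if (LOCK(m))
        return -1;
    if (found)
        return 1;             /* BUG: missing UNLOCK(m) */
    UNLOCK(m);
    return 0;
}
```

After leaky_lookup(&m, 1) the model stays locked, and the next acquisition prints the line that never released it.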


Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings

2009-01-07 Thread Marcelo Tosatti
On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
 Marcelo Tosatti wrote:
 Let me shoot at one direction: a shadow page with PGE bit in either
 state is created. Later that shadow page is nuked (via mmu notifiers,
 for example). 

 I doubt that mmu notifiers were invoked in this case (the bug would be  
 very rare); in any case we flush the tlb.

This comment is worrying:

/*
 * FIXME: Tis shouldn't be necessary here, but there is a flush
 * missing in the MMU code. Until we find this bug, flush the
 * complete TLB here on an NPF
 */
if (npt_enabled)
svm_flush_tlb(&svm->vcpu);

Alexander, you might want to try this patch (-ENONPT here) and revert the
previous one. I have no clue; what else could be causing this?

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10bdb2a..bf68e5b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -33,6 +33,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
 #include <asm/vmx.h>
+#include <asm/tlbflush.h>
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 
if (*iterator.sptep == shadow_trap_nonpresent_pte) {
pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK)
  >> PAGE_SHIFT;
+
+kvm_flush_remote_tlbs(vcpu->kvm);
+kvm_mmu_flush_tlb(vcpu);
+__flush_tlb();
+
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);


Re: KVM host kernel hang

2009-01-07 Thread Avi Kivity

Alexander Graf wrote:

Avi Kivity wrote:
  

Alexander Graf wrote:


I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
actually locking itself up?
Btw: The issue seems to be easily reproducible :-)
  

Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
indicates the arch can do it if you want, IIUC.



I just added some debug #define's to show me where exactly things break.


Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {

   2145 mmio:
   2146 /*
   2147  * Is this MMIO handled locally?
   2148  */
   2149 mutex_lock(&vcpu->kvm->lock);
   2150 mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
   2151 if (mmio_dev) {
   2152 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
   2153 mutex_unlock(&vcpu->kvm->lock);
   2154 return X86EMUL_CONTINUE;
   2155 }
   2156 mutex_unlock(&vcpu->kvm->lock);

  


The lock was lost here.  But how?


   1901 case KVM_IRQ_LINE: {
   1902 struct kvm_irq_level irq_event;
   1903
   1904 r = -EFAULT;
   1905 if (copy_from_user(&irq_event, argp, sizeof irq_event))
   1906 goto out;
   1907 if (irqchip_in_kernel(kvm)) {
   1908 mutex_lock(&kvm->lock);
   1909 kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
   1910 irq_event.irq, irq_event.level);
   1911 mutex_unlock(&kvm->lock);
   1912 r = 0;
   1913 }
   1914 break;
   1915 }

  
This is your hung iothread trying to inject an interrupt.  It's waiting 
for the lost lock.


I suggest enabling all the lock debug magic you can find in kconfig.



routed tap devices

2009-01-07 Thread Sterling Windmill
I am using kvm-82 on a 64-bit host, giving my virtual machines routed tap
devices and using proxy ARP to provide them connectivity.

My host has two ethernet adapters, one connected to the WAN and the other is a 
private link to another server with a private IP address.

Even though I'm assigning device names (on the host) based on MAC address,
the adapters seem to behave differently with respect to IP forwarding
depending on the order in which the Linux kernel sees them.

If I run `ip link`, I see eth1 listed before eth0, and a virtual machine
running behind a tap device that uses IP forwarding sees eth1's IP as its
first hop in a traceroute.

If I swap eth0 and eth1 (via their configuration), the first hop in the guest's 
traceroute is eth0's IP and `ip link` shows eth0 first. Is there a way to 
control this behavior other than switching physical ethernet adapters?

I may be paranoid, but I don't want the virtual machines to see my private IP 
address when using standard tools such as traceroute.

Anyone have any ideas?


[PATCH 0/6] ATS capability support for Intel IOMMU

2009-01-07 Thread Yu Zhao
This patch series implements Address Translation Service (ATS) support for
the Intel IOMMU. ATS lets a PCI Endpoint request DMA address translations
from the IOMMU and cache them in the Endpoint, which relieves pressure on
the IOMMU and improves hardware performance in I/O virtualization
environments.

[PATCH 1/6] PCI: support the ATS capability
[PATCH 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
[PATCH 3/6] VT-d: add queue invalidation fault status support
[PATCH 4/6] VT-d: add device IOTLB invalidation support
[PATCH 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
[PATCH 6/6] VT-d: support the device IOTLB


[PATCH 1/6] PCI: support the ATS capability

2009-01-07 Thread Yu Zhao
The ATS spec can be found at http://www.pcisig.com/specifications/iov/ats/
(it requires membership).

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/pci.c|   68 ++
 include/linux/pci.h  |   15 ++
 include/linux/pci_regs.h |   10 +++
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 061d1ee..5abab14 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1337,6 +1337,74 @@ void pci_enable_ari(struct pci_dev *dev)
bridge->ari_enabled = 1;
 }
 
+/**
+ * pci_enable_ats - enable the ATS capability
+ * @dev: the PCI device
+ * @ps: the IOMMU page shift
+ *
+ * Returns 0 on success, or a negative value on error.
+ */
+int pci_enable_ats(struct pci_dev *dev, int ps)
+{
+   int pos;
+   u16 ctrl;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+   if (!pos)
+   return -ENODEV;
+
+   if (ps < PCI_ATS_MIN_STU)
+   return -EINVAL;
+
+   ctrl = PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU) | PCI_ATS_CTRL_ENABLE;
+   pci_write_config_word(dev, pos + PCI_ATS_CTRL, ctrl);
+
+   dev->ats_enabled = 1;
+
+   return 0;
+}
+
+/**
+ * pci_disable_ats - disable the ATS capability
+ * @dev: the PCI device
+ */
+void pci_disable_ats(struct pci_dev *dev)
+{
+   int pos;
+   u16 ctrl;
+
+   if (!dev->ats_enabled)
+   return;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+   if (!pos)
+   return;
+
+   pci_read_config_word(dev, pos + PCI_ATS_CTRL, &ctrl);
+   ctrl &= ~PCI_ATS_CTRL_ENABLE;
+   pci_write_config_word(dev, pos + PCI_ATS_CTRL, ctrl);
+}
+
+/**
+ * pci_ats_qdep - query ATS Invalidate Queue Depth
+ * @dev: the PCI device
+ *
+ * Returns the queue depth on success, or 0 on error.
+ */
+int pci_ats_qdep(struct pci_dev *dev)
+{
+   int pos;
+   u16 cap;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+   if (!pos)
+   return 0;
+
+   pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+
+   return PCI_ATS_CAP_QDEP(cap) ? : PCI_ATS_MAX_QDEP;
+}
+
 int
 pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4bb156b..e6a1b5a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,7 @@ struct pci_dev {
unsigned int msi_enabled:1;
unsigned int msix_enabled:1;
unsigned int ari_enabled:1;  /* ARI forwarding */
+   unsigned int ats_enabled:1;  /* Address Translation Service */
unsigned int is_managed:1;
unsigned int is_pcie:1;
pci_dev_flags_t dev_flags;
@@ -1155,5 +1156,19 @@ static inline void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
 }
 #endif
 
+extern int pci_enable_ats(struct pci_dev *dev, int ps);
+extern void pci_disable_ats(struct pci_dev *dev);
+extern int pci_ats_qdep(struct pci_dev *dev);
+/**
+ * pci_ats_enabled - query the ATS status
+ * @dev: the PCI device
+ *
+ * Returns 1 if ATS capability is enabled, or 0 if not.
+ */
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+   return dev->ats_enabled;
+}
+
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index e5effd4..00c9db5 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -436,6 +436,7 @@
 #define PCI_EXT_CAP_ID_DSN 3
 #define PCI_EXT_CAP_ID_PWR 4
 #define PCI_EXT_CAP_ID_ARI 14
+#define PCI_EXT_CAP_ID_ATS 15
 
 /* Advanced Error Reporting */
 #define PCI_ERR_UNCOR_STATUS   4   /* Uncorrectable Error Status */
@@ -553,4 +554,13 @@
 #define  PCI_ARI_CTRL_ACS  0x0002  /* ACS Function Groups Enable */
 #define  PCI_ARI_CTRL_FG(x)    (((x) >> 4) & 7) /* Function Group */
 
+/* Address Translation Service */
+#define PCI_ATS_CAP            0x04    /* ATS Capability Register */
+#define  PCI_ATS_CAP_QDEP(x)   ((x) & 0x1f)    /* Invalidate Queue Depth */
+#define  PCI_ATS_MAX_QDEP      32      /* Max Invalidate Queue Depth */
+#define PCI_ATS_CTRL           0x06    /* ATS Control Register */
+#define  PCI_ATS_CTRL_ENABLE   0x8000  /* ATS Enable */
+#define  PCI_ATS_CTRL_STU(x)   ((x) & 0x1f)    /* Smallest Translation Unit */
+#define  PCI_ATS_MIN_STU       12      /* shift of minimum STU block */
+
 #endif /* LINUX_PCI_REGS_H */
-- 
1.5.6.4
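To make the register arithmetic in pci_enable_ats() and pci_ats_qdep() concrete, here is a user-space sketch that reuses the constants from the pci_regs.h hunk above. ats_ctrl_value() and ats_qdep() are hypothetical helpers that only compute values; no config space is touched. A plain ternary replaces the GNU `?:` shorthand used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the pci_regs.h hunk of this patch. */
#define PCI_ATS_CAP_QDEP(x)  ((x) & 0x1f)
#define PCI_ATS_MAX_QDEP     32
#define PCI_ATS_CTRL_ENABLE  0x8000
#define PCI_ATS_CTRL_STU(x)  ((x) & 0x1f)
#define PCI_ATS_MIN_STU      12

/* ATS Control value pci_enable_ats() would write for IOMMU page shift ps;
 * returns -1 where the patch returns -EINVAL (ps below 4 KiB). */
static int ats_ctrl_value(int ps)
{
    if (ps < PCI_ATS_MIN_STU)
        return -1;
    /* STU is encoded relative to the 4 KiB minimum: 0 means 4 KiB. */
    return PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU) | PCI_ATS_CTRL_ENABLE;
}

/* Invalidate Queue Depth from the ATS Capability register; a field value
 * of 0 stands for the maximum depth of 32. */
static int ats_qdep(uint16_t cap)
{
    int q = PCI_ATS_CAP_QDEP(cap);

    return q ? q : PCI_ATS_MAX_QDEP;
}
```

With 4 KiB IOMMU pages (ps = 12) the written value is just the enable bit, 0x8000.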



[PATCH 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure

2009-01-07 Thread Yu Zhao
Parse the Root Port ATS Capability Reporting (ATSR) structure in the DMA
Remapping Reporting (DMAR) ACPI table.

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/dmar.c  |  114 --
 include/linux/dmar.h|9 +++
 include/linux/intel-iommu.h |1 +
 3 files changed, 118 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f5a662a..f2859d1 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -254,6 +254,86 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
}
return ret;
 }
+
+LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+   struct acpi_dmar_atsr *atsr;
+   struct dmar_atsr_unit *atsru;
+
+   atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+   atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+   if (!atsru)
+   return -ENOMEM;
+
+   atsru->hdr = hdr;
+   atsru->include_all = atsr->flags & 0x1;
+
+   if (atsru->include_all)
+   list_add_tail(&atsru->list, &dmar_atsr_units);
+   else
+   list_add(&atsru->list, &dmar_atsr_units);
+
+   return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+   int ret = 0;
+   struct acpi_dmar_atsr *atsr;
+
+   atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+   if (!atsru->include_all)
+   ret = dmar_parse_dev_scope((void *)(atsr + 1),
+   (void *)atsr + atsr->header.length,
+   &atsru->devices_cnt, &atsru->devices,
+   atsr->segment);
+
+   if (ret || !(atsru->include_all || atsru->devices_cnt)) {
+   list_del(&atsru->list);
+   kfree(atsru);
+   }
+
+   return ret;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+   int i;
+   struct pci_bus *bus;
+   struct acpi_dmar_atsr *atsr;
+   struct dmar_atsr_unit *atsru;
+
+   list_for_each_entry(atsru, &dmar_atsr_units, list) {
+   atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+   if (atsr->segment == pci_domain_nr(dev->bus))
+   goto found;
+   }
+
+   return 0;
+
+found:
+   for (bus = dev->bus; bus; bus = bus->parent) {
+   struct pci_dev *bridge = bus->self;
+
+   if (!bridge || !bridge->is_pcie ||
+   bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+   return 0;
+
+   if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+   for (i = 0; i < atsru->devices_cnt; i++)
+   if (atsru->devices[i] == bridge)
+   return 1;
+   break;
+   }
+   }
+
+   if (atsru->include_all)
+   return 1;
+
+   return 0;
+}
 #endif
 
 static void __init
@@ -261,22 +341,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
 {
struct acpi_dmar_hardware_unit *drhd;
struct acpi_dmar_reserved_memory *rmrr;
+   struct acpi_dmar_atsr *atsr;
 
switch (header->type) {
case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-   drhd = (struct acpi_dmar_hardware_unit *)header;
+   drhd = container_of(header, struct acpi_dmar_hardware_unit,
+   header);
printk (KERN_INFO PREFIX
-   "DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-   drhd->flags, (unsigned long long)drhd->address);
+   "DRHD base: %#016Lx flags: %#x\n",
+   (unsigned long long)drhd->address, drhd->flags);
break;
case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-   rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+   rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+   header);
printk (KERN_INFO PREFIX
-   "RMRR base: 0x%016Lx end: 0x%016Lx\n",
+   "RMRR base: %#016Lx end: %#016Lx\n",
(unsigned long long)rmrr->base_address,
(unsigned long long)rmrr->end_address);
break;
+   case ACPI_DMAR_TYPE_ATSR:
+   atsr = container_of(header, struct acpi_dmar_atsr, header);
+   printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+   break;
}
 }
 
@@ -341,6 +427,11 @@ parse_dmar_table(void)
ret = dmar_parse_one_rmrr(entry_header);
 #endif
break;
+   case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+   ret = dmar_parse_one_atsr(entry_header);
+#endif
+   break;
default:
printk(KERN_WARNING PREFIX
"Unknown DMAR structure type\n");
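The bus walk in dmar_find_matched_atsr_unit() can be sketched in isolation: climb from the device toward its root port, give up on any non-PCIe or PCIe-to-PCI hop, and accept the device if its root port is listed in the ATSR device scope or the unit covers all root ports. struct bridge and struct atsr_unit are hypothetical stand-ins for the kernel's pci_dev/pci_bus and dmar_atsr_unit, the initial segment lookup is omitted, and first is assumed to be the bridge directly above the device (dev->bus->self).

```c
#include <assert.h>
#include <stddef.h>

enum port_type { PORT_OTHER, PORT_ROOT, PORT_PCIE_TO_PCI };

/* Hypothetical stand-in for a bridge on the path to the root complex. */
struct bridge {
    int is_pcie;
    enum port_type type;
    struct bridge *upstream;    /* NULL above a root bus */
};

/* Hypothetical stand-in for struct dmar_atsr_unit. */
struct atsr_unit {
    int include_all;
    struct bridge **devices;    /* root ports named in the device scope */
    int devices_cnt;
};

/* Returns 1 if ATS may be used behind 'first', mirroring the patch. */
static int atsr_matches(const struct atsr_unit *u, struct bridge *first)
{
    struct bridge *b;
    int i;

    for (b = first; ; b = b->upstream) {
        /* Reaching the root bus, or any conventional PCI hop, disqualifies. */
        if (!b || !b->is_pcie || b->type == PORT_PCIE_TO_PCI)
            return 0;
        if (b->type == PORT_ROOT) {
            for (i = 0; i < u->devices_cnt; i++)
                if (u->devices[i] == b)
                    return 1;
            break;
        }
    }
    return u->include_all;
}
```

An endpoint behind a switch under a listed root port matches; the same endpoint behind a PCIe-to-PCI bridge never does, even for an include-all unit.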

[PATCH 3/6] VT-d: add queue invalidation fault status support

2009-01-07 Thread Yu Zhao
Check the fault status register after submitting a queued invalidation request.

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/dmar.c   |   59 +++--
 drivers/pci/intr_remapping.c |   21 --
 include/linux/intel-iommu.h  |4 ++-
 3 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f2859d1..eb77258 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -673,19 +673,49 @@ static inline void reclaim_free_desc(struct q_inval *qi)
}
 }
 
+static int qi_check_fault(struct intel_iommu *iommu, int index)
+{
+   u32 fault;
+   int head;
+   struct q_inval *qi = iommu->qi;
+   int wait_index = (index + 1) % QI_LENGTH;
+
+   fault = readl(iommu->reg + DMAR_FSTS_REG);
+
+   /*
+* If IQE happens, the head points to the descriptor associated
+* with the error. No new descriptors are fetched until the IQE
+* is cleared.
+*/
+   if (fault & DMA_FSTS_IQE) {
+   head = readl(iommu->reg + DMAR_IQH_REG);
+   if ((head >> DMAR_IQ_OFFSET) == index) {
+   memcpy(&qi->desc[index], &qi->desc[wait_index],
+   sizeof(struct qi_desc));
+   __iommu_flush_cache(iommu, &qi->desc[index],
+   sizeof(struct qi_desc));
+   writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 /*
  * Submit the queued invalidation descriptor to the remapping
  * hardware unit and wait for its completion.
  */
-void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
+   int rc = 0;
struct q_inval *qi = iommu->qi;
struct qi_desc *hw, wait_desc;
int wait_index, index;
unsigned long flags;
 
if (!qi)
-   return;
+   return 0;
 
hw = qi->desc;
 
@@ -703,7 +733,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
hw[index] = *desc;
 
-   wait_desc.low = QI_IWD_STATUS_DATA(2) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+   wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+   QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
 
hw[wait_index] = wait_desc;
@@ -714,13 +745,11 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
qi->free_head = (qi->free_head + 2) % QI_LENGTH;
qi->free_cnt -= 2;
 
-   spin_lock(&iommu->register_lock);
/*
 * update the HW tail register indicating the presence of
 * new descriptors.
 */
-   writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
-   spin_unlock(&iommu->register_lock);
+   writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);
 
while (qi->desc_status[wait_index] != QI_DONE) {
/*
@@ -730,6 +759,10 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 * a deadlock where the interrupt context can wait indefinitely
 * for free slots in the queue.
 */
+   rc = qi_check_fault(iommu, index);
+   if (rc)
+   break;
+
spin_unlock(&qi->q_lock);
cpu_relax();
spin_lock(&qi->q_lock);
@@ -739,6 +772,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
reclaim_free_desc(qi);
spin_unlock_irqrestore(&qi->q_lock, flags);
+
+   return rc;
 }
 
 /*
@@ -751,13 +786,13 @@ void qi_global_iec(struct intel_iommu *iommu)
desc.low = QI_IEC_TYPE;
desc.high = 0;
 
+   /* should never fail */
qi_submit_sync(&desc, iommu);
 }
 
 int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 u64 type, int non_present_entry_flush)
 {
-
struct qi_desc desc;
 
if (non_present_entry_flush) {
@@ -771,10 +806,7 @@ int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
| QI_CC_GRAN(type) | QI_CC_TYPE;
desc.high = 0;
 
-   qi_submit_sync(&desc, iommu);
-
-   return 0;
-
+   return qi_submit_sync(&desc, iommu);
 }
 
 int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -804,10 +836,7 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
| QI_IOTLB_AM(size_order);
 
-   qi_submit_sync(&desc, iommu);
-
-   return 0;
-
+   return qi_submit_sync(&desc, iommu);
 }
 
 /*
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f78371b..45effc5 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ 
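A simplified model of the invalidation queue bookkeeping used by qi_submit_sync(): every request occupies two slots (the descriptor plus its wait descriptor), submission only proceeds while at least three slots are free, and finished slots are reclaimed in order starting at free_tail. The ring size, struct q_inval_model and the helper names are hypothetical, and there is no hardware, locking or fault handling; qi_reclaim() mirrors reclaim_free_desc() including the QI_ABORT case added in patch 4/6.

```c
#include <assert.h>

#define QI_LENGTH 8             /* small ring for illustration only */

enum { QI_FREE, QI_IN_USE, QI_DONE, QI_ABORT };

struct q_inval_model {
    int desc_status[QI_LENGTH];
    int free_head, free_tail, free_cnt;
};

static void qi_init(struct q_inval_model *qi)
{
    int i;

    for (i = 0; i < QI_LENGTH; i++)
        qi->desc_status[i] = QI_FREE;
    qi->free_head = qi->free_tail = 0;
    qi->free_cnt = QI_LENGTH;
}

/* Reserve a request slot and its wait slot; returns the request index,
 * or -1 when fewer than 3 slots are free (the driver spins instead). */
static int qi_reserve(struct q_inval_model *qi)
{
    int index;

    if (qi->free_cnt < 3)
        return -1;
    index = qi->free_head;
    qi->desc_status[index] = QI_IN_USE;
    qi->desc_status[(index + 1) % QI_LENGTH] = QI_IN_USE;
    qi->free_head = (qi->free_head + 2) % QI_LENGTH;
    qi->free_cnt -= 2;
    return index;
}

/* Simulated completion: hardware writes QI_DONE into the wait slot and
 * the driver marks the request slot done. */
static void qi_complete(struct q_inval_model *qi, int index)
{
    qi->desc_status[index] = QI_DONE;
    qi->desc_status[(index + 1) % QI_LENGTH] = QI_DONE;
}

/* Free finished slots in order; stops at the first slot still in use. */
static void qi_reclaim(struct q_inval_model *qi)
{
    while (qi->desc_status[qi->free_tail] == QI_DONE ||
           qi->desc_status[qi->free_tail] == QI_ABORT) {
        qi->desc_status[qi->free_tail] = QI_FREE;
        qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
        qi->free_cnt++;
    }
}
```

Keeping at least one slot free at all times is what guarantees the reclaim loop stops before wrapping around the ring.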

[PATCH 4/6] VT-d: add device IOTLB invalidation support

2009-01-07 Thread Yu Zhao
Support device IOTLB invalidation to flush the translations cached in the
Endpoint.

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/dmar.c  |   63 --
 include/linux/intel-iommu.h |   13 -
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index eb77258..88f6b1f 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -666,7 +666,8 @@ void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-   while (qi->desc_status[qi->free_tail] == QI_DONE) {
+   while (qi->desc_status[qi->free_tail] == QI_DONE ||
+  qi->desc_status[qi->free_tail] == QI_ABORT) {
qi->desc_status[qi->free_tail] = QI_FREE;
qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
qi-free_cnt++;
@@ -676,10 +677,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 static int qi_check_fault(struct intel_iommu *iommu, int index)
 {
u32 fault;
-   int head;
+   int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
 
+   if (qi->desc_status[wait_index] == QI_ABORT)
+   return -EAGAIN;
+
fault = readl(iommu->reg + DMAR_FSTS_REG);
 
/*
@@ -699,6 +703,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
}
}
 
+   /*
+* If ITE happens, all pending wait_desc commands are aborted.
+* No new descriptors are fetched until the ITE is cleared.
+*/
+   if (fault & DMA_FSTS_ITE) {
+   head = readl(iommu->reg + DMAR_IQH_REG);
+   head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+   head |= 1;
+   tail = readl(iommu->reg + DMAR_IQT_REG);
+   tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+
+   writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
+
+   do {
+   if (qi->desc_status[head] == QI_IN_USE)
+   qi->desc_status[head] = QI_ABORT;
+   head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+   } while (head != tail);
+
+   if (qi->desc_status[wait_index] == QI_ABORT)
+   return -EAGAIN;
+   }
+
+   if (fault & DMA_FSTS_ICE)
+   writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+
return 0;
 }
 
@@ -708,7 +738,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
  */
 int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
-   int rc = 0;
+   int rc;
struct q_inval *qi = iommu->qi;
struct qi_desc *hw, wait_desc;
int wait_index, index;
@@ -719,6 +749,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 
hw = qi->desc;
 
+restart:
+   rc = 0;
+
spin_lock_irqsave(&qi->q_lock, flags);
while (qi->free_cnt < 3) {
spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -773,6 +806,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
reclaim_free_desc(qi);
spin_unlock_irqrestore(&qi->q_lock, flags);
 
+   if (rc == -EAGAIN)
+   goto restart;
+
return rc;
 }
 
@@ -839,6 +875,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
return qi_submit_sync(&desc, iommu);
 }
 
+int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, int qdep,
+   u64 addr, unsigned int mask)
+{
+   struct qi_desc desc;
+
+   if (mask) {
+   BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+   addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
+   desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+   } else
+   desc.high = QI_DEV_IOTLB_ADDR(addr);
+
+   if (qdep >= QI_DEV_IOTLB_MAX_INVS)
+   qdep = 0;
+
+   desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+  QI_DIOTLB_TYPE;
+
+   return qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0a220c9..d82bdac 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define DMA_FSTS_PPF ((u32)2)
 #define DMA_FSTS_PFO ((u32)1)
 #define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
@@ -224,7 +226,8 @@ do {                                                  \
 enum {
QI_FREE,
QI_IN_USE,
-   QI_DONE
+   QI_DONE,
+   QI_ABORT
 };
 
 #define 
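A sketch of the address encoding in qi_flush_dev_iotlb(): for a naturally aligned range of 2^mask pages, the (mask - 1) address bits just above the page offset are set to 1 and the size bit is raised, so the run of 1s tells the device how large the range is. The definitions of QI_DEV_IOTLB_ADDR() and QI_DEV_IOTLB_SIZE are cut off in the header hunk above, so the versions used here (page-masking the address, size flag in bit 0) are assumptions, as is the helper dev_iotlb_addr(); 64-bit shifts replace the patch's int shifts.

```c
#include <assert.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT 12
#define VTD_PAGE_MASK  (~((UINT64_C(1) << VTD_PAGE_SHIFT) - 1))
/* Assumed encodings; the real definitions are not shown in this patch. */
#define QI_DEV_IOTLB_ADDR(addr) ((uint64_t)(addr) & VTD_PAGE_MASK)
#define QI_DEV_IOTLB_SIZE       1

/* Encode a 2^mask-page invalidation address; mask == 0 is a single page
 * with the size bit clear. */
static uint64_t dev_iotlb_addr(uint64_t addr, unsigned int mask)
{
    if (!mask)
        return QI_DEV_IOTLB_ADDR(addr);
    /* The range must be naturally aligned to its size (BUG_ON in the patch). */
    assert(!(addr & ((UINT64_C(1) << (VTD_PAGE_SHIFT + mask)) - 1)));
    /* Set all bits below the top bit of the range; page masking then keeps
     * only the (mask - 1) ones above the page offset. */
    addr |= (UINT64_C(1) << (VTD_PAGE_SHIFT + mask - 1)) - 1;
    return QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
}
```

For instance, four pages at 0x8000 (mask = 2) encode as 0x9001: bit 12 set to mark a 16 KiB range, plus the size bit.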

[PATCH 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps

2009-01-07 Thread Yu Zhao
Make iommu_flush_iotlb_psi() and flush_unmaps() easier to read.

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/intel-iommu.c |   46 +---
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..261b6bd 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -916,30 +916,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-   unsigned int mask;
+   int rc;
+   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
 
BUG_ON(addr & (~VTD_PAGE_MASK));
BUG_ON(pages == 0);
 
-   /* Fallback to domain selective flush if no PSI support */
-   if (!cap_pgsel_inv(iommu->cap))
-   return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-   DMA_TLB_DSI_FLUSH,
-   non_present_entry_flush);
-
/*
+* Fallback to domain selective flush if no PSI support or the size is
+* too big.
 * PSI requires page size to be 2 ^ x, and the base address is naturally
 * aligned to the size
 */
-   mask = ilog2(__roundup_pow_of_two(pages));
-   /* Fallback to domain selective flush if size is too big */
-   if (mask > cap_max_amask_val(iommu->cap))
-   return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-   DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-   return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-   DMA_TLB_PSI_FLUSH,
-   non_present_entry_flush);
+   if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+   rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+   DMA_TLB_DSI_FLUSH,
+   non_present_entry_flush);
+   else
+   rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+   DMA_TLB_PSI_FLUSH,
+   non_present_entry_flush);
+   return rc;
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2292,15 +2289,16 @@ static void flush_unmaps(void)
if (!iommu)
continue;
 
-   if (deferred_flush[i].next) {
-   iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-DMA_TLB_GLOBAL_FLUSH, 0);
-   for (j = 0; j < deferred_flush[i].next; j++) {
-   __free_iova(&deferred_flush[i].domain[j]->iovad,
-   deferred_flush[i].iova[j]);
-   }
-   deferred_flush[i].next = 0;
+   if (!deferred_flush[i].next)
+   continue;
+
+   iommu-flush.flush_iotlb(iommu, 0, 0, 0,
+DMA_TLB_GLOBAL_FLUSH, 0);
+   for (j = 0; j  deferred_flush[i].next; j++) {
+   __free_iova(deferred_flush[i].domain[j]-iovad,
+   deferred_flush[i].iova[j]);
}
+   deferred_flush[i].next = 0;
}
 
list_size = 0;
-- 
1.5.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
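As a rough illustration of the PSI/DSI decision in iommu_flush_iotlb_psi() above, here is a standalone sketch built only on the rule stated in the patch comment: a PSI flush covers 2^mask pages, so the page count is rounded up to a power of two, and the code falls back to a domain-selective flush when PSI is unsupported or mask exceeds the hardware maximum. `ex_roundup_pow_of_two()` and `ex_ilog2()` are hypothetical stand-ins for the kernel's `__roundup_pow_of_two()` and `ilog2()`, `max_amask` stands in for `cap_max_amask_val()`, and the natural-alignment requirement on the base address is not checked here.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's __roundup_pow_of_two(). */
static unsigned int ex_roundup_pow_of_two(unsigned int n)
{
    unsigned int p = 1;

    assert(n <= (1u << 31));   /* larger values cannot be rounded up in 32 bits */
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(n)) for n > 0. */
static unsigned int ex_ilog2(unsigned int n)
{
    unsigned int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

enum ex_flush_kind { EX_FLUSH_DSI, EX_FLUSH_PSI };

/*
 * Pick a page-selective (PSI) or domain-selective (DSI) flush the way
 * the patched iommu_flush_iotlb_psi() does: PSI covers 2^mask pages, so
 * the page count is rounded up to a power of two, and the code falls
 * back to DSI when PSI is unsupported or mask exceeds max_amask.
 */
static enum ex_flush_kind ex_choose_flush(int psi_supported, unsigned int max_amask,
                                          unsigned int pages, unsigned int *mask_out)
{
    unsigned int mask;

    assert(pages != 0);        /* mirrors BUG_ON(pages == 0) */
    mask = ex_ilog2(ex_roundup_pow_of_two(pages));
    *mask_out = mask;

    if (!psi_supported || mask > max_amask)
        return EX_FLUSH_DSI;
    return EX_FLUSH_PSI;
}
```

For example, a 5-page range rounds up to 8 pages (mask 3), which stays a PSI flush only if the hardware reports a maximum address mask of at least 3.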


[PATCH 6/6] VT-d: support the device IOTLB

2009-01-07 Thread Yu Zhao
Support device IOTLB (i.e. ATS) for both native and KVM environments.

Signed-off-by: Yu Zhao yu.z...@intel.com

---
 drivers/pci/intel-iommu.c   |   97 +-
 include/linux/intel-iommu.h |1 +
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 261b6bd..a7ff7cb 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -125,6 +125,7 @@ static inline void context_set_fault_enable(struct context_entry *context)
 }
 
 #define CONTEXT_TT_MULTI_LEVEL 0
+#define CONTEXT_TT_DEV_IOTLB   1
 
 static inline void context_set_translation_type(struct context_entry *context,
unsigned long value)
@@ -240,6 +241,8 @@ struct device_domain_info {
struct list_head global; /* link to global list */
u8 bus; /* PCI bus numer */
u8 devfn;   /* PCI devfn number */
+   int qdep;   /* invalidate queue depth */
+   struct intel_iommu *iommu; /* IOMMU used by this device */
struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
struct dmar_domain *domain; /* pointer to domain */
 };
@@ -913,6 +916,75 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
return 0;
 }
 
+static struct device_domain_info *
+iommu_support_dev_iotlb(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+   int found = 0;
+   unsigned long flags;
+   struct device_domain_info *info;
+   struct intel_iommu *iommu = device_to_iommu(bus, devfn);
+
+   if (!ecap_dev_iotlb_support(iommu->ecap))
+   return NULL;
+
+   if (!iommu->qi)
+   return NULL;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link)
+   if (info->dev && info->bus == bus && info->devfn == devfn) {
+   found = 1;
+   break;
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   if (!found)
+   return NULL;
+
+   if (!dmar_find_matched_atsr_unit(info->dev))
+   return NULL;
+
+   info->iommu = iommu;
+   info->qdep = pci_ats_qdep(info->dev);
+   if (!info->qdep)
+   return NULL;
+
+   return info;
+}
+
+static void iommu_enable_dev_iotlb(struct device_domain_info *info)
+{
+   pci_enable_ats(info->dev, VTD_PAGE_SHIFT);
+}
+
+static void iommu_disable_dev_iotlb(struct device_domain_info *info)
+{
+   if (info->dev && pci_ats_enabled(info->dev))
+   pci_disable_ats(info->dev);
+}
+
+static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
+ u64 addr, unsigned int mask)
+{
+   int rc;
+   u16 sid;
+   unsigned long flags;
+   struct device_domain_info *info;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link) {
+   if (!info->dev || !pci_ats_enabled(info->dev))
+   continue;
+
+   sid = info->bus << 8 | info->devfn;
+   rc = qi_flush_dev_iotlb(info->iommu, sid,
+   info->qdep, addr, mask);
+   if (rc)
+   printk(KERN_ERR "IOMMU: flush device IOTLB failed\n");
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
@@ -936,6 +1008,9 @@ static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
DMA_TLB_PSI_FLUSH,
non_present_entry_flush);
+   if (!rc && !non_present_entry_flush)
+   iommu_flush_dev_iotlb(iommu->domains[did], addr, mask);
+
return rc;
 }
 
@@ -1460,6 +1535,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
unsigned long ndomains;
int id;
int agaw;
+   struct device_domain_info *info;
 
pr_debug("Set context mapping for %02x:%02x.%d\n",
bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1525,7 +1601,11 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
context_set_domain_id(context, id);
context_set_address_width(context, iommu->agaw);
context_set_address_root(context, virt_to_phys(pgd));
-   context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+   info = iommu_support_dev_iotlb(domain, bus, devfn);
+   if (info)
+   context_set_translation_type(context, CONTEXT_TT_DEV_IOTLB);
+   else
+   context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
context_set_fault_enable(context);
context_set_present(context);
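The 16-bit source-id that iommu_flush_dev_iotlb() passes to qi_flush_dev_iotlb() is built as `bus << 8 | devfn`. A small sketch of that encoding and its inverse, using the standard PCI devfn layout (slot in bits 7:3, function in bits 2:0, as decoded by PCI_SLOT()/PCI_FUNC() in the pr_debug() call above). The `EX_` macros and both helper functions are hypothetical, written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN(), PCI_SLOT() and PCI_FUNC(). */
#define EX_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define EX_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define EX_PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical helper: source-id as computed in iommu_flush_dev_iotlb(). */
static uint16_t ex_source_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(bus << 8 | devfn);
}

/* Hypothetical helper: split a source-id back into bus and devfn. */
static void ex_split_source_id(uint16_t sid, uint8_t *bus, uint8_t *devfn)
{
    assert(bus != NULL && devfn != NULL);
    *bus = (uint8_t)(sid >> 8);
    *devfn = (uint8_t)(sid & 0xff);
}
```

For instance, device 03:02.1 has devfn 0x11 and therefore source-id 0x0311.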
   

Re: [PATCH 01/10] KVM: Add a route layer to convert MSI message to GSI

2009-01-07 Thread Marcelo Tosatti
Hi Sheng,

On Wed, Jan 07, 2009 at 06:42:37PM +0800, Sheng Yang wrote:
> Avi's purpose, to use single kvm_set_irq() to deal with all interrupt, including
> MSI. So here is it.
>
> struct gsi_route_entry is a mapping from a special gsi(with KVM_GSI_MSG_MASK) to
> MSI/MSI-X message address/data. And the struct can also be extended for other
> purpose.
>
> Now we support up to 256 gsi_route_entry mapping, and gsi is allocated by kernel and
> provide two ioctls to userspace, which is more flexible.
>
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  include/linux/kvm.h  |   26 +++
>  include/linux/kvm_host.h |   20 +
>  virt/kvm/irq_comm.c  |   70 ++
>  virt/kvm/kvm_main.c  |  106 ++
>  4 files changed, 222 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 71c150f..bbefce6 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -399,6 +399,9 @@ struct kvm_trace_rec {
>  #if defined(CONFIG_X86)
>  #define KVM_CAP_REINJECT_CONTROL 24
>  #endif
> +#if defined(CONFIG_X86)
> +#define KVM_CAP_GSI_ROUTE 25
> +#endif
>  
>  /*
>   * ioctls for VM fds
> @@ -433,6 +436,8 @@ struct kvm_trace_rec {
>  #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
>   struct kvm_assigned_irq)
>  #define KVM_REINJECT_CONTROL  _IO(KVMIO, 0x71)
> +#define KVM_REQUEST_GSI_ROUTE  _IOWR(KVMIO, 0x72, void *)
> +#define KVM_FREE_GSI_ROUTE _IOR(KVMIO, 0x73, void *)
>  
>  /*
>   * ioctls for vcpu fds
> @@ -553,4 +558,25 @@ struct kvm_assigned_irq {
>  #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
>  #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI (1 << 0)
>  
> +struct kvm_gsi_route_guest {
> + __u32 entries_nr;
> + struct kvm_gsi_route_entry_guest *entries;
> +};
> +
> +#define KVM_GSI_ROUTE_MSI (1 << 0)
> +struct kvm_gsi_route_entry_guest {
> + __u32 gsi;
> + __u32 type;
> + __u32 flags;
> + __u32 reserved;
> + union {
> + struct {
> + __u32 addr_lo;
> + __u32 addr_hi;
> + __u32 data;
> + } msi;
> + __u32 padding[8];
> + };
> +};
> +
>  #endif
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a8bcad0..6a00201 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -136,6 +136,9 @@ struct kvm {
>   unsigned long mmu_notifier_seq;
>   long mmu_notifier_count;
>  #endif
> + struct hlist_head gsi_route_list;
> +#define KVM_NR_GSI_ROUTE_ENTRIES 256
> + DECLARE_BITMAP(gsi_route_bitmap, KVM_NR_GSI_ROUTE_ENTRIES);
>  };
>  
>  /* The guest did something we don't support. */
> @@ -336,6 +339,19 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
>  struct kvm_irq_mask_notifier *kimn);
>  void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
>  
> +#define KVM_GSI_ROUTE_MASK 0x100ull
> +struct kvm_gsi_route_entry {
> + u32 gsi;
> + u32 type;
> + u32 flags;
> + u32 reserved;
> + union {
> + struct msi_msg msi;
> + u32 reserved[8];
> + };
> + struct hlist_node link;
> +};
> +
>  void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> @@ -343,6 +359,10 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
>  int kvm_request_irq_source_id(struct kvm *kvm);
>  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
> +struct kvm_gsi_route_entry *kvm_find_gsi_route_entry(struct kvm *kvm, u32 gsi);
> +void kvm_free_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
> +void kvm_free_gsi_route_list(struct kvm *kvm);
>  
>  #ifdef CONFIG_DMAR
>  int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 5162a41..7460e7f 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -123,3 +123,73 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
>   kimn->func(kimn, mask);
>  }
>  
> +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry)
> +{
> + struct kvm_gsi_route_entry *found_entry, *new_entry;
> + int r, gsi;
> +
> + mutex_lock(&kvm->lock);
> + /* Find whether we need a update or a new entry */
> + found_entry = kvm_find_gsi_route_entry(kvm, entry->gsi);
> + if (found_entry)
> + *found_entry = *entry;
> + else {

Having a kvm_find_alloc_gsi_route_entry which either returns a present
entry if found or returns a newly allocated one makes the code easier to
read for me. Then just

entry = kvm_find_alloc_gsi_route_entry
*entry = 
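A minimal sketch of the find-or-allocate helper suggested above, assuming a fixed 256-slot table (the KVM_NR_GSI_ROUTE_ENTRIES limit from the patch) instead of the patch's hlist and bitmap. `struct ex_route_entry`, `struct ex_route_table` and `ex_find_alloc_route_entry()` are hypothetical names; in the kernel the caller would hold kvm->lock, as kvm_update_gsi_route() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NR_ROUTE_ENTRIES 256   /* same limit as KVM_NR_GSI_ROUTE_ENTRIES */

/* Hypothetical, reduced stand-in for struct kvm_gsi_route_entry. */
struct ex_route_entry {
    uint32_t gsi;
    uint32_t type;
    uint32_t msi_addr_lo, msi_addr_hi, msi_data;
};

struct ex_route_table {
    struct ex_route_entry entries[EX_NR_ROUTE_ENTRIES];
    uint8_t used[EX_NR_ROUTE_ENTRIES];   /* 1 if the slot holds a live entry */
};

/*
 * Find-or-allocate: return the live entry for @gsi if there is one,
 * otherwise claim the first free slot and tag it with @gsi (the caller
 * is expected to overwrite the rest of the entry). Returns NULL when
 * every slot is in use. Locking is left to the caller.
 */
static struct ex_route_entry *
ex_find_alloc_route_entry(struct ex_route_table *t, uint32_t gsi)
{
    struct ex_route_entry *free_slot = NULL;
    size_t i;

    assert(t != NULL);
    for (i = 0; i < EX_NR_ROUTE_ENTRIES; i++) {
        if (t->used[i] && t->entries[i].gsi == gsi)
            return &t->entries[i];
        if (!t->used[i] && free_slot == NULL)
            free_slot = &t->entries[i];
    }
    if (free_slot == NULL)
        return NULL;

    t->used[free_slot - t->entries] = 1;
    free_slot->gsi = gsi;
    return free_slot;
}
```

With such a helper, the update path becomes a lookup followed by a single structure copy, and the found/not-found branches disappear from the caller.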

[ kvm-Bugs-2030703 ] Virtio Vista drivers

2009-01-07 Thread SourceForge.net
Bugs item #2030703, was opened at 2008-07-29 05:05
Message generated for change (Comment added) made by roy-anonymous
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized 
ethernet adapter or block device seem to work under Windows Vista.  It would be 
nice to have Vista compatible drivers.

--

Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-08 01:09

Message:
Is there really a block device driver for win2k and winxp??

--



[ kvm-Bugs-2490866 ] repeatable corruption with qcow2 on kvm-79

2009-01-07 Thread SourceForge.net
Bugs item #2490866, was opened at 2009-01-07 05:10
Message generated for change (Comment added) made by roy-anonymous
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2490866&group_id=180599

Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Adrian Bridgett (abridgett)
Assigned to: Nobody/Anonymous (nobody)
Summary: repeatable corruption with qcow2 on kvm-79

Initial Comment:
After creating a qcow2 image and running mkfs.ext3 on it, mounting it would sometimes
fail immediately, and in all cases the filesystem was corrupted (overwritten with zeros)
after starting up BackupPC on it. This is KVM-79 on a Debian lenny host and guest.

This occurred with and without virtio. Switching to a raw file or LV worked
flawlessly. I've tested the box with memtest and I don't have issues elsewhere,
but I've seen corruption on other images. Host and guest both run the
2.6.26-1-amd64 kernel (Debian lenny); I'm running 32-bit userspace everywhere.
Dual-core Intel Core2 E6300.

I see KVM-81 improves qcow2 data integrity with cache=writethrough, which
_might_ be what I'm hitting, but I can't find more details about it to check
(and backport the patch to the Debian package, or wait for a newer Debian package).

thanks.

--

Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-08 01:14

Message:
I am not quite sure whether it's related, but in my case I get corruption with a
fresh FC9 guest installation on qcow2 with virtio_blk. There is no problem if I
install FC8 on qcow2 and then upgrade to FC9 with virtio_blk.

--

Comment By: Laszlo Dvornik (ldvornik)
Date: 2009-01-07 15:42

Message:
Same problem here.

With Lenny and a vanilla 2.6.28 kernel, with the KVM 79 and KVM 82 user tools.
I tried both the KVM 82 module compiled for 2.6.28 and the KVM sources built
into 2.6.28. 32-bit userspace and kernel, Intel C2D T7100.

Another effect:
With empty qcow2 and vmdk disk images, when I try to create a partition and
save the new partition table, it can't be saved until a reboot. With the raw
image format there is no such problem.

I wanted to try with qcow, but:
qemu: could not open disk image teszt.qcow

I switched all of my disk images to raw until the problem is fixed.

PS: The host filesystem is ext4, but I tested on an ext3 filesystem too
and the problem didn't disappear.

--



[PATCH 2/5][RFC] virtio-net: Add load/save for status bits

2009-01-07 Thread Alex Williamson
virtio-net: Add load/save for status bits

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index bfb7510..77e3077 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -16,6 +16,8 @@
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
+#define VIRTIO_VM_VERSION  2
+
 typedef struct VirtIONet
 {
 VirtIODevice vdev;
@@ -307,13 +309,14 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 
 qemu_put_buffer(f, n->mac, 6);
 qemu_put_be32(f, n->tx_timer_active);
+qemu_put_be16(f, n->status);
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
 VirtIONet *n = opaque;
 
-if (version_id != 1)
+if (version_id < 1 || version_id > VIRTIO_VM_VERSION)
 return -EINVAL;
 
 virtio_load(&n->vdev, f);
@@ -321,6 +324,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_buffer(f, n->mac, 6);
 n->tx_timer_active = qemu_get_be32(f);
 
+if (version_id >= 2)
+n->status = qemu_get_be16(f);
+
 if (n->tx_timer_active) {
 qemu_mod_timer(n->tx_timer,
qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
@@ -363,7 +369,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
 n->tx_timer_active = 0;
 n->mergeable_rx_bufs = 0;
 
-register_savevm("virtio-net", virtio_net_id++, 1,
+register_savevm("virtio-net", virtio_net_id++, VIRTIO_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
 
 return (PCIDevice *)n;


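The load side of this patch follows a common versioned-state pattern: accept any stream version from 1 up to the current one, and read a field only if the stream's version contains it. A standalone sketch of that pattern with a plain byte buffer instead of QEMUFile; the `ex_` names and the big-endian helpers are hypothetical stand-ins for qemu_put_be32(), qemu_get_be16() and friends, and only two fields are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_VM_VERSION 2   /* plays the role of VIRTIO_VM_VERSION */

/* Hypothetical device state: one field from version 1, one added in version 2. */
struct ex_state {
    uint32_t tx_timer_active;   /* present since version 1 */
    uint16_t status;            /* added in version 2 */
};

static size_t ex_put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

static size_t ex_put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
    return 2;
}

static uint32_t ex_get_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint16_t ex_get_be16(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Save always writes the newest layout; returns the number of bytes written. */
static size_t ex_save(const struct ex_state *s, uint8_t *buf)
{
    size_t off = 0;

    off += ex_put_be32(buf + off, s->tx_timer_active);
    off += ex_put_be16(buf + off, s->status);
    return off;
}

/*
 * Load accepts versions 1..EX_VM_VERSION and reads only the fields the
 * stream's version contains; anything newer keeps its current value.
 */
static int ex_load(struct ex_state *s, const uint8_t *buf, int version_id)
{
    if (version_id < 1 || version_id > EX_VM_VERSION)
        return -1;               /* the patch returns -EINVAL */

    s->tx_timer_active = ex_get_be32(buf);
    if (version_id >= 2)
        s->status = ex_get_be16(buf + 4);
    return 0;
}
```

Fields absent from an older stream keep whatever value the device already had. Patch 3/5 of this series goes further and sets the promiscuous and all-multicast bits when loading a version-1 stream, presumably so that a guest migrated from an older host keeps receiving all traffic.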


[PATCH 4/5][RFC] virtio-net: Enable filtering based on MAC, promisc, broadcast and allmulti

2009-01-07 Thread Alex Williamson
virtio-net: Enable filtering based on MAC, promisc, broadcast and allmulti

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 653cad4..fa8e71c 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -167,6 +167,25 @@ static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt,
 return offset;
 }
 
+static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
+{
+static uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+
+if (n->status.bits.promisc)
+return 1;
+
+if ((buf[0] & 1) && n->status.bits.allmulti)
+return 1;
+
+if (!memcmp(buf, bcast, sizeof(bcast)))
+return 1;
+
+if (!memcmp(buf, n->mac, 6))
+return 1;
+
+return 0;
+}
+
 static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
 {
 VirtIONet *n = opaque;
@@ -176,6 +195,9 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
 if (!do_virtio_net_can_receive(n, size))
 return;
 
+if (!receive_filter(n, buf, size))
+return;
+
 /* hdr_len refers to the header we supply to the guest */
 hdr_len = n->mergeable_rx_bufs ?
 sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);


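receive_filter() accepts a frame if any of four conditions holds. A self-contained sketch of the same decision order, with a hypothetical `struct ex_rx_state` in place of VirtIONet. The multicast test relies on the Ethernet I/G bit, the least-significant bit of the first destination octet, which is what `buf[0] & 1` checks. Unlike the patch, this sketch also rejects buffers too short to hold a destination address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of the VirtIONet fields the filter reads. */
struct ex_rx_state {
    uint8_t mac[6];
    int promisc;
    int allmulti;
};

/*
 * Accept a frame when, in the patch's order: promiscuous mode is on;
 * the destination is multicast (I/G bit set) and all-multicast is on;
 * the destination is broadcast; or it equals the device's own MAC.
 */
static int ex_receive_filter(const struct ex_rx_state *n, const uint8_t *buf, int size)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

    assert(n != NULL && buf != NULL);
    if (size < 6)              /* extra check, not in the patch */
        return 0;

    if (n->promisc)
        return 1;
    if ((buf[0] & 1) && n->allmulti)
        return 1;
    if (!memcmp(buf, bcast, sizeof(bcast)))
        return 1;
    if (!memcmp(buf, n->mac, 6))
        return 1;
    return 0;
}
```

Broadcast frames also have the I/G bit set, so the explicit broadcast comparison is what lets them through when all-multicast mode is off.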


[PATCH 5/5][RFC] virtio-net: Add additional MACs via a filter table

2009-01-07 Thread Alex Williamson
virtio-net: Add additional MACs via a filter table

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   27 +--
 hw/virtio-net.h |4 
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index fa8e71c..f7cc36f 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -16,7 +16,7 @@
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
-#define VIRTIO_VM_VERSION  2
+#define VIRTIO_VM_VERSION  3
 
 typedef struct VirtIONet
 {
@@ -28,6 +28,7 @@ typedef struct VirtIONet
 uint16_t link:1;
 uint16_t promisc:1;
 uint16_t allmulti:1;
+uint16_t mac_table:1;
 } bits;
 } status;
 VirtQueue *rx_vq;
@@ -36,6 +37,7 @@ typedef struct VirtIONet
 QEMUTimer *tx_timer;
 int tx_timer_active;
 int mergeable_rx_bufs;
+uint64_t mac_table[16];
 } VirtIONet;
 
 /* TODO
@@ -54,6 +56,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 
 netcfg.status.raw = n->status.raw;
 memcpy(netcfg.mac, n->mac, 6);
+memcpy(netcfg.mac_table, n->mac_table, sizeof(netcfg.mac_table));
 memcpy(config, &netcfg, sizeof(netcfg));
 }
 
@@ -77,7 +80,12 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 n->status.bits.promisc = netcfg.status.bits.promisc;
if (netcfg.status.bits.allmulti != n->status.bits.allmulti)
 n->status.bits.allmulti = netcfg.status.bits.allmulti;
+   if (netcfg.status.bits.mac_table != n->status.bits.mac_table)
+n->status.bits.mac_table = netcfg.status.bits.mac_table;
 }
+
+if (memcmp(n->mac_table, netcfg.mac_table, sizeof(n->mac_table)))
+memcpy(n->mac_table, netcfg.mac_table, sizeof(n->mac_table));
 }
 
 static void virtio_net_set_link_status(VLANClientState *vc)
@@ -92,7 +100,8 @@ static void virtio_net_set_link_status(VLANClientState *vc)
 
 static uint32_t virtio_net_get_features(VirtIODevice *vdev)
 {
-uint32_t features = (1 << VIRTIO_NET_F_MAC) | (1 << VIRTIO_NET_F_STATUS);
+uint32_t features = (1 << VIRTIO_NET_F_MAC) | (1 << VIRTIO_NET_F_STATUS) |
+(1 << VIRTIO_NET_F_MAC_TABLE);
 
 return features;
 }
@@ -170,6 +179,7 @@ static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt,
 static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 {
 static uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+int i;
 
 if (n->status.bits.promisc)
 return 1;
@@ -183,6 +193,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 if (!memcmp(buf, n->mac, 6))
 return 1;
 
+if (n->status.bits.mac_table) {
+for (i = 0; i < 16; i++) {
+uint8_t *mac = (uint8_t *)&n->mac_table[i];
+
+if (mac[7] && !memcmp(buf, mac, 6))
+return 1;
+}
+}
+
 return 0;
 }
 
@@ -342,6 +361,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 qemu_put_buffer(f, n->mac, 6);
 qemu_put_be32(f, n->tx_timer_active);
 qemu_put_be16(f, n->status.raw);
+qemu_put_buffer(f, (uint8_t *)n->mac_table, sizeof(n->mac_table));
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
@@ -361,6 +381,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 else
 n->status.raw |= (VIRTIO_NET_S_PROMISC | VIRTIO_NET_S_ALLMULTI);
 
+if (version_id >= 3)
+   qemu_get_buffer(f, (uint8_t *)n->mac_table, sizeof(n->mac_table));
+
 if (n->tx_timer_active) {
 qemu_mod_timer(n->tx_timer,
qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 74f1595..532c7c4 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -38,10 +38,12 @@
 #define VIRTIO_NET_F_HOST_UFO   14  /* Host can handle UFO in. */
 #define VIRTIO_NET_F_MRG_RXBUF  15  /* Host can merge receive buffers. */
 #define VIRTIO_NET_F_STATUS 16  /* virtio_net_config.status available */
+#define VIRTIO_NET_F_MAC_TABLE  17  /* Additional MAC addresses */
 
 #define VIRTIO_NET_S_LINK_UP1   /* Link is up */
 #define VIRTIO_NET_S_PROMISC2   /* Promiscuous mode */
 #define VIRTIO_NET_S_ALLMULTI   4   /* All-multicast mode */
+#define VIRTIO_NET_S_MAC_TABLE  8   /* Enable MAC filter table */
 
 #define TX_TIMER_INTERVAL 15 /* 150 us */
 
@@ -59,8 +61,10 @@ struct virtio_net_config
 uint16_t link:1;
 uint16_t promisc:1;
 uint16_t allmulti:1;
+uint16_t mac_table:1;
 } bits;
 } status;
+uint64_t mac_table[16];
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't


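In patch 5/5 each extra address occupies one 8-byte slot of `mac_table[16]`, and receive_filter() only compares slots whose byte 7 is non-zero. A sketch of writing and matching such slots, assuming bytes 0 to 5 hold the MAC and byte 7 is the valid flag (which is how the loop reads them); byte 6 is left unused. The `ex_` helpers are hypothetical. Because the slot is always accessed through a byte pointer, the flag's position does not depend on host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_MAC_TABLE_ENTRIES 16   /* size of mac_table[] in patch 5/5 */

/* Hypothetical helper: bytes 0-5 of the slot hold the MAC, byte 7 marks it valid. */
static void ex_mac_table_set(uint64_t *table, int idx, const uint8_t mac[6])
{
    uint8_t *slot;

    assert(idx >= 0 && idx < EX_MAC_TABLE_ENTRIES);
    slot = (uint8_t *)&table[idx];
    memset(slot, 0, sizeof(table[idx]));
    memcpy(slot, mac, 6);
    slot[7] = 1;
}

/* Same test as the patch's loop: only slots with a non-zero byte 7 can match. */
static int ex_mac_table_match(const uint64_t *table, const uint8_t *dest)
{
    int i;

    for (i = 0; i < EX_MAC_TABLE_ENTRIES; i++) {
        const uint8_t *mac = (const uint8_t *)&table[i];

        if (mac[7] && !memcmp(dest, mac, 6))
            return 1;
    }
    return 0;
}
```

An all-zero table therefore matches nothing, not even an all-zero destination address.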


[ kvm-Bugs-2030703 ] Virtio Vista drivers

2009-01-07 Thread SourceForge.net
Bugs item #2030703, was opened at 2008-07-28 23:05
Message generated for change (Comment added) made by martinmaurer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized 
ethernet adapter or block device seem to work under Windows Vista.  It would be 
nice to have Vista compatible drivers.

--

Comment By: martinmaurer (martinmaurer)
Date: 2009-01-07 18:36

Message:
There are drivers for Vista (network). For the latest release, see
https://sourceforge.net/project/showfiles.php?group_id=180599&package_id=267944

(virtio drivers for a block device on Windows are not available.)

--

Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-07 18:09

Message:
Is there really a block device driver for win2k and winxp??

--



[PATCH 0/5][RFC] virtio-net: MAC filtering

2009-01-07 Thread Alex Williamson
This series is based on some of the work Mark McLoughlin has been doing,
so isn't going to apply until that makes it into the tree.  The goal is
to enable MAC filtering at the qemu/kvm level for virtio-net packets.  I
start by adding the capability to set the MAC address, naming the bits
in the status field, enabling filtering, and finally adding a MAC table
for additional MAC addresses.  If this looks reasonable, I'll follow up
with VLAN filtering support.

A concern here is the growing size of the virtio-net I/O port space
config.  This series brings it up to 256 bytes with PCI resource
rounding.  The VLAN filter bitmap would increase that by another 512
bytes, making it 1kB and limiting us to something less than 64 such
devices per guest.  Is anyone worried?  Should filter tables live in
MMIO space for virtio devices?  I'll send out the guest side patches for
virtio-net in a separate thread.  Thanks,

Alex

-- 
Alex Williamson HP Open Source & Linux Org.

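The size figures in this cover letter can be checked with a little arithmetic. The sketch below assumes a 20-byte legacy virtio-pci header ahead of the device-specific config, the virtio_net_config layout from patches 3/5 and 5/5 (6-byte MAC, 16-bit status, sixteen 8-byte MAC table slots), a VLAN bitmap with one bit per 12-bit VLAN ID, and the 64 KiB x86 I/O port space. These constants are assumptions, not values taken from the patches, and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (see above); none of these constants come from the patches. */
#define EX_VIRTIO_PCI_HDR  20u           /* legacy virtio-pci common header */
#define EX_NET_CFG_BASE    (6u + 2u)     /* mac[6] + 16-bit status */
#define EX_MAC_TABLE       (16u * 8u)    /* sixteen uint64_t slots */
#define EX_VLAN_BITMAP     (4096u / 8u)  /* one bit per 12-bit VLAN ID */
#define EX_IO_SPACE        65536u        /* x86 I/O port space */

/* PCI BARs are power-of-two sized: round the register block up. */
static uint32_t ex_bar_size(uint32_t bytes)
{
    uint32_t size = 1;

    assert(bytes <= (1u << 31));
    while (size < bytes)
        size <<= 1;
    return size;
}

/* Upper bound on how many such BARs fit in the I/O port space. */
static uint32_t ex_max_devices(uint32_t bar_size)
{
    assert(bar_size != 0);
    return EX_IO_SPACE / bar_size;
}
```

Under these assumptions the config needs 156 bytes, rounded to a 256-byte region; adding the 512-byte VLAN bitmap gives 668 bytes, rounded to 1 kB, and 65536 / 1024 = 64 regions at most. Other devices also claim I/O ports, hence "something less than 64".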


[PATCH 1/5][RFC] virtio-net: Allow setting the MAC address via set_config

2009-01-07 Thread Alex Williamson
virtio-net: Allow setting the MAC address via set_config

Also rename virtio_net_update_config() to virtio_net_get_config() for simplicity.

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   21 +++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 2c41b3e..bfb7510 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -38,7 +38,7 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
 return (VirtIONet *)vdev;
 }
 
-static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
+static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
 VirtIONet *n = to_virtio_net(vdev);
 struct virtio_net_config netcfg;
@@ -48,6 +48,22 @@ static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
 memcpy(config, &netcfg, sizeof(netcfg));
 }
 
+static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
+{
+VirtIONet *n = to_virtio_net(vdev);
+struct virtio_net_config netcfg;
+
+memcpy(&netcfg, config, sizeof(netcfg));
+
+if (memcmp(netcfg.mac, n->mac, 6)) {
+memcpy(n->mac, netcfg.mac, 6);
+snprintf(n->vc->info_str, sizeof(n->vc->info_str),
+ "virtio macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
+ n->mac[0], n->mac[1], n->mac[2],
+ n->mac[3], n->mac[4], n->mac[5]);
+}
+}
+
 static void virtio_net_set_link_status(VLANClientState *vc)
 {
 VirtIONet *n = vc->opaque;
@@ -326,7 +342,8 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
 if (!n)
 return NULL;
 
-n->vdev.get_config = virtio_net_update_config;
+n->vdev.get_config = virtio_net_get_config;
+n->vdev.set_config = virtio_net_set_config;
 n->vdev.get_features = virtio_net_get_features;
 n->vdev.set_features = virtio_net_set_features;
 n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);


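virtio_net_set_config() compares the MAC written by the guest with the current one and, only when it differs, copies it and rebuilds the VLAN client's info string. Those two steps, sketched outside QEMU; `ex_update_mac()` and `ex_format_info_str()` are hypothetical helpers, and the format string is the one used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the new MAC only if it differs; 1 means "refresh info_str". */
static int ex_update_mac(uint8_t cur[6], const uint8_t next[6])
{
    if (!memcmp(cur, next, 6))
        return 0;
    memcpy(cur, next, 6);
    return 1;
}

/* Hypothetical helper: same format string as virtio_net_set_config(). */
static int ex_format_info_str(char *out, size_t len, const uint8_t mac[6])
{
    assert(out != NULL && len > 0);
    return snprintf(out, len, "virtio macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
                    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Skipping the copy when nothing changed keeps repeated config writes from the guest cheap.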


[PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti

2009-01-07 Thread Alex Williamson
virtio-net: Name the status bits, adding promisc and allmulti

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   36 
 hw/virtio-net.h |   11 ++-
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 77e3077..653cad4 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -22,7 +22,14 @@ typedef struct VirtIONet
 {
 VirtIODevice vdev;
 uint8_t mac[6];
-uint16_t status;
+union {
+uint16_t raw;
+struct {
+uint16_t link:1;
+uint16_t promisc:1;
+uint16_t allmulti:1;
+} bits;
+} status;
 VirtQueue *rx_vq;
 VirtQueue *tx_vq;
 VLANClientState *vc;
@@ -45,7 +52,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 VirtIONet *n = to_virtio_net(vdev);
 struct virtio_net_config netcfg;
 
-netcfg.status = n->status;
+netcfg.status.raw = n->status.raw;
 memcpy(netcfg.mac, n->mac, 6);
 memcpy(config, &netcfg, sizeof(netcfg));
 }
@@ -64,20 +71,23 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
  n->mac[0], n->mac[1], n->mac[2],
  n->mac[3], n->mac[4], n->mac[5]);
 }
+
+if (netcfg.status.raw != n->status.raw) {
+   if (netcfg.status.bits.promisc != n->status.bits.promisc)
+n->status.bits.promisc = netcfg.status.bits.promisc;
+   if (netcfg.status.bits.allmulti != n->status.bits.allmulti)
+n->status.bits.allmulti = netcfg.status.bits.allmulti;
+}
 }
 
 static void virtio_net_set_link_status(VLANClientState *vc)
 {
 VirtIONet *n = vc->opaque;
-uint16_t old_status = n->status;
-
-if (vc->link_down)
-n->status &= ~VIRTIO_NET_S_LINK_UP;
-else
-n->status |= VIRTIO_NET_S_LINK_UP;
 
-if (n->status != old_status)
+if (n->status.bits.link != !(vc->link_down)) {
+   n->status.bits.link = !(vc->link_down);
 virtio_notify_config(&n->vdev);
+}
 }
 
 static uint32_t virtio_net_get_features(VirtIODevice *vdev)
@@ -309,7 +319,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 
 qemu_put_buffer(f, n->mac, 6);
 qemu_put_be32(f, n->tx_timer_active);
-qemu_put_be16(f, n->status);
+qemu_put_be16(f, n->status.raw);
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
@@ -325,7 +335,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 n->tx_timer_active = qemu_get_be32(f);
 
 if (version_id >= 2)
-n->status = qemu_get_be16(f);
+n->status.raw = qemu_get_be16(f);
+else
+n->status.raw |= (VIRTIO_NET_S_PROMISC | VIRTIO_NET_S_ALLMULTI);
 
 if (n->tx_timer_active) {
 qemu_mod_timer(n->tx_timer,
@@ -355,7 +367,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
 n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
 n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx);
 memcpy(n->mac, nd->macaddr, 6);
-n->status = VIRTIO_NET_S_LINK_UP;
+n->status.raw = VIRTIO_NET_S_LINK_UP;
 n->vc = qemu_new_vlan_client(nd->vlan, nd->model, nd->name,
  virtio_net_receive, virtio_net_can_receive, n);
 n->vc->link_status_changed = virtio_net_set_link_status;
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 9ac9e34..74f1595 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -40,6 +40,8 @@
 #define VIRTIO_NET_F_STATUS 16  /* virtio_net_config.status available */
 
 #define VIRTIO_NET_S_LINK_UP1   /* Link is up */
+#define VIRTIO_NET_S_PROMISC2   /* Promiscuous mode */
+#define VIRTIO_NET_S_ALLMULTI   4   /* All-multicast mode */
 
 #define TX_TIMER_INTERVAL 15 /* 150 us */
 
@@ -51,7 +53,14 @@ struct virtio_net_config
 /* The config defining mac address (6 bytes) */
 uint8_t mac[6];
 /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
-uint16_t status;
+union {
+uint16_t raw;
+struct {
+uint16_t link:1;
+uint16_t promisc:1;
+uint16_t allmulti:1;
+} bits;
+} status;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't


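Patch 3/5 names the status bits with a bitfield union, while the VIRTIO_NET_S_* masks in hw/virtio-net.h define the same bits numerically. C leaves the allocation order of bit-fields within a storage unit implementation-defined, so the masks are the portable way to describe the layout of the status word. A sketch of the same updates done with masks only; the `EX_S_` values copy VIRTIO_NET_S_LINK_UP, VIRTIO_NET_S_PROMISC and VIRTIO_NET_S_ALLMULTI, and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from VIRTIO_NET_S_* in hw/virtio-net.h (patch 3/5). */
#define EX_S_LINK_UP   1
#define EX_S_PROMISC   2
#define EX_S_ALLMULTI  4

/* Hypothetical helper: set or clear one flag in the raw 16-bit status word. */
static uint16_t ex_status_set(uint16_t status, uint16_t flag, int on)
{
    assert(flag != 0);
    return on ? (uint16_t)(status | flag) : (uint16_t)(status & ~flag);
}

/*
 * Mask-based counterpart of virtio_net_set_link_status(): update the
 * link bit and report whether it changed, i.e. whether a config-change
 * notification would be sent.
 */
static int ex_link_changed(uint16_t *status, int link_down)
{
    uint16_t next = ex_status_set(*status, EX_S_LINK_UP, !link_down);
    int changed = (next != *status);

    *status = next;
    return changed;
}
```

As in the patch, a notification is only warranted when the link bit actually flips.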


[PATCH 0/2][RFC] virtio_net: MAC filtering

2009-01-07 Thread Alex Williamson

This series builds on some of the patches Mark McLoughlin has sent out
recently, so likely won't apply to any current trees until those get
upstream.  The goal is to enable MAC filtering at the kvm/qemu level for
virtio-net packets.  Promiscuous and allmulti mode are handled by adding
bits to Mark's proposed status field.  I also add a 16 entry MAC table
for additional unicast and multicast addresses to filter.  If this looks
reasonable, I'll follow up with VLAN filtering.

As noted in the RFC thread adding the kvm/qemu backing, this does
increase the size of the virtio-net device I/O port space, up to 1kB
with PCI rounding if we add a 4k entry VLAN bitmap.  A 64 device limit
is still pretty high for a VM, but maybe we should think about adding
MMIO space for virtio-pci.  Thanks,

Alex

-- 
Alex Williamson HP Open Source & Linux Org.

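Patch 1/2 maps the interface's receive mode onto those status bits in virtnet_set_rx_mode(). A pure-function sketch of that mapping, with hypothetical `EX_IFF_` flags standing in for IFF_PROMISC and IFF_ALLMULTI and plain counters standing in for dev->uc_count and dev->mc_count. As in the patch, any secondary unicast address forces promiscuous mode and any multicast address forces all-multicast mode; this reads like a fallback until the MAC table is in place, though the patch does not say so.

```c
#include <assert.h>
#include <stdint.h>

#define EX_S_PROMISC   2      /* VIRTIO_NET_S_PROMISC */
#define EX_S_ALLMULTI  4      /* VIRTIO_NET_S_ALLMULTI */

/* Hypothetical stand-ins for the IFF_PROMISC and IFF_ALLMULTI device flags. */
#define EX_IFF_PROMISC   0x1u
#define EX_IFF_ALLMULTI  0x2u

/*
 * Status word that virtnet_set_rx_mode() would write back: the device
 * flags set or clear each bit, then any secondary unicast address forces
 * promiscuous mode and any multicast address forces all-multicast mode.
 */
static uint16_t ex_rx_mode_status(uint16_t status, unsigned int flags,
                                  int uc_count, int mc_count)
{
    assert(uc_count >= 0 && mc_count >= 0);

    if (flags & EX_IFF_PROMISC)
        status |= EX_S_PROMISC;
    else
        status &= (uint16_t)~EX_S_PROMISC;

    if (flags & EX_IFF_ALLMULTI)
        status |= EX_S_ALLMULTI;
    else
        status &= (uint16_t)~EX_S_ALLMULTI;

    if (uc_count)
        status |= EX_S_PROMISC;
    if (mc_count)
        status |= EX_S_ALLMULTI;
    return status;
}
```

In the patch, the guest writes the result to the config space only when it differs from the cached status.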


[PATCH 1/2][RFC] virtio_net: Enable setting MAC, promisc, and allmulti mode

2009-01-07 Thread Alex Williamson
virtio_net: Enable setting MAC, promisc, and allmulti mode

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 drivers/net/virtio_net.c   |   79 
 include/linux/virtio_net.h |   11 ++
 2 files changed, 82 insertions(+), 8 deletions(-)


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3af5e33..f502edd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,7 +41,14 @@ struct virtnet_info
struct virtqueue *rvq, *svq;
struct net_device *dev;
struct napi_struct napi;
-   unsigned int status;
+   union {
+   u16 raw;
+   struct {
+   u16 link:1;
+   u16 promisc:1;
+   u16 allmulti:1;
+   } bits;
+   } status; 
 
/* The skb we couldn't send because buffers were full. */
struct sk_buff *last_xmit_skb;
@@ -476,6 +483,54 @@ static int virtnet_set_tx_csum(struct net_device *dev, u32 
data)
return ethtool_op_set_tx_hw_csum(dev, data);
 }
 
+static int virtnet_set_mac_address(struct net_device *dev, void *p)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+	struct sockaddr *addr = p;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+
+	vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
+			  dev->dev_addr, dev->addr_len);
+
+	return 0;
+}
+
+static void virtnet_set_rx_mode(struct net_device *dev)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+	u16 status = vi->status.raw;
+
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS))
+		return;
+
+	if (dev->flags & IFF_PROMISC)
+		status |= VIRTIO_NET_S_PROMISC;
+	else
+		status &= ~VIRTIO_NET_S_PROMISC;
+
+	if (dev->flags & IFF_ALLMULTI)
+		status |= VIRTIO_NET_S_ALLMULTI;
+	else
+		status &= ~VIRTIO_NET_S_ALLMULTI;
+
+	if (dev->uc_count)
+		status |= VIRTIO_NET_S_PROMISC;
+	if (dev->mc_count)
+		status |= VIRTIO_NET_S_ALLMULTI;
+
+	if (status != vi->status.raw) {
+		vi->status.raw = status;
+		vdev->config->set(vdev, offsetof(struct virtio_net_config,
+				  status), &vi->status, sizeof(vi->status));
+	}
+}
+
 static struct ethtool_ops virtnet_ethtool_ops = {
.set_tx_csum = virtnet_set_tx_csum,
.set_sg = ethtool_op_set_sg,
@@ -494,14 +549,15 @@ static void virtnet_update_status(struct virtnet_info *vi)
 			  &v, sizeof(v));
 
 	/* Ignore unknown (future) status bits */
-	v &= VIRTIO_NET_S_LINK_UP;
+	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_PROMISC |
+	     VIRTIO_NET_S_ALLMULTI;
 
-	if (vi->status == v)
+	if (vi->status.raw == v)
 		return;
 
-	vi->status = v;
+	vi->status.raw = v;
 
-	if (vi->status & VIRTIO_NET_S_LINK_UP) {
+	if (vi->status.bits.link) {
 		netif_carrier_on(vi->dev);
 		netif_wake_queue(vi->dev);
} else {
@@ -563,8 +619,17 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vdev->config->get(vdev,
 				  offsetof(struct virtio_net_config, mac),
 				  dev->dev_addr, dev->addr_len);
-	} else
+	} else {
+		struct sockaddr addr;
+
 		random_ether_addr(dev->dev_addr);
+		memset(&addr, 0, sizeof(addr));
+		memcpy(addr.sa_data, dev->dev_addr, dev->addr_len);
+		virtnet_set_mac_address(dev, &addr);
+	}
+
+	dev->set_mac_address = virtnet_set_mac_address;
+	dev->set_rx_mode = virtnet_set_rx_mode;
 
/* Set up our device-specific information */
vi = netdev_priv(dev);
@@ -621,7 +686,7 @@ static int virtnet_probe(struct virtio_device *vdev)
goto unregister;
}
 
-	vi->status = VIRTIO_NET_S_LINK_UP;
+	vi->status.raw = VIRTIO_NET_S_LINK_UP;
 	virtnet_update_status(vi);
 
 	pr_debug("virtnet: registered device %s\n", dev->name);
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index d9174be..5a70edb 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -23,6 +23,8 @@
 #define VIRTIO_NET_F_STATUS    16  /* virtio_net_config.status available */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
+#define VIRTIO_NET_S_PROMISC   2   /* Promiscuous mode */
+#define VIRTIO_NET_S_ALLMULTI  4   /* All-multicast mode */
 
 struct virtio_net_config
 {
@@ -30,7 +32,14 @@ struct virtio_net_config
__u8 mac[6];
/* Status supplied by host; see 

[PATCH 2/2][RFC] virtio_net: Add MAC filter table support

2009-01-07 Thread Alex Williamson
virtio_net: Add MAC filter table support

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 drivers/net/virtio_net.c   |   52 +---
 include/linux/virtio_net.h |6 -
 2 files changed, 54 insertions(+), 4 deletions(-)


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f502edd..d751711 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -505,6 +505,8 @@ static void virtnet_set_rx_mode(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
 	struct virtio_device *vdev = vi->vdev;
 	u16 status = vi->status.raw;
+	struct dev_addr_list *uc_ptr, *mc_ptr;
+	int i;
 
 	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS))
 		return;
@@ -519,11 +521,55 @@ static void virtnet_set_rx_mode(struct net_device *dev)
 	else
 		status &= ~VIRTIO_NET_S_ALLMULTI;
 
-	if (dev->uc_count)
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_MAC_TABLE)) {
+		if (dev->uc_count)
+			status |= VIRTIO_NET_S_PROMISC;
+		if (dev->mc_count)
+			status |= VIRTIO_NET_S_ALLMULTI;
+		if (status != vi->status.raw) {
+			vi->status.raw = status;
+			vdev->config->set(vdev,
+					  offsetof(struct virtio_net_config,
+					  status), &vi->status,
+					  sizeof(vi->status));
+		}
+		return;
+	}
+
+	if (dev->uc_count > 16) {
 		status |= VIRTIO_NET_S_PROMISC;
-	if (dev->mc_count)
+		if (dev->mc_count > 16)
+			status |= VIRTIO_NET_S_ALLMULTI;
+	} else if (dev->uc_count + dev->mc_count > 16)
 		status |= VIRTIO_NET_S_ALLMULTI;
 
+	if ((dev->uc_count && !(status & VIRTIO_NET_S_PROMISC)) ||
+	    (dev->mc_count && !(status & VIRTIO_NET_S_ALLMULTI)))
+		status |= VIRTIO_NET_S_MAC_TABLE;
+	else
+		status &= ~VIRTIO_NET_S_MAC_TABLE;
+
+	uc_ptr = dev->uc_list;
+	mc_ptr = dev->mc_list;
+
+	for (i = 0; i < 16; i++) {
+		uint8_t entry[8] = { 0 };
+
+		if (uc_ptr && !(status & VIRTIO_NET_S_PROMISC)) {
+			memcpy(entry, uc_ptr->da_addr, 6);
+			entry[7] = 1;
+			uc_ptr = uc_ptr->next;
+		} else if (mc_ptr && !(status & VIRTIO_NET_S_ALLMULTI)) {
+			memcpy(entry, mc_ptr->da_addr, 6);
+			entry[7] = 1;
+			mc_ptr = mc_ptr->next;
+		}
+
+		vdev->config->set(vdev, offsetof(struct virtio_net_config,
+				  mac_table) + (sizeof(entry) * i),
+				  entry, sizeof(entry));
+	}
+
 	if (status != vi->status.raw) {
 		vi->status.raw = status;
 		vdev->config->set(vdev, offsetof(struct virtio_net_config,
@@ -744,7 +790,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
VIRTIO_NET_F_GUEST_ECN, /* We don't yet handle UFO input. */
-   VIRTIO_NET_F_STATUS,
+   VIRTIO_NET_F_STATUS, VIRTIO_NET_F_MAC_TABLE,
VIRTIO_F_NOTIFY_ON_EMPTY,
 };
 
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 5a70edb..905319b 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -21,10 +21,12 @@
 #define VIRTIO_NET_F_HOST_ECN  13  /* Host can handle TSO[6] w/ ECN in. */
 #define VIRTIO_NET_F_HOST_UFO  14  /* Host can handle UFO in. */
 #define VIRTIO_NET_F_STATUS    16  /* virtio_net_config.status available */
+#define VIRTIO_NET_F_MAC_TABLE 17  /* Additional MAC addresses */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 #define VIRTIO_NET_S_PROMISC   2   /* Promiscuous mode */
 #define VIRTIO_NET_S_ALLMULTI  4   /* All-multicast mode */
+#define VIRTIO_NET_S_MAC_TABLE 8   /* Enable MAC filter table */
 
 struct virtio_net_config
 {
@@ -38,8 +40,10 @@ struct virtio_net_config
__u16 link:1;
__u16 promisc:1;
__u16 allmulti:1;
+   __u16 mac_table:1;
} bits;
-   } status;
+   } status;
+   __u64 mac_table[16];
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't




Re: [PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti

2009-01-07 Thread Anthony Liguori

Alex Williamson wrote:

virtio-net: Name the status bits, adding promisc and allmulti

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   36 
 hw/virtio-net.h |   11 ++-
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 77e3077..653cad4 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -22,7 +22,14 @@ typedef struct VirtIONet
 {
 VirtIODevice vdev;
 uint8_t mac[6];
-uint16_t status;
+union {
+uint16_t raw;
+struct {
+uint16_t link:1;
+uint16_t promisc:1;
+uint16_t allmulti:1;
+} bits;
+} status;
  


I'd prefer the use of #define's like we have today.  bit fields have 
really weird packing and ordering properties across architectures.


Regards,

Anthony Liguori
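Anthony's objection rests on the C standard: the order in which bitfields are allocated within a storage unit is implementation-defined. A `union { raw; struct { link:1; ... } bits; }` can therefore place `link` in different bits with different compilers or architectures.

Below is a minimal sketch of the `#define`-and-mask style he prefers. It assumes the bit values from the series. Purely for illustration, it also assumes a little-endian byte order for the 16-bit field; the byte order of virtio config space is a separate question not settled here. `VIRTIO_NET_S_KNOWN` and the helper functions are hypothetical names. Reading the value back also masks off unknown (future) bits, as Rusty suggested for the link-status patch.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
#define VIRTIO_NET_S_PROMISC   2   /* Promiscuous mode */
#define VIRTIO_NET_S_ALLMULTI  4   /* All-multicast mode */

/* Bits this side understands; anything else is treated as a future bit. */
#define VIRTIO_NET_S_KNOWN (VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_PROMISC | \
			    VIRTIO_NET_S_ALLMULTI)

/* Serialize with shifts and masks: no dependence on bitfield layout. */
void status_to_le(uint16_t status, uint8_t out[2])
{
	out[0] = (uint8_t)(status & 0xff);
	out[1] = (uint8_t)(status >> 8);
}

/* Deserialize and drop unknown (future) status bits. */
uint16_t status_from_le(const uint8_t in[2])
{
	uint16_t v = (uint16_t)(in[0] | (in[1] << 8));

	return (uint16_t)(v & VIRTIO_NET_S_KNOWN);
}

/* Test a single named bit. */
int status_has(uint16_t status, uint16_t bit)
{
	assert(bit != 0);
	return (status & bit) == bit;
}
```

With masks, `LINK_UP | PROMISC` is always the value 3 on every compiler. A bitfield struct gives no such guarantee.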



Re: [PATCH 2/5][RFC] virtio-net: Add load/save for status bits

2009-01-07 Thread Anthony Liguori

Alex Williamson wrote:

virtio-net: Add load/save for status bits

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index bfb7510..77e3077 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -16,6 +16,8 @@
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
+#define VIRTIO_VM_VERSION	2

+
  


virtio-net is now at 2 already because of the mergeable buffers fix but 
this is definitely needed for Mark's set_link changes.


Regards,

Anthony Liguori



Re: [PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti

2009-01-07 Thread Alex Williamson
On Wed, 2009-01-07 at 12:09 -0600, Anthony Liguori wrote:
> Alex Williamson wrote:
> > virtio-net: Name the status bits, adding promisc and allmulti
> 
> I'd prefer the use of #define's like we have today.  bit fields have 
> really weird packing and ordering properties across architectures.

Ok, it made a few things easier, but I'll work on using a mask
interface.  Thanks,

Alex

-- 
Alex Williamson HP Open Source & Linux Org.



Re: [PATCH 0/2][RFC] virtio_net: MAC filtering

2009-01-07 Thread Anthony Liguori

Alex Williamson wrote:
> This series builds on some of the patches Mark McLoughlin has sent out
> recently, so likely won't apply to any current trees until those get
> upstream.  The goal is to enable MAC filtering at the kvm/qemu level for
> virtio-net packets.  Promiscuous and allmulti mode are handled by adding
> bits to Mark's proposed status field.  I also add a 16 entry MAC table
> for additional unicast and multicast addresses to filter.  If this looks
> reasonable, I'll follow-up with VLAN filtering.
>
> As noted in the RFC thread adding the kvm/qemu backing, this does
> increase the size of the virtio-net device I/O port space, up to 1kB
> with PCI rounding if we add a 4k entry VLAN bitmap.  A 64 device limit
> is still pretty high for a VM, but maybe we should think about adding
> MMIO space for virtio-pci.  Thanks,
  


I'm not quite sure the best way to address this.  Maybe another control 
queue for sending commands to control this sort of stuff?  What are your 
thoughts Rusty?


Regards,

Anthony Liguori


> Alex

  




Re: [PATCH] virtio_net: add link status handling

2009-01-07 Thread Mark McLoughlin
Hi Rusty,

On Fri, 2008-12-12 at 18:34 +1030, Rusty Russell wrote:
> On Thursday 11 December 2008 05:04:44 Mark McLoughlin wrote:
> > On Tue, 2008-12-09 at 21:11 -0600, Anthony Liguori wrote:
> > > Rusty Russell wrote:
> > > > On Wednesday 10 December 2008 08:02:14 Anthony Liguori wrote:
> > > > > It would be nice if the virtio-net card wrote some acknowledgement
> > > > > that it has received the link status down/up events.
> > > >
> > > > How about of every status change event? ie. a generic virtio_pci
> > > > solution?
> > >
> > > A really simple way to do it would just be to have another status field
> > > that was the guest's status (versus the host requested status which the
> > > current field is).  All config reads/writes result in exits so it's easy
> > > to track.
> > >
> > > Adding YA virtio event may be a little overkill.
> >
> > Sounds very reasonable; that and Rusty's mask out unknown bits
> > suggestion in the version below.
>
> Not quite what I was after.  I've taken the original patch, added the
> masking change.  I'll test here and feed to DaveM.

This never got pushed to davem, did it? What you've got in your queue
looks fine to me ...

Cheers,
Mark.
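Anthony's idea quoted above is a second, guest-written status field that lets the host tell whether the guest has seen a link change. It might look roughly like the sketch below. The sketch is hypothetical: the struct, field, and function names are invented for illustration. Rusty's reply indicates he instead took the original patch plus the masking change, so this is not what was queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP 1

/* Hypothetical config layout: host-requested status plus guest's echo. */
struct net_cfg_status {
	uint16_t host_status;   /* written by the host (e.g. link down) */
	uint16_t guest_status;  /* written back by the guest once applied */
};

/* Host side: request a new status; the guest has not seen it yet. */
void host_set_status(struct net_cfg_status *c, uint16_t status)
{
	assert(c);
	c->host_status = status;
}

/* Guest side: read the request, apply it, then acknowledge it. */
uint16_t guest_ack_status(struct net_cfg_status *c)
{
	uint16_t v = c->host_status;

	/* ... the guest would update carrier state here ... */
	c->guest_status = v;
	return v;
}

/* Host side: a change is still pending until the guest echoes it. */
bool host_change_pending(const struct net_cfg_status *c)
{
	return c->host_status != c->guest_status;
}
```

Each guest config write causes an exit, so in a real device the host could react at the moment the acknowledgement lands instead of polling.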



Re: [PATCH 0/2][RFC] virtio_net: MAC filtering

2009-01-07 Thread Alex Williamson
On Wed, 2009-01-07 at 12:14 -0600, Anthony Liguori wrote:
> Alex Williamson wrote:
> > As noted in the RFC thread adding the kvm/qemu backing, this does
> > increase the size of the virtio-net device I/O port space, up to 1kB
> > with PCI rounding if we add a 4k entry VLAN bitmap.  A 64 device limit
> > is still pretty high for a VM, but maybe we should think about adding
> > MMIO space for virtio-pci.  Thanks,
> 
> I'm not quite sure the best way to address this.  Maybe another control 
> queue for sending commands to control this sort of stuff?  What are your 
> thoughts Rusty?

This is also a good time to decide if a fixed 16 entry MAC filter table
is sufficient.  Should the size be programmed into the config space?
There's plenty of room to make it a bigger fixed size and still stay at
1kB of I/O port space with the VLAN table.  This implementation is a
little wasteful of space in using 8 bytes to store the MAC and a valid
bit, but I suspect there are some endian issues I'm ignoring and a
standard data type might make that easier later.
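The table layout described here uses 16 slots of 8 bytes: the MAC in the first six bytes and a valid flag in the last. Together with the overflow rules from patch 2/2, it can be sketched in user space as follows. This is an illustration under those assumptions only. Flat address arrays stand in for the kernel's `dev_addr_list` chains, and `fill_mac_table()` and `mac_entry` are hypothetical names. Because each slot is a plain byte array, the layout itself has no byte-order dependence, which sidesteps the endian concern raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_S_PROMISC    2   /* values from the patch */
#define VIRTIO_NET_S_ALLMULTI   4
#define VIRTIO_NET_S_MAC_TABLE  8
#define MAC_TABLE_ENTRIES      16

/* One table slot: bytes 0-5 MAC, byte 6 unused, byte 7 valid flag. */
typedef uint8_t mac_entry[8];

/*
 * Fill all MAC_TABLE_ENTRIES slots from flat address arrays and return
 * the updated status word, using the same overflow policy as patch 2/2.
 */
uint16_t fill_mac_table(uint16_t status,
			const uint8_t (*uc)[6], unsigned uc_count,
			const uint8_t (*mc)[6], unsigned mc_count,
			mac_entry table[MAC_TABLE_ENTRIES])
{
	unsigned i, u = 0, m = 0;

	assert(uc_count == 0 || uc != NULL);
	assert(mc_count == 0 || mc != NULL);

	if (uc_count > MAC_TABLE_ENTRIES) {
		/* Too many unicast addresses: fall back to promiscuous. */
		status |= VIRTIO_NET_S_PROMISC;
		if (mc_count > MAC_TABLE_ENTRIES)
			status |= VIRTIO_NET_S_ALLMULTI;
	} else if (uc_count + mc_count > MAC_TABLE_ENTRIES) {
		/* Unicast fits but not both: give up on multicast filtering. */
		status |= VIRTIO_NET_S_ALLMULTI;
	}

	/* Enable the table only if some addresses will actually be listed. */
	if ((uc_count && !(status & VIRTIO_NET_S_PROMISC)) ||
	    (mc_count && !(status & VIRTIO_NET_S_ALLMULTI)))
		status |= VIRTIO_NET_S_MAC_TABLE;
	else
		status &= (uint16_t)~VIRTIO_NET_S_MAC_TABLE;

	/* Unicast entries first, then multicast; unused slots stay invalid. */
	for (i = 0; i < MAC_TABLE_ENTRIES; i++) {
		memset(table[i], 0, sizeof(mac_entry));
		if (u < uc_count && !(status & VIRTIO_NET_S_PROMISC)) {
			memcpy(table[i], uc[u++], 6);
			table[i][7] = 1;
		} else if (m < mc_count && !(status & VIRTIO_NET_S_ALLMULTI)) {
			memcpy(table[i], mc[m++], 6);
			table[i][7] = 1;
		}
	}
	return status;
}
```

For example, 10 unicast plus 10 multicast addresses do not fit together. In that case the unicast addresses occupy slots 0-9 and all-multicast mode is turned on.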

Alex

-- 
Alex Williamson HP Open Source & Linux Org.



Re: [PATCH 0/2][RFC] virtio_net: MAC filtering

2009-01-07 Thread Anthony Liguori

Alex Williamson wrote:
> On Wed, 2009-01-07 at 12:14 -0600, Anthony Liguori wrote:
>> Alex Williamson wrote:
>>> As noted in the RFC thread adding the kvm/qemu backing, this does
>>> increase the size of the virtio-net device I/O port space, up to 1kB
>>> with PCI rounding if we add a 4k entry VLAN bitmap.  A 64 device limit
>>> is still pretty high for a VM, but maybe we should think about adding
>>> MMIO space for virtio-pci.  Thanks,
>>
>> I'm not quite sure the best way to address this.  Maybe another control 
>> queue for sending commands to control this sort of stuff?  What are your 
>> thoughts Rusty?
>
> This is also a good time to decide if a fixed 16 entry MAC filter table
> is sufficient.  Should the size be programmed into the config space?
> There's plenty of room to make it a bigger fixed size and still stay at
> 1kB of I/O port space with the VLAN table.  This implementation is a
> little wasteful of space in using 8 bytes to store the MAC and a valid
> bit, but I suspect there are some endian issues I'm ignoring and a
> standard data type might make that easier later.
  


If we switch to a command queue, then there's no need to have any fixed 
limitation.
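With a command queue, the guest could send the whole address list in one message instead of writing a fixed-size table into config space. The sketch below shows one hypothetical encoding: a 32-bit little-endian count followed by that many 6-byte addresses. The layout and function names are assumptions for illustration, not a format proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical command layout: a 32-bit little-endian entry count followed
 * by that many 6-byte MAC addresses. The caller frees the returned buffer.
 */
uint8_t *build_mac_cmd(const uint8_t (*macs)[6], uint32_t n, size_t *len)
{
	size_t total;
	uint8_t *buf;

	assert(n <= (SIZE_MAX - 4) / 6);   /* keep the size computation safe */
	total = 4 + (size_t)n * 6;
	buf = malloc(total);
	if (!buf)
		return NULL;
	buf[0] = (uint8_t)n;
	buf[1] = (uint8_t)(n >> 8);
	buf[2] = (uint8_t)(n >> 16);
	buf[3] = (uint8_t)(n >> 24);
	if (n)
		memcpy(buf + 4, macs, (size_t)n * 6);
	*len = total;
	return buf;
}

/* Return the entry count, or -1 if the length does not match the header. */
int64_t parse_mac_cmd(const uint8_t *buf, size_t len)
{
	uint32_t n;

	if (len < 4)
		return -1;
	n = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
	    (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
	if ((len - 4) % 6 != 0 || (len - 4) / 6 != n)
		return -1;
	return n;
}
```

The only size limit in this scheme is the buffer the guest is willing to allocate, which is the property Anthony points to.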


Regards,

Anthony Liguori


> Alex

  




Re: KVM host kernel hang

2009-01-07 Thread Alexander Graf





On 07.01.2009, at 14:53, Avi Kivity a...@redhat.com wrote:

> Alexander Graf wrote:
>> Avi Kivity wrote:
>>> Alexander Graf wrote:
>>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>>> actually locking itself up?
>>>> Btw: The issue seems to be easily reproducible :-)
>>>
>>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
>>> indicates the arch can do it if you want, IIUC.
>>
>> I just added some debug #define's to show me where exactly things
>> break.
>>
>> Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
>> Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>>
>>   2145 mmio:
>>   2146     /*
>>   2147      * Is this MMIO handled locally?
>>   2148      */
>>   2149     mutex_lock(&vcpu->kvm->lock);
>>   2150     mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
>>   2151     if (mmio_dev) {
>>   2152         kvm_iodevice_read(mmio_dev, gpa, bytes, val);
>>   2153         mutex_unlock(&vcpu->kvm->lock);
>>   2154         return X86EMUL_CONTINUE;
>>   2155     }
>>   2156     mutex_unlock(&vcpu->kvm->lock);
>
> The lock was lost here.  But how?
>
>>   1901 case KVM_IRQ_LINE: {
>>   1902     struct kvm_irq_level irq_event;
>>   1903
>>   1904     r = -EFAULT;
>>   1905     if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>   1906         goto out;
>>   1907     if (irqchip_in_kernel(kvm)) {
>>   1908         mutex_lock(&kvm->lock);
>>   1909         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>>   1910                     irq_event.irq, irq_event.level);
>>   1911         mutex_unlock(&kvm->lock);
>>   1912         r = 0;
>>   1913     }
>>   1914     break;
>>   1915 }
>
> This is your hung iothread trying to inject an interrupt.  It's
> waiting for the lost lock.
>
> I suggest enabling all the lock debug magic you can find in kconfig.

I did that and still don't get anything. I'll try digging deeper into
this tomorrow.


Alex




--
error compiling committee.c: too many arguments to function
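Avi's diagnosis is that some code path took `kvm->lock` and returned without releasing it, so the iothread blocks forever in `KVM_IRQ_LINE`. The sketch below is a user-space analogy. It uses POSIX threads rather than the kernel mutex API and makes no claim about where the actual leak is. It shows the single-exit `goto out` style that makes such leaks harder to write, plus a trylock-based check that a debug build could assert on. `dev_lock`, `find_local_dev()`, `handle_access()` and `dev_lock_is_free()` are hypothetical names. Inside the kernel, lockdep (`CONFIG_PROVE_LOCKING`, as mentioned above) is the tool Avi points to.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for "is this address handled by a local device?" */
static bool find_local_dev(unsigned long addr)
{
	return addr < 0x1000;
}

/*
 * Every path that takes dev_lock leaves through the single "out" label,
 * so an early exit cannot skip the unlock.
 */
int handle_access(unsigned long addr)
{
	int ret = -1;

	pthread_mutex_lock(&dev_lock);
	if (!find_local_dev(addr))
		goto out;          /* not ours: still unlocks below */
	ret = 0;               /* ... handle the access here ... */
out:
	pthread_mutex_unlock(&dev_lock);
	return ret;
}

/* Debug check: true if no thread (including this one) holds dev_lock. */
bool dev_lock_is_free(void)
{
	int r = pthread_mutex_trylock(&dev_lock);

	assert(r == 0 || r == EBUSY);
	if (r != 0)
		return false;
	pthread_mutex_unlock(&dev_lock);
	return true;
}
```

Calling a check like `dev_lock_is_free()` at the point where control returns to the caller turns a silent, later hang into an immediate failure at the leaking path.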




Re: BUG() with SCSI-interfaced disk images

2009-01-07 Thread John Morrissey
On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
 I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
 images. I've tried with the Debian packaging of KVM 79 and 82; both
 exhibit the same behavior (disclaimer: Debian has about a dozen patches in
 their kvm packaging, but they all seem to be changes to the build/install
 process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide
any additional information? I'm afraid I'm a bit over my head when debugging
kernel internals.

john

 IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
 lenny (32-bit/i386) running kernel 2.6.26 (Debian
 linux-image-2.6.26-1-amd64 2.6.26-12).
 
 After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
 filesystem is a reliable trigger), the kernel BUGs (oops output below).
 
 I was previously using KVM 72, and tried upgrading to 79 because both
 Debian lenny and Ubuntu hardy guests were panicking due to sym
 disconnects/timeouts. 79 makes the lenny guest start BUGging as described
 above. 82 is not perceivably different from 79 for the lenny guest.
 
 FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up,
 although it emits:
 
 Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : 
 No Sense [current] 
 Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
 Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: 
 No additional sense information
 
 at seemingly random intervals. The upgrade to 82 made the hardy guest
 start BUGging on soft lockups at random intervals (I can provide the full
 output if anyone's interested, but I'm much more interested in the lenny
 guest oops at this point).
 
 john
 
 
 run via libvirt:
 /usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
   -boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
   -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
   -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
   -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
   -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
   -serial pty -parallel none -usb -vnc 0.0.0.0:1
 
 [The KVMWiki asks whether the problem is reproducible with
  -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the
  above command line by hand (outside of libvirt), the VNC console was
  always blank and there was no console output on the serial pty. If this
  would be useful information to have in this case, I'd love to know what
  I'm doing wrong, or if there's a way to specify additional command line
  arguments with libvirt.]
 
 oops generated in the guest:
 [  140.101828] sym0: unexpected disconnect
 [  140.102748] BUG: unable to handle kernel NULL pointer dereference at 
 0358
 [  140.103818] IP: [e08e2670] :sym53c8xx:sym_int_sir+0x547/0x118f
 [  140.106449] *pdpt = 1f5f9001 *pde =  
 [  140.107356] Oops:  [#1] SMP 
 [  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr 
 serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod 
 cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring 
 virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix 
 ide_core thermal processor fan thermal_sys
 [  140.108062] 
 [  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
 [  140.108062] EIP: 0060:[e08e2670] EFLAGS: 00010287 CPU: 0
 [  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
 [  140.108062] EAX: 000a EBX:  ECX: 1f98c084 EDX: 0030
 [  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
 [  140.108062]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 [  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 
 task.ti=de0f2000)
 [  140.108062] Stack:  000144d6 7f5a222c c011a853 0021d496  
   
 [  140.108062] df98c000 e08e08cd   0001 
  df98c000 
 [  140.108062]0084 e08e3f2f df988c00 0046  df544400 
 0196  
 [  140.108062] Call Trace:
 [  140.108062]  [c011a853] pvclock_clocksource_read+0x4b/0xd0
 [  140.108062]  [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
 [  140.108062]  [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
 [  140.108062]  [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
 [  140.108062]  [c0158e4e] handle_IRQ_event+0x23/0x51
 [  140.108062]  [c0159f4d] handle_fasteoi_irq+0x71/0xa4
 [  140.108062]  [c010afd2] do_IRQ+0x4d/0x63
 [  140.108062]  [c01092a7] common_interrupt+0x23/0x28
 [  140.108062]  [c01300d8] ptrace_request+0x1ec/0x278
 [  140.108062]  [c012d0c6] __do_softirq+0x57/0xd3
 [  140.108062]  [c012d187] do_softirq+0x45/0x53
 [  140.108062]  [c012d43e] irq_exit+0x35/0x67
 [  140.108062]  [c01152b6] smp_apic_timer_interrupt+0x6b/0x75
 [  140.108062]  [c0109364] apic_timer_interrupt+0x28/0x30
 [  140.108062]  

Re: [PATCH 05/10] KVM: Merge MSI handling to kvm_set_irq

2009-01-07 Thread Marcelo Tosatti
On Wed, Jan 07, 2009 at 06:42:41PM +0800, Sheng Yang wrote:
> Using kvm_set_irq to handle all interrupt injection.
> 
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  include/linux/kvm_host.h |2 +-
>  virt/kvm/irq_comm.c  |   79 +++--
>  virt/kvm/kvm_main.c  |   79 +++---
>  3 files changed, 81 insertions(+), 79 deletions(-)
> 
> +static void gsi_dispatch(struct kvm *kvm, u32 gsi)
> +{
> +	int vcpu_id;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> +	struct kvm_gsi_route_entry *gsi_entry;
> +	int dest_id, vector, dest_mode, trig_mode, delivery_mode;
> +	u32 deliver_bitmask;
> +
> +	BUG_ON(!ioapic);
> +
> +	gsi_entry = kvm_find_gsi_route_entry(kvm, gsi);
> +	if (!gsi_entry) {
> +		printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
> +		return;
> +	}
> +
> +#ifdef CONFIG_X86
> +	if (gsi_entry->type & KVM_GSI_ROUTE_MSI) {
> +		dest_id = (gsi_entry->msi.address_lo & MSI_ADDR_DEST_ID_MASK)
> +			>> MSI_ADDR_DEST_ID_SHIFT;
> +		vector = (gsi_entry->msi.data & MSI_DATA_VECTOR_MASK)
> +			>> MSI_DATA_VECTOR_SHIFT;
> +		dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
> +				(unsigned long *)&gsi_entry->msi.address_lo);
> +		trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
> +				(unsigned long *)&gsi_entry->msi.data);
> +		delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
> +				(unsigned long *)&gsi_entry->msi.data);
> +		deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
> +				dest_id, dest_mode);
> +		/* IOAPIC delivery mode value is the same as MSI here */
> +		switch (delivery_mode) {

Sheng, 

This code seems to ignore the RH bit (MSI_ADDR_REDIRECTION_SHIFT):

4.Destination mode (DM) — This bit indicates whether the Destination
ID field should be interpreted as logical or physical APIC ID for
delivery of the lowest priority interrupt. If RH is 1 and DM is 0,
the Destination ID field is in physical destination mode and only the
processor in the system that has the matching APIC ID is considered
for delivery of that interrupt (this means no re-direction). If RH
is 1 and DM is 1, the Destination ID Field is interpreted as in
logical destination mode and the redirection is limited to only those
processors that are part of the logical group of processors based
on the processor’s logical APIC ID and the Destination ID field
in the message. The logical group of processors consists of those
identified by matching the 8-bit Destination ID with the logical
destination identified by the Destination Format Register and the
Logical Destination Register in each local APIC.

Is that intentional?
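To make the question concrete, here is a small user-space decoder for the x86 MSI address and data words. It assumes the layout documented in the Intel SDM:

- address bits 31:20 are 0xFEE, bits 19:12 hold the Destination ID, bit 3 is RH and bit 2 is DM;
- data bits 7:0 are the vector, bits 10:8 the delivery mode and bit 15 the trigger mode.

The macro, struct and function names are local to this sketch, not the kernel's `msidef.h` definitions. The decoder extracts RH next to DM so that a caller could apply the rule quoted above. It also reads the delivery mode as a 3-bit field, whereas the quoted code tests a single bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 MSI layout per the Intel SDM; local names, not the kernel's macros. */
#define MSI_ADDR_PREFIX      0xFEEu  /* bits 31:20 of address_lo */
#define ADDR_DEST_ID_SHIFT   12      /* bits 19:12: Destination ID */
#define ADDR_RH_BIT          3       /* Redirection Hint */
#define ADDR_DM_BIT          2       /* Destination Mode: 1 = logical */
#define DATA_DELIV_SHIFT     8       /* bits 10:8: Delivery Mode */
#define DATA_TRIGGER_BIT     15      /* Trigger Mode: 1 = level */

struct msi_fields {
	uint8_t dest_id;
	bool rh;
	bool dm_logical;
	uint8_t vector;
	uint8_t delivery_mode;
	bool level_triggered;
};

struct msi_fields msi_decode(uint32_t address_lo, uint32_t data)
{
	struct msi_fields f;

	assert((address_lo >> 20) == MSI_ADDR_PREFIX);
	f.dest_id = (uint8_t)((address_lo >> ADDR_DEST_ID_SHIFT) & 0xffu);
	f.rh = (address_lo >> ADDR_RH_BIT) & 1u;
	f.dm_logical = (address_lo >> ADDR_DM_BIT) & 1u;
	f.vector = (uint8_t)(data & 0xffu);
	f.delivery_mode = (uint8_t)((data >> DATA_DELIV_SHIFT) & 0x7u);
	f.level_triggered = (data >> DATA_TRIGGER_BIT) & 1u;
	return f;
}
```

For example, address 0xFEE0100C decodes to Destination ID 1 with both RH and DM set. That is the logical, redirectable case the quoted SDM paragraph describes.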



[ kvm-Bugs-2030703 ] Virtio Vista drivers

2009-01-07 Thread SourceForge.net
Bugs item #2030703, was opened at 2008-07-29 00:05
Message generated for change (Comment added) made by thekozmo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized 
ethernet adapter or block device seem to work under Windows Vista.  It would be 
nice to have Vista compatible drivers.

--

Comment By: Dor Laor (thekozmo)
Date: 2009-01-08 00:30

Message:
What version did you use for pvnet on Vista? Avi uploaded a new one last
week.
If there is an error, dump, or BSOD, please provide it.
Currently there is no pv block support for win*. There is work in progress
on this one.

--

Comment By: martinmaurer (martinmaurer)
Date: 2009-01-07 19:36

Message:
There are drivers for Vista (network).
The latest release: see
https://sourceforge.net/project/showfiles.php?group_id=180599&package_id=267944

(virtio drivers for a block device on Windows are not available)

--

Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-07 19:09

Message:
Is there really a block device driver for win2k and winxp??

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599


Re: [PATCH 09/10] KVM: Update intr delivery func to accept unsigned long* bitmap

2009-01-07 Thread Marcelo Tosatti

Better separate the bitmap patches from this series to ease merging of
the MSI changes.

On Wed, Jan 07, 2009 at 06:42:45PM +0800, Sheng Yang wrote:
 Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is
 increased.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  arch/x86/kvm/lapic.c |8 
  include/linux/kvm_host.h |2 +-
  virt/kvm/ioapic.c|4 ++--
  virt/kvm/ioapic.h|4 ++--
  virt/kvm/irq_comm.c  |6 +++---
  5 files changed, 12 insertions(+), 12 deletions(-)
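The motivation for the bitmap change is that a `u32` mask, like the `deliver_bitmask` in patch 05, can only name 32 VCPUs. Below is a minimal user-space sketch of a bitmap sized from the VCPU limit. It assumes an illustrative limit of 64: `MAX_VCPUS` stands in for `KVM_MAX_VCPUS`, and the helper names are hypothetical. The helpers mirror what the kernel's `set_bit()`/`test_bit()` do on `unsigned long` words.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative value only; the real limit is KVM_MAX_VCPUS. */
#define MAX_VCPUS      64
#define WORD_BITS      (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS(n)  (((n) + WORD_BITS - 1) / WORD_BITS)

/* One bit per VCPU, spread over as many unsigned longs as needed. */
typedef struct {
	unsigned long bits[MASK_WORDS(MAX_VCPUS)];
} vcpu_mask;

void mask_clear(vcpu_mask *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

void mask_set(vcpu_mask *m, unsigned vcpu)
{
	assert(vcpu < MAX_VCPUS);
	m->bits[vcpu / WORD_BITS] |= 1UL << (vcpu % WORD_BITS);
}

bool mask_test(const vcpu_mask *m, unsigned vcpu)
{
	assert(vcpu < MAX_VCPUS);
	return (m->bits[vcpu / WORD_BITS] >> (vcpu % WORD_BITS)) & 1UL;
}
```

Raising `MAX_VCPUS` only grows the array. None of the callers change, which is the "easily extended" property the patch description claims.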


[ kvm-Bugs-2493108 ] Win2k problems on some (not all) Intel hosts

2009-01-07 Thread SourceForge.net
Bugs item #2493108, was opened at 2009-01-08 12:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2493108&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Shanahan (kmshanah)
Assigned to: Nobody/Anonymous (nobody)
Summary: Win2k problems on some (not all) Intel hosts

Initial Comment:
I have a Windows 2000 guest that I have been testing on a desktop machine 
(E8400 CPU) and that has been working well (apart from being affected by bug 
2314737).

When I moved (not a live migration, just shut down, rsync and boot on the new 
server) the guest to our server, which is an IBM X3550 with two Xeon 5130 CPUs, 
the guest has short freezes where the guest CPU usage spikes on both virtual 
CPUs and the guest becomes unresponsive for several seconds at a time. These 
guest CPU spikes can last over a minute, but the guest might have moments where 
it briefly responds to the delayed keystrokes, etc. every few seconds. One 
symptom of this behaviour for me last night was that Win2k AD replication was 
failing, so it's not just interactive use that suffers.

Something I noticed that may be relevant - due to the other bug (2314737) which 
causes each guest CPU to use 100% of the host CPU, whether the guest CPU is 
idle or not, I can tell when the guest is having this problem by looking at the 
'top' output on the host.

When the guest is operating normally, the host will show the qemu-system-x86_64 
process using 200% CPU (the guest is using -smp 2). However, when the guest is 
misbehaving, the host will show the qemu-system-x86_64 process only using 
_100%_ CPU. Maybe one of the threads is stuck, or something is forcing them to 
share a single core?

Both hosts are running Debian Lenny/Sid, 64-bit with a kernel.org 2.6.28 kernel 
and kvm-82.

The command line used in both cases:

/usr/local/kvm/bin/qemu-system-x86_64 \
-smp 2 \
-localtime -m 2048 \
-drive if=ide,file=kvm-ks-02a.img,index=0,media=disk,boot=on \
-drive if=ide,file=kvm-ks-02b.img,index=2,media=disk \
-net nic,vlan=0,macaddr=52:54:00:12:34:68,model=virtio \
-net tap,vlan=0,ifname=tap18,script=no \
-vnc 127.0.0.1:18 -usbdevice tablet \
-daemonize

CPUs on the good host:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8400  @ 3.00GHz
stepping: 10
cpu MHz : 3000.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx 
est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips: 5984.98
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8400  @ 3.00GHz
stepping: 10
cpu MHz : 3000.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx 
est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips: 5984.97
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

And the bad host:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU 5130  @ 2.00GHz
stepping: 6
cpu MHz : 1995.117
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl 
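Comparing the "flags" lines of the two hosts by eye is error-prone. Note that the bad host's list in this message is cut off after ds_cpl, so a real comparison needs the complete line.

A minimal sketch of a mechanical comparison follows. The helper names are hypothetical. Flags must be matched as whole space-separated tokens: a plain strstr() would wrongly report "pse" as present on a host that only lists "pse36", or "tm" on one that only lists "tm2".

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` is a whole whitespace-separated token of `flags`. */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char after = p[len];
        int ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';
        if (starts && ends)
            return 1;
        p++; /* keep searching past this partial match */
    }
    return 0;
}

/* Print every flag of `a` that `b` lacks; returns how many were printed. */
int print_missing_flags(const char *a, const char *b)
{
    const char *p = a;
    char tok[64];
    int missing = 0;

    while (*p != '\0') {
        size_t n;

        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        n = strcspn(p, " \t\n");
        if (n < sizeof tok) {
            memcpy(tok, p, n);
            tok[n] = '\0';
            if (!has_cpu_flag(b, tok)) {
                printf("%s\n", tok);
                missing++;
            }
        }
        p += n;
    }
    return missing;
}
```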

Userspace specific host irq?

2009-01-07 Thread Sheng Yang
Hi all

This piece of code has always puzzled me:

virt/kvm/kvm_main.c:assigned_device_update_intx()

    if (airq->host_irq)
        adev->host_irq = airq->host_irq;
    else
        adev->host_irq = adev->dev->irq;

I don't understand why we let userspace specify a host_irq different from the 
device's real one.

I've asked Amit and Ben-Ami, the original authors, about this, and they also 
think this code is redundant. So I'm sending the question to the mailing list; 
if everyone agrees, I will remove it.
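A minimal sketch of the behavioural difference under discussion follows. The structs are simplified, hypothetical stand-ins that model only the fields named in the snippet; they are not the real kernel definitions.

- With the current logic, a non-zero host_irq supplied by userspace wins over the IRQ the PCI core reports for the device.
- With the proposed change, the userspace value is simply ignored.

```c
/* Hypothetical stand-ins; only the fields used in the snippet are modelled. */
struct pci_dev_stub { int irq; };
struct assigned_dev_stub { struct pci_dev_stub *dev; int host_irq; };
struct assigned_irq_stub { int host_irq; };

/* Current logic: a non-zero userspace value overrides the device's IRQ. */
void update_intx_current(struct assigned_dev_stub *adev,
                         const struct assigned_irq_stub *airq)
{
    if (airq->host_irq)
        adev->host_irq = airq->host_irq;
    else
        adev->host_irq = adev->dev->irq;
}

/* Proposed logic: always use the IRQ the PCI core assigned to the device. */
void update_intx_proposed(struct assigned_dev_stub *adev,
                          const struct assigned_irq_stub *airq)
{
    (void)airq; /* userspace's host_irq is no longer consulted */
    adev->host_irq = adev->dev->irq;
}
```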

Thanks!

-- 
regards
Yang, Sheng





[PATCH] kvm: userspace: change vtd.o to iommu.o in Kbuild

2009-01-07 Thread Huang, Wei W
vtd.c has been renamed to iommu.c, so Kbuild needs to be updated accordingly.

Signed-off-by: Wei Huang wei.w.hu...@intel.com
---
 kernel/x86/Kbuild |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
index c4723b1..48339b4 100644
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -10,7 +10,7 @@ ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
 ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+kvm-objs += iommu.o
 endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o
-- 
1.6.1.rc3

