date:20090515

[COMMIT master] Fix rpm top directory

2009-05-15 Thread Avi Kivity

From: Avi Kivity a...@redhat.com

We inherited kvm-userspace's directory structure, which is now wrong.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/Makefile b/Makefile
index 1e0420e..a8e8e0b 100644
--- a/Makefile
+++ b/Makefile
@@ -51,7 +51,7 @@ install:
 
 tmpspec = .tmp.kvm-kmod.spec
 
-rpm-topdir := $$(pwd)/../rpmtop
+rpm-topdir := $$(pwd)/rpmtop
 
 RPMDIR = $(rpm-topdir)/RPMS
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: event injection MACROs

2009-05-15 Thread Dong, Eddie

Gleb Natapov wrote:
 On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
 Gleb Natapov wrote:
 On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
 Avi Kivity wrote:
 Dong, Eddie wrote:
 OK.
 Also, back to Gleb's question: the reason I want to do that is to
 simplify the event generation mechanism in current KVM.
 
 Today KVM uses an additional layer for exception/NMI/interrupt, such as
 vcpu.arch.exception.pending, vcpu->arch.interrupt.pending and
 vcpu->arch.nmi_injected. All of these additional layers exist because
 the events compete for the single VM_ENTRY_INTR_INFO_FIELD
 write used to inject an event. Both SVM and VMX have only one resource
 for injecting a virtual event, but KVM generates 3 categories of
 events in parallel, which requires additional
 logic to arbitrate among them.
 
 I thought of using a queue to hold all pending events (in a common
 format), sort it by priority, and inject the head.
 
 SDM Table 5-4 requires merging 2 events together, i.e.
 converting them to #DF/triple fault or injecting them serially when 2 events
 happen, no matter whether they are NMI, IRQ or exception.
 
 Considering the event merging described above, that is a single
 element queue.
 I don't know how you got to this conclusion from your previous
 statement. See the explanation to Table 5-2, for instance, where it is
 stated that an interrupt should be held pending if there is an exception
 with higher priority. Should be held pending where? In the queue,
 like we do. Note that low prio exceptions are just dropped since
 they will be regenerated.
 
 I have different understanding here.
 My understanding is that held means NO INTA in HW, i.e. LAPIC
 still hold this IRQ. 
 
 And what if INTA already happened and CPU is ready to fetch IDT for
 interrupt vector and at this very moment CPU faults?

If INTA happens, that means the interrupt is delivered. If its delivery triggers another
exception, that is what Table 5-4 handles.

My understanding is that it is a 2-stage process. Table 5-2 talks about
events happening before delivery, so that HW needs to prioritize them.
Once a decision is made, the highest one is delivered, but then it could
trigger another exception when fetching the IDT etc.

The current exception.pending/interrupt.pending/nmi_injected doesn't match
either of the above: interrupt/nmi is only for failed event injection, and there is a strange
fixed priority check when it is really injected:
exception > failed NMI > failed IRQ > new NMI > new IRQ.

Table 5-2 looks to be missing from current KVM IMO, except for a wrong (but minor)
 exception > NMI > IRQ sequence.

 
 
  We could have either: 1) a pure SW queue that will be flushed to the
 HW register later (VM_ENTRY_INTR_INFO_FIELD), or 2) direct use of the HW
 register.
 
 We have three event sources: 1) exceptions 2) IRQ 3) NMI. We should
 have a queue of three elements sorted by priority. On each entry we
 should
 
 Table 5-4 already says NMI/IRQ is BENIGN.
 Table 5-2 applies here, not Table 5-4, I think.
 
 
 inject an event with highest priority. And remove it from queue on
 exit.
 
 The problem is that we have to decide to inject only one of the above 3,
 and discard the rest. Whether to prioritize them or merge them (into one event
 as in Table 5-4) is another story.
 Only a small number of events are merged into #DF. Most are handled
 serially (the SDM does not define what serially means, unfortunately), so
 I don't understand where "discard the rest" comes from. We can

vmx_complete_interrupts clears all of them at the next EXIT.

Even from the HW point of view, if there are pending NMI/IRQ/exception,
the CPU picks the highest one, NMI, ignores/discards the IRQ (but the LAPIC still holds
the IRQ, thus it can be re-injected), and completely discards the exception.

I don't say discarding has any problem, but it is unnecessary to keep all 3.
The only difference is when to discard the other 2: at queue_exception/irq/nmi
time or later on (even at next EXIT time), which is the same to me.
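
To make the "inject one, discard the rest" model concrete, here is a minimal sketch in Python (not kernel code). The names `Kind`, `Event` and `pick_event` are hypothetical, and the NMI > IRQ > exception order is only the ordering described in this message, not a statement of what the SDM mandates. A losing IRQ is kept because the LAPIC would still hold it; exceptions are dropped on the assumption that re-executing the instruction regenerates them.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Kind(IntEnum):
    # Lower value = higher priority. ASSUMPTION: order taken from the
    # discussion above (NMI first), not from the SDM tables themselves.
    NMI = 0
    IRQ = 1
    EXCEPTION = 2


@dataclass(frozen=True)
class Event:
    kind: Kind
    vector: int


def pick_event(pending: List[Event]) -> Tuple[Optional[Event], List[Event]]:
    """Return (event to inject on this entry, events that stay pending)."""
    if not pending:
        return None, []
    winner = min(pending, key=lambda e: e.kind)
    # IRQs that lost stay pending (the LAPIC still holds them);
    # everything else that lost is simply dropped in this sketch.
    held = [e for e in pending if e is not winner and e.kind is Kind.IRQ]
    return winner, held
```

Whether the dropping happens when the event is queued or at the next exit does not change the result of this function, which is the point made above.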

 discard exception since it will be regenerated anyway, but IRQ and
 NMI is another story. SDM says that IRQ should be held pending (once
 again not much explanation here), nothing about NMI.
 
 
 
 A potential benefit is that it can avoid duplicated code and
 potential bugs in current code as following patch shows if I
 understand correctly: 
 
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
 			    (u32)((u64)cr2 >> 32), handler);
 -		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending)
 +		if (vcpu->arch.interrupt.pending ||
 +		    vcpu->arch.exception.pending ||
 +		    vcpu->arch.nmi_injected)
 			kvm_mmu_unprotect_page_virt(vcpu, cr2);
 		return kvm_mmu_page_fault(vcpu, cr2, error_code);
 	}
 This fix is already in Avi's tree (not yet pushed).
 
 Either way is OK and up to you. BTW Xen uses the HW register directly
 to represent a pending event.
 
 In this particular case I don't mind to

[PATCH 5/6] Nested SVM: Implement INVLPGA

2009-05-15 Thread Alexander Graf

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 30e6b43..b2c6cf3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	kvm_mmu_reset_context(vcpu);
+	kvm_mmu_load(vcpu);
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
-	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
+	[SVM_EXIT_INVLPGA]			= invlpga_interception,
 	[SVM_EXIT_IOIO]				= io_interception,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/6] Emulator: Inject #PF when page was not found

2009-05-15 Thread Alexander Graf

If we couldn't find a page on read_emulated, it might be a good
idea to tell the guest about that and inject a #PF.

We do the same already for write faults. I don't know why it was
not implemented for reads.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/x86.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fcde2c..5aa1219 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr,
 		goto mmio;
 
 	if (kvm_read_guest_virt(addr, val, bytes, vcpu)
-				== X86EMUL_CONTINUE)
+				== X86EMUL_CONTINUE) {
 		return X86EMUL_CONTINUE;
-	if (gpa == UNMAPPED_GVA)
+	}
+	if (gpa == UNMAPPED_GVA) {
+		kvm_inject_page_fault(vcpu, addr, 0);
 		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 mmio:
 	/*
-- 
1.6.0.2


[PATCH 4/6] Implement Hyper-V MSRs

2009-05-15 Thread Alexander Graf

Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and have it its way, because otherwise it fails
terribly.

For MSRs where I could find a name I used the name, otherwise they're just
added in their hex form for now.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..30e6b43 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1932,6 +1932,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 		*data = svm->hsave_msr;
 		break;
 	case MSR_VM_CR:
+	case 0x4081:
 		*data = 0;
 		break;
 	case MSR_IA32_UCODE_REV:
@@ -2034,6 +2035,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 	case MSR_VM_HSAVE_PA:
 		svm->hsave_msr = data;
 		break;
+	case MSR_VM_CR:
+	case MSR_VM_IGNNE:
+	case MSR_K8_HWCR:
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
-- 
1.6.0.2


[PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-15 Thread Alexander Graf

Now that we have nested SVM in place, let's make use of it and virtualize
something non-kvm.
The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which runs
the dom0 in virtualized mode already. I haven't been able to run a
second VM within for now though, but maybe I just wasn't patient enough ;-).

Alexander Graf (6):
  Add definition for IGNNE MSR
  MMU: don't bail on PAT bits in PTE
  Emulator: Inject #PF when page was not found
  Implement Hyper-V MSRs
  Nested SVM: Implement INVLPGA
  Nested SVM: Improve interrupt injection

 arch/x86/include/asm/msr-index.h |1 +
 arch/x86/kvm/mmu.c   |2 +-
 arch/x86/kvm/svm.c   |   59 +++--
 arch/x86/kvm/x86.c   |7 +++-
 4 files changed, 50 insertions(+), 19 deletions(-)


[PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Alexander Graf

A 64bit PTE can have bit 7 set to 1, which means "Use this bit for the PAT".
Currently KVM's MMU code treats this bit as reserved, even though it's not.

As long as we're not required to make use of the PAT bits, which from my
understanding is only needed for DMA/MMIO, we can safely ignore it.

Hyper-V uses this bit for kernel PTEs.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8fcdae9..cce055a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 51) |
 			rsvd_bits(13, 20);		/* large page */
-		context->rsvd_bits_mask[1][0] = ~0ull;
+		context->rsvd_bits_mask[1][0] = 0ull;
 		break;
 	}
 }
-- 
1.6.0.2
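
For readers following along, the effect of the one-line change can be modelled outside the kernel. This is a rough Python sketch, assuming the mask table is indexed as [bit 7 of the entry][level - 1], which is what the diff context suggests. `is_rsvd_bits_set` and the two tiny tables are illustrative stand-ins, not the kernel's data, and only level 1 is modelled.

```python
PT_BIT7 = 7  # bit 7: page-size bit in directory entries, PAT bit in 4K PTEs


def rsvd_bits(s, e):
    """Mask with bits s..e (inclusive) set, like the helper named in the diff."""
    return ((1 << (e - s + 1)) - 1) << s


def is_rsvd_bits_set(mask_table, gpte, level):
    """ASSUMED lookup: pick the mask by bit 7 of the entry and by level."""
    bit7 = (gpte >> PT_BIT7) & 1
    return (gpte & mask_table[bit7][level - 1]) != 0


ALL_ONES = (1 << 64) - 1

# Only level 1 is modelled; the [0][0] mask is set to 0 to keep the sketch small.
before_patch = [[0], [ALL_ONES]]  # any level-1 entry with bit 7 set is "reserved"
after_patch = [[0], [0]]          # bit 7 (PAT) in a level-1 entry is tolerated

pte_with_pat = 0x1000 | (1 << PT_BIT7) | 0x1  # present 4K PTE with PAT bit set
```

With `before_patch`, `is_rsvd_bits_set(before_patch, pte_with_pat, 1)` is true, so the guest PTE would be rejected; with `after_patch` it is false.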


[PATCH 6/6] Nested SVM: Improve interrupt injection

2009-05-15 Thread Alexander Graf

While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a
real machine.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/x86/kvm/svm.c |   40 +---
 1 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b2c6cf3..1d22d46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
 	/* Kill any pending exceptions */
 	if (svm->vcpu.arch.exception.pending == true)
 		nsvm_printk("WARNING: Pending Exception\n");
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
@@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
 	svm->nested_vmcb = svm->vmcb->save.rax;
 
 	/* Clear internal status */
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Save the old vmcb, so we don't need to pick what we save, but
 	   can restore everything when a VMEXIT occurs */
@@ -2276,21 +2278,15 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
 }
 
-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.event_inj = nr |
-		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
 static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	nested_svm_intr(svm);
+	if(!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+		return;
 
-	svm_queue_irq(vcpu, irq);
+	svm->vmcb->control.event_inj = irq |
+		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2318,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 	struct vmcb *vmcb = svm->vmcb;
 	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
 		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		(svm->vcpu.arch.hflags & HF_GIF_MASK);
+		(svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+		!is_nested(svm);
 }
 
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
-	svm_set_vintr(to_svm(vcpu));
-	svm_inject_irq(to_svm(vcpu), 0x0);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	nsvm_printk("Trying to open IRQ window\n");
+
+	nested_svm_intr(svm);
+
+	/* In case GIF=0 we can't rely on the CPU to tell us when
+	 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+	 * The next time we get that intercept, this function will be
+	 * called again though and we'll get the vintr intercept. */
+	if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+		svm_set_vintr(svm);
+		svm_inject_irq(svm, 0x0);
+	}
 }
 
 static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2392,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	case SVM_EXITINTINFO_TYPE_EXEPT:
 		/* In case of software exception do not reinject an exception
 		   vector, but re-execute and instruction instead */
+		if (is_nested(svm))
+			break;
 		if (vector == BP_VECTOR || vector == OF_VECTOR)
 			break;
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
-- 
1.6.0.2
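
The condition that svm_interrupt_allowed() ends up with after this patch can be written as a small truth function. The sketch below is Python rather than kernel C, and the numeric values of SVM_INTERRUPT_SHADOW_MASK and HF_GIF_MASK are assumptions chosen for illustration; only X86_EFLAGS_IF (bit 9 of RFLAGS) is architectural.

```python
X86_EFLAGS_IF = 1 << 9            # architectural: interrupt enable flag
SVM_INTERRUPT_SHADOW_MASK = 1     # ASSUMED value, for illustration only
HF_GIF_MASK = 1 << 0              # ASSUMED value, for illustration only


def interrupt_allowed(rflags, int_state, hflags, nested):
    """Mirror of the patched check: IF set, no interrupt shadow,
    GIF set, and the vcpu is not currently running a nested guest."""
    return (bool(rflags & X86_EFLAGS_IF)
            and not (int_state & SVM_INTERRUPT_SHADOW_MASK)
            and bool(hflags & HF_GIF_MASK)
            and not nested)
```

The only new term compared to the old code is the last one: while a nested guest is running, direct injection is refused and the IRQ window path is taken instead.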


Re: [PATCH][KVM-AUTOTEST] TAP network support in kvm-autotest

2009-05-15 Thread jason wang


Michael Goldish wrote:

Hi Michael, thanks for your comments.

Hi Jason,

We already have patches that implement similar functionality here in
TLV, as mentioned in the to-do list (item #4 under 'Framework').
They're not yet committed upstream because they're still quite fresh.
  

OK, I will pay more attention to the to-do list.

Still, your patch looks good and is quite similar to mine. The main
difference is that I use MAC/IP address pools specified by the user,
instead of random MACs with arp/nmap to detect the matching IP
addresses.
  
We've considered using MAC/IP address pools, but that method needs to
handle the case of multiple kvm-autotest instances running on multiple guests.
The MAC pools should not overlap when using public bridges.

I will post my patch to the mailing list soon, but it will come
together with quite a few other patches that I haven't posted yet, so
please be patient.

Comments/questions:

Why do you use nmap in addition to arp? In what cases will arp not
suffice? I'm a little put off by the fact that nmap imposes an
additional requirement on the host. Three hosts I've tried don't come
with nmap installed by default.
  
We use nmap to make sure the guest IP can eventually be found.
During our tests, the scripts sometimes failed to get the guest's IP address
when host iptables was turned on.

Please see additional comments below.

- Jason Wang jasow...@redhat.com wrote:

  

Hi All:
This patch tries to add tap network support to kvm-autotest. Multiple
NICs connected to different bridges can be set up through this
script. A public bridge is important for testing real network traffic
and migration. The patch gives each NIC a randomly generated MAC
address. The IP address required in the test can be dynamically
probed through nmap/arp. Only the IP address of the first NIC is used
throughout the test.

Example:
nics = nic1 nic2
network = bridge
bridge = switch
ifup =/etc/qemu-ifup-switch
ifdown =/etc/qemu-ifdown-switch

This would make the virtual machine have two NICs, both of which are
connected to a bridge named 'switch'. Ifup/ifdown scripts
are also specified.

Another Example:
nics = nic1 nic2
network = bridge
bridge = switch
bridge_nic2 = virbr0
ifup =/etc/qemu-ifup-switch
ifup_nic2 = /etc/qemu-ifup-virbr0

This would make the virtual machine have two NICs: nic1 is connected
to bridge 'switch' and nic2 is connected to bridge 'virbr0'.

Public mode and user mode nic could also be mixed:
nics = nic1 nic2
network = bridge
network_nic2 = user

Looking forward to comments and suggestions.

From: jason jasow...@redhat.com
Date: Wed, 13 May 2009 16:15:28 +0800
Subject: [PATCH] Add tap networking support.

---
 client/tests/kvm_runtest_2/kvm_utils.py |7 +++
 client/tests/kvm_runtest_2/kvm_vm.py|   74
++-
 2 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/client/tests/kvm_runtest_2/kvm_utils.py b/client/tests/kvm_runtest_2/kvm_utils.py
index be8ad95..0d1f7f8 100644
--- a/client/tests/kvm_runtest_2/kvm_utils.py
+++ b/client/tests/kvm_runtest_2/kvm_utils.py
@@ -773,3 +773,10 @@ def md5sum_file(filename, size=None):
         size -= len(data)
     f.close()
     return o.hexdigest()
+
+def random_mac():
+    mac=[0x00,0x16,0x30,
+         random.randint(0x00,0x09),
+         random.randint(0x00,0x09),
+         random.randint(0x00,0x09)]
+    return ':'.join(map(lambda x: "%02x" %x,mac))



Random MAC addresses will not necessarily work everywhere, as far as
I know. That's why I prefer user specified MAC/IP address ranges.
  
Yes, maybe we could use a user-specified MAC address prefix or a better
algorithm to generate the MAC address.
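
As a sketch of what a prefix-based generator could look like (this is not the code from either patch; the `prefix` and `rng` parameters are hypothetical additions): the caller supplies the leading bytes and the remaining bytes are drawn from the full 0x00-0xff range, rather than 0x00-0x09 as in the posted random_mac(). The default prefix is simply the one used in the patch above.

```python
import random


def random_mac(prefix=(0x00, 0x16, 0x30), rng=random):
    """Return a MAC string that starts with `prefix`; the rest is random.

    `prefix` and `rng` are illustrative parameters, not part of the patch.
    """
    if not 0 < len(prefix) <= 6:
        raise ValueError("prefix must contain 1 to 6 bytes")
    # Fill the remaining bytes from the whole byte range.
    tail = [rng.randint(0x00, 0xff) for _ in range(6 - len(prefix))]
    return ":".join("%02x" % b for b in list(prefix) + tail)
```

Giving each host its own prefix, e.g. `random_mac(prefix=(0x00, 0x16, 0x30, 0x01))`, is one way to keep several kvm-autotest instances on the same public bridge from colliding, though it does not rule out collisions within one host.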

diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py
index fab839f..ea7dab6 100644
--- a/client/tests/kvm_runtest_2/kvm_vm.py
+++ b/client/tests/kvm_runtest_2/kvm_vm.py
@@ -105,6 +105,10 @@ class VM:
         self.qemu_path = qemu_path
         self.image_dir = image_dir
         self.iso_dir = iso_dir
+        self.macaddr = []
+        for nic_name in kvm_utils.get_sub_dict_names(params,"nics"):
+            macaddr = kvm_utils.random_mac()
+            self.macaddr.append(macaddr)

     def verify_process_identity(self):
         """Make sure .pid really points to the original qemu process.
@@ -189,9 +193,25 @@ class VM:
         for nic_name in kvm_utils.get_sub_dict_names(params, "nics"):
             nic_params = kvm_utils.get_sub_dict(params, nic_name)
             qemu_cmd += " -net nic,vlan=%d" % vlan
+            net = nic_params.get("network")
+            if net == "bridge":
+                qemu_cmd += ",macaddr=%s" % self.macaddr[vlan]
             if nic_params.get("nic_model"):
                 qemu_cmd += ",model=%s" % nic_params.get("nic_model")
-            qemu_cmd += " -net user,vlan=%d" % vlan
+            if net == "bridge":
+                qemu_cmd += " -net tap,vlan=%d" % vlan
+                ifup = nic_params.get("ifup")
+                if ifup:
+

Re: kvm-autotest: The automation plans?

2009-05-15 Thread jason wang


Michael Goldish wrote:

- sudhir kumar smalik...@gmail.com wrote:

  

On Thu, May 14, 2009 at 12:22 PM, jason wang jasow...@redhat.com
wrote:


sudhir kumar wrote:
  

Hi Uri/Lucas,

Do you have any plans for enhancing kvm-autotest?
I was looking mainly at the following 2 aspects:

(1).
We have standalone migration only. Are there any plans to enhance
kvm-autotest so that we can trigger migration while a workload is
running?
Something like this:
Start a workload (maybe n instances of it).
Let the test execute for some time.
Trigger migration.
Log into the target.
Check if the migration is successful.
Check if the test results are consistent.



We have some patches for ping-pong migration and for adding a workload. The
migration is based on a public bridge, and the workload is added by running a
benchmark in the background of the guest.

Cool. I would like to have a look at them. So how do you manage the
background process/thread?

Yes, we will try to send them here as soon as possible. The background
workload could be added through various methods. We could use a simple
algorithm as follows:


run_migration2():
pid = run_autotest_background(test, params, env, "dbench", "control.60")

Do ping-pong migration ...

wait_autotest_background(pid)

run_autotest_background() would fork a subprocess to run the function
run_autotest() and catch its exceptions.
wait_autotest_background(pid) would wait until the background benchmark
completes and analyse the result through the return value of the subprocess.
Whether the child process works well depends on the ssh
connection staying alive during migration.

I believe this could also be achieved through job.parallel().
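
The shape of that algorithm can be sketched with the standard library alone. The following Python is only an illustration: `run_background`, `wait_background` and `run_migration_with_workload` are hypothetical names, the workload is an arbitrary command rather than an autotest control file, and `migrate` is whatever callable performs the ping-pong migration.

```python
import subprocess
import sys

# Stand-in for a real benchmark such as dbench (hypothetical default).
DEFAULT_WORKLOAD = [sys.executable, "-c", "pass"]


def run_background(argv):
    """Start the workload as a child process and return its handle."""
    return subprocess.Popen(argv)


def wait_background(proc, timeout=None):
    """Wait for the workload; it passed if the exit status is 0."""
    return proc.wait(timeout=timeout) == 0


def run_migration_with_workload(workload_argv, migrate):
    """Run `migrate()` while the workload runs, then collect the result."""
    proc = run_background(workload_argv)
    try:
        migrate()
    finally:
        # Always reap the child, even if the migration step raised.
        ok = wait_background(proc)
    return ok
```

In the real test the child would talk to the guest over ssh, so, as noted above, its result is only meaningful if that connection survives the migration.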

(2).
How can we run N parallel instances of a test? Will the current
configuration  be easily able to support it?

Please provide your thoughts on the above features.




The parallelized instances could easily be achieved through job.parallel()
of the autotest framework, and that is what we have used in our tests. We have
made some helper routines such as get_free_port reentrant through a file
lock.
We've implemented the following test cases: timedrift (already sent here),
savevm/loadvm, suspend/resume, jumboframe, migration between two machines
and others. We will send them here for review in the following weeks.
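
As an illustration of the file-lock idea mentioned for get_free_port, here is a minimal sketch, assuming a POSIX host (it relies on `fcntl.flock`). The lock path, the `is_port_free` helper and the function signature are all hypothetical and not taken from kvm-autotest.

```python
import fcntl
import os
import socket
import tempfile

# Hypothetical lock file shared by all parallel test instances on the host.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "kvm_autotest_port.lock")


def is_port_free(port):
    """Return True if a TCP socket can currently bind to the port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        s.close()


def get_free_port(start, end, lock_path=LOCK_PATH):
    """Search [start, end) for a free port while holding an exclusive lock."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            for port in range(start, end):
                if is_port_free(port):
                    return port
            return None
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Note that the lock only serializes the search itself. Two callers can still be handed the same port unless the chosen port is recorded or actually bound before the lock is released, so a real implementation would need one of those steps.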
There are some other things that could be improved:
1) The current kvm_test.cfg.sample/kvm_test.cfg is transparent to the autotest
server UI. This makes it hard to configure the tests on the server
side. During our tests, we have merged it into the control file so that it can be
configured through the control file editing function of the autotest server side web
UI.
  

Not much clue here. But I would like to keep the control file as
simple as possible and as independent of test scenarios as
possible. kvm_tests.cfg should be the right file until and unless it
is impossible to do by using it.


2) Public bridge support: I've sent a patch (TAP network support in
kvm-autotest). This patch needs an external DHCP server and requires nmap
support. I don't know whether the method of the original kvm_runtest_old (DHCP
server on a private bridge) is preferable.

The old approach is better. Not everyone may be able to run an external
DHCP server for running the test. I do not see any issue with the old
approach.



We're taking more of a minimalist approach in kvm_runtest_2: the
framework should handle only the things directly related to testing.
Configuring and running a DHCP server is and should be beyond the scope
of the KVM-Autotest framework. To emulate the old behavior, you can just
start the DHCP server yourself locally. If you wish, maybe we can
bundle example scripts with the framework that will do this for the user,
but they should not be an integral part of the framework in my opinion.

  


--
Sudhir Kumar




[PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Alexander Graf

When using nested SVM we usually want the guest to see the exact CPUID values
we gave it and not some mangled ones.

Hyper-V for example doesn't even start when the hypervisor present bit is set.

Signed-off-by: Alexander Graf ag...@suse.de
---
 target-i386/helper.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 24fcea8..5f56698 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1496,7 +1496,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
          * isn't supported in compatibility mode on Intel.  so advertise the
          * actuall cpu, and say goodbye to migration between different vendors
          * is you use compatibility mode. */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             host_cpuid(0, 0, NULL, ebx, ecx, edx);
         break;
     case 1:
@@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *edx = env->cpuid_features;
 
         /* Hypervisor present bit required for Microsoft SVVP */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             *ecx |= (1 << 31);
         break;
     case 2:
-- 
1.6.0.2


Re: kvm build error with latest commit

2009-05-15 Thread Avi Kivity


Xu, Jiajun wrote:

Hi all,
Latest kvm can not build with 2.6.30-rc4 kernel. Could anyone help on the issue?

Error as following:

make[1]: Leaving directory 
`/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm'
  


The external module is now built using the kvm-kmod repository:

 http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=summary

If you clone it and use the commands 'git submodule init; git submodule
update', it will create a linux-2.6 directory.  Afterwards all you need
is to pull from both repositories, and 'make sync' and 'make rpm' will work
as usual.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-15 Thread Alexander Graf



On 15.05.2009, at 10:22, Alexander Graf wrote:

Now that we have nested SVM in place, let's make use of it and virtualize
something non-kvm.
The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which runs
the dom0 in virtualized mode already. I haven't been able to run a
second VM within for now though, but maybe I just wasn't patient enough ;-).


In order to find out why things were slow with nested SVM I hacked  
intercept reporting into debugfs in my local tree and found pretty  
interesting results (using NPT):


 SVM_EXIT_CLGI         3888080   0
 SVM_EXIT_CPUID           3460   0
 SVM_EXIT_CR0_SEL_WRI        0   0
 SVM_EXIT_ERR                0   0
 SVM_EXIT_FERR_FREEZE        0   0
 SVM_EXIT_GDTR_READ          0   0
 SVM_EXIT_GDTR_WRITE         0   0
 SVM_EXIT_HLT            40186   0
 SVM_EXIT_ICEBP              0   0
 SVM_EXIT_IDTR_READ          0   0
 SVM_EXIT_IDTR_WRITE         0   0
 SVM_EXIT_INIT               0   0
 SVM_EXIT_INTR          193173   0
 SVM_EXIT_INVD               0   0
 SVM_EXIT_INVLPG             1   0
 SVM_EXIT_INVLPGA       536994   0
 SVM_EXIT_IOIO         3450484   0
 SVM_EXIT_IRET               0   0
 SVM_EXIT_LDTR_READ          0   0
 SVM_EXIT_LDTR_WRITE         0   0
 SVM_EXIT_MONITOR            0   0
 SVM_EXIT_MSR           124614   0
 SVM_EXIT_MWAIT              0   0
 SVM_EXIT_MWAIT_COND         0   0
 SVM_EXIT_NMI                0   0
 SVM_EXIT_NPF          1040416   0
 SVM_EXIT_PAUSE              0   0
 SVM_EXIT_POPF               0   0
 SVM_EXIT_PUSHF              0   0
 SVM_EXIT_RDPMC              0   0
 SVM_EXIT_RDTSC              0   0
 SVM_EXIT_RDTSCP             0   0
 SVM_EXIT_RSM                0   0
 SVM_EXIT_SHUTDOWN           0   0
 SVM_EXIT_SKINIT             0   0
 SVM_EXIT_SMI               20   0
 SVM_EXIT_STGI         3888080   0
 SVM_EXIT_SWINT              0   0
 SVM_EXIT_TASK_SWITCH        0   0
 SVM_EXIT_TR_READ            0   0
 SVM_EXIT_TR_WRITE           0   0
 SVM_EXIT_VINTR         402865   0
 SVM_EXIT_VMLOAD       3888096   0
 SVM_EXIT_VMMCALL       767288   0
 SVM_EXIT_VMRUN        3888096   0
 SVM_EXIT_VMSAVE       3888096   0
 SVM_EXIT_WBINVD            64   0


So apparently most intercepts come from the SVM helper calls
(clgi, stgi, vmload, vmsave). I guess I need to get back to the
"emulate when GIF=0" approach to get things fast.
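
For anyone who wants to slice the numbers, a small Python sketch that ranks the non-zero counters from the table above and computes the share contributed by the four helper instructions. The dictionary is transcribed by hand from the first count column; `rank` and `share` are ad-hoc helper names.

```python
# Non-zero counters from the table above (first count column).
counts = {
    "CLGI": 3888080, "STGI": 3888080, "VMLOAD": 3888096, "VMSAVE": 3888096,
    "VMRUN": 3888096, "IOIO": 3450484, "NPF": 1040416, "VMMCALL": 767288,
    "INVLPGA": 536994, "VINTR": 402865, "INTR": 193173, "MSR": 124614,
    "HLT": 40186, "CPUID": 3460, "WBINVD": 64, "SMI": 20, "INVLPG": 1,
}


def rank(table):
    """Sort by count descending, then by name so ties are deterministic."""
    return sorted(table.items(), key=lambda kv: (-kv[1], kv[0]))


def share(table, names):
    """Fraction of all intercepts that the given exit reasons account for."""
    return sum(table[n] for n in names) / sum(table.values())


helper_share = share(counts, ["CLGI", "STGI", "VMLOAD", "VMSAVE"])
```

On these numbers the four helpers alone account for well over half of all intercepts, which is what motivates avoiding a full exit for each of them.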


Alex

Re: Status of pci passthrough work?

2009-05-15 Thread Amit Shah

Hello,

On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote:
 Amit,
 I'm trying to use PVDMA. I've downloaded a kernel snapshot from
 your kvm git, but I couldn't download a snapshot or the repo from your
 kvm-userspace tree. I tried to launch the VM using kvm-85 user space but it
 hangs before loading it. Should it work with kvm-85 user space? Do you have
 the userspace patches for PVDMA?

The pvdma userspace patches are at

http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma

(look for the branch 'pvdma' in the tree).

Amit

Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Avi Kivity


Alexander Graf wrote:

When using nested SVM we usually want the guest to see the exact CPUID values
we gave it and not some mangled ones.
  


That would be triggered by -cpu host, not nesting.


@@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *edx = env->cpuid_features;
 
         /* Hypervisor present bit required for Microsoft SVVP */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             *ecx |= (1 << 31);
         break;
  


-cpu host,-hypervisor

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Alexander Graf



On 15.05.2009, at 13:09, Avi Kivity wrote:


Alexander Graf wrote:
When using nested SVM we usually want the guest to see the exact CPUID values
we gave it and not some mangled ones.



That would be triggered by -cpu host, not nesting.


Oh we have -cpu host already? If so, we don't need that hackery of  
course :-)


@@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *edx = env->cpuid_features;
         /* Hypervisor present bit required for Microsoft SVVP */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             *ecx |= (1 << 31);
         break;



-cpu host,-hypervisor


hm - treating the hypervisor bit like any other cpuid bit sounds like  
a good idea. I'm wondering though which way should be preferred. I  
usually don't want to have the hypervisor bit set - but maybe I'm the  
minority.


Alex


Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Avi Kivity


Alexander Graf wrote:
When using nested SVM we usually want the guest to see the exact CPUID values
we gave it and not some mangled ones.



That would be triggered by -cpu host, not nesting.


Oh we have -cpu host already?


No, we don't :)

hm - treating the hypervisor bit like any other cpuid bit sounds like 
a good idea. I'm wondering though which way should be preferred. I 
usually don't want to have the hypervisor bit set - but maybe I'm the 
minority.




Windows requires the hypervisor bit to be set in order to pass some test
program.
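
To make the "-cpu host,-hypervisor" idea concrete, here is a minimal sketch of treating the hypervisor-present bit (CPUID leaf 1, ECX bit 31, as in the quoted hunk) like any other feature flag. The helper name and the want_hypervisor parameter are hypothetical and are not part of QEMU; they only illustrate the set-or-clear decision being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 31: the "hypervisor present" bit from the quoted hunk. */
#define CPUID_1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/*
 * Hypothetical helper: compute the ECX value a guest would see for leaf 1.
 * want_hypervisor models a per-guest feature flag such as the proposed
 * "-cpu host,-hypervisor"; it is an assumption, not an existing QEMU field.
 */
static uint32_t guest_cpuid_1_ecx(uint32_t ecx, bool want_hypervisor)
{
    if (want_hypervisor)
        return ecx | CPUID_1_ECX_HYPERVISOR;   /* advertise the hypervisor */
    return ecx & ~CPUID_1_ECX_HYPERVISOR;      /* hide it from the guest */
}
```

With this shape the default (bit set, to satisfy the Windows test) and the opt-out (bit cleared for nesting experiments) are both expressed through the same flag rather than a special case for nested SVM.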


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


[PATCH] Set bit 1 in disabled processor's _STA

2009-05-15 Thread Glauber Costa

This patch sets bit 1 in a disabled processor's _STA.
According to the ACPI spec, this bit means:
 "Set if the device is enabled and decoding its resources."

Without it, Windows 2008 device manager shows the processors
as malfunctioning hardware.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 kvm/bios/acpi-dsdt.dsl |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..c53816c 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -56,7 +56,7 @@ DefinitionBlock (
 }   \
 Method (_STA) { \
 If (CRST(nr)) { Return(0xF) }   \
-Else { Return(0x9) }\
+Else { Return(0xB) }\
 }   \
 }   \
 
-- 
1.5.6.6
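
The return values in the patch above are easier to read against the _STA bit layout from the ACPI specification. The following is a small sketch; the macro and function names are illustrative and do not come from the DSDT source.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
#define STA_PRESENT     (1u << 0) /* device is present */
#define STA_ENABLED     (1u << 1) /* enabled and decoding its resources */
#define STA_SHOW_IN_UI  (1u << 2) /* should be shown in the UI */
#define STA_FUNCTIONING (1u << 3) /* functioning properly */

/* Illustrative helper: test one status bit in a _STA return value. */
static bool sta_has(uint32_t sta, uint32_t bit)
{
    return (sta & bit) != 0;
}
```

Read this way, 0xF sets all four bits, the old 0x9 is present plus functioning, and the new 0xB additionally sets the enabled bit while still leaving the show-in-UI bit clear.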


RE: Status of pci passthrough work?

2009-05-15 Thread Passera, Pablo R

Hi Amit,
Thanks for your answer. I was able to get your userspace pvdma version.
So now I am using the PVDMA-patched kernel and the PVDMA-patched userspace.
However, I am not able to start the VM. I am running qemu with the following
options (I am trying without any pci passthrough first):

./qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/kvm/dm2.img -m 256 -net none

The SDL window appears, but it hangs after showing the message "Press F12 for
boot menu". I am not getting any message in either qemu or dmesg. Do you
know what could be happening? Maybe a kernel compile option? It would be great
if you could send me the .config file that you used to compile it, just to check
the options.

Thanks,
Pablo

-----Original Message-----
From: Amit Shah [mailto:amit.s...@redhat.com]
Sent: Friday, May 15, 2009 8:00 AM
To: Passera, Pablo R
Cc: kvm@vger.kernel.org
Subject: Re: Status of pci passthrough work?

Hello,

On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote:
 Amit,
 I am trying to use PVDMA. I've downloaded a kernel snapshot from
your kvm git, but I couldn't download a snapshot or the repo from
your kvm-userspace tree. I tried to launch the VM using kvm-85 user
space but it hangs before loading it. Should it work with kvm-85 user
space? Do you have the userspace patches for PVDMA?

The pvdma userspace patches are at

http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma

(look for the branch 'pvdma' in the tree).

   Amit

Re: [PATCH -tip] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h

2009-05-15 Thread Jaswinder Singh Rajput

Hello Avi,

On Thu, 2009-05-14 at 11:57 +0530, Jaswinder Singh Rajput wrote:
 Use standard msr-index.h's MSR declaration.
 
 MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
 the 80-column issue.
 
 Signed-off-by: Jaswinder Singh Rajput jaswinderraj...@gmail.com
 ---

If this patch looks sane to you, you can apply it to the kvm tree.

Here is the updated patch based on kvm tree:

[PATCH] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h

Use standard msr-index.h's MSR declaration.

MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
the 80-column issue.

Signed-off-by: Jaswinder Singh Rajput jaswinderraj...@gmail.com
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/svm.c  |4 ++--
 arch/x86/kvm/vmx.c  |4 ++--
 arch/x86/kvm/x86.c  |5 ++---
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 716a4ec..5c72897 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -753,8 +753,6 @@ static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
        kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
 }
 
-#define MSR_IA32_TIME_STAMP_COUNTER            0x010
-
 #define TSS_IOPB_BASE_OFFSET 0x66
 #define TSS_BASE_SIZE 0x68
 #define TSS_IOPB_SIZE (65536 / 8)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 71510e0..dd667dd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1953,7 +1953,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
struct vcpu_svm *svm = to_svm(vcpu);
 
switch (ecx) {
-   case MSR_IA32_TIME_STAMP_COUNTER: {
+   case MSR_IA32_TSC: {
u64 tsc;
 
rdtscll(tsc);
@@ -2043,7 +2043,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
struct vcpu_svm *svm = to_svm(vcpu);
 
switch (ecx) {
-   case MSR_IA32_TIME_STAMP_COUNTER: {
+   case MSR_IA32_TSC: {
u64 tsc;
 
rdtscll(tsc);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe2ce2b..98e6915 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -931,7 +931,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_index, pdata);
 #endif
-   case MSR_IA32_TIME_STAMP_COUNTER:
+   case MSR_IA32_TSC:
data = guest_read_tsc();
break;
case MSR_IA32_SYSENTER_CS:
@@ -991,7 +991,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
case MSR_IA32_SYSENTER_ESP:
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
-   case MSR_IA32_TIME_STAMP_COUNTER:
+   case MSR_IA32_TSC:
rdtscll(host_tsc);
guest_write_tsc(data, host_tsc);
break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 44e87a5..4150edb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -462,7 +462,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+   MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 };
 
@@ -640,8 +640,7 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 
/* Keep irq disabled to prevent changes to the clock */
local_irq_save(flags);
-       kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
-                         &vcpu->hv_clock.tsc_timestamp);
+       kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp);
        ktime_get_ts(&ts);
local_irq_restore(flags);
 
-- 
1.6.1.1




Re: [PATCH -tip] x86: kvm/x86.c use MSR names in place of address

2009-05-15 Thread Jaswinder Singh Rajput

On Thu, 2009-05-14 at 11:00 +0530, Jaswinder Singh Rajput wrote:

 Here is the patch:
 
 [PATCH -tip] x86: kvm/x86.c use MSR names in place of address
 
 Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR.
 
 Signed-off-by: Jaswinder Singh Rajput jaswinderraj...@gmail.com
 ---

This patch can also be applied to the kvm tree without any changes.

Thanks,

--
JSR


Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 12:53:42PM +0200, Alexander Graf wrote:

 On 15.05.2009, at 12:25, Michael S. Tsirkin wrote:

 On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote:
 A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT".
 Currently KVM's MMU code treats this bit as reserved, even though it's not.

 As long as we're not required to make use of the PAT bits, which is only
 required for DMA/MMIO from my understanding, we can safely ignore it.

 Hyper-V uses this bit for kernel PTEs.

 Signed-off-by: Alexander Graf ag...@suse.de
 ---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 8fcdae9..cce055a 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
  context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
  rsvd_bits(maxphyaddr, 51) |
  rsvd_bits(13, 20);  /* large page */
 -   context->rsvd_bits_mask[1][0] = ~0ull;
 +   context->rsvd_bits_mask[1][0] = 0ull;
 break;
 }
 }

 Just to make sure I understand what this does: if guest sets bit7, will
 bit7 get set in shadow PTEs as well?

 I don't see any code that interprets bit7, so the shadow PTE should be
 completely unaffected.

 But to be sure I asked Jörg to take a look at it as well, as he's more
 familiar with the x86 SPT code than I am :-).

The PAT bit is not propagated into the shadow page tables. Anyway, this
patch fixes the problem the wrong way. The real problem is that a 4kb pte
is checked against the mask intended for large pages (which do not exist
on walker level 0). The attached patch fixes it in a better way, imho.

From 7530aef3ed580b70a74224f8c04857754501c496 Mon Sep 17 00:00:00 2001
From: Joerg Roedel joerg.roe...@amd.com
Date: Fri, 15 May 2009 15:14:19 +0200
Subject: [PATCH] kvm/mmu: fix reserved bit checking on 4kb pte level

The reserved bits checking code looks at bit 7 of the pte to determine
if it has to use the mask for a large pte or a normal pde. This does not
work on 4kb pte level because bit 7 is used there for PAT. Account for this
in the checking function.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/mmu.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 479e748..8d9552e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2124,9 +2124,11 @@ static void paging_free(struct kvm_vcpu *vcpu)
 
 static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 {
-       int bit7;
+       int bit7 = 0;
+
+       if (level != PT_PAGE_TABLE_LEVEL)
+               bit7 = (gpte >> 7) & 1;
 
-       bit7 = (gpte >> 7) & 1;
        return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
-- 
1.6.2.4
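
The effect of the check above can be tried outside the kernel with a small standalone sketch. The struct rsvd_masks type is a hypothetical stand-in for vcpu->arch.mmu.rsvd_bits_mask, and PT_PAGE_TABLE_LEVEL is assumed to be 1 (the 4kb pte level); only the bit 7 selection logic mirrors the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1 /* assumed value: level of a 4kb pte */
#define MAX_LEVELS 4

/*
 * Hypothetical stand-in for the per-vcpu mask table:
 * mask[0][l-1] applies to a normal entry at level l,
 * mask[1][l-1] applies to a large-page entry at level l.
 */
struct rsvd_masks {
    uint64_t mask[2][MAX_LEVELS];
};

/* Mirrors the patched logic: bit 7 selects the large-page mask only above level 1. */
static bool is_rsvd_bits_set(const struct rsvd_masks *m, uint64_t gpte, int level)
{
    int bit7 = 0;

    if (level != PT_PAGE_TABLE_LEVEL)
        bit7 = (gpte >> 7) & 1;

    return (gpte & m->mask[bit7][level - 1]) != 0;
}
```

With mask[1][0] left at all ones (as in the unpatched reset code), a 4kb pte with the PAT bit set no longer trips the reserved-bit check, because the large-page row is never consulted at level 1.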


-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: [PATCH 3/6] Emulator: Inject #PF when page was not found

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 10:22:17AM +0200, Alexander Graf wrote:
 If we couldn't find a page on read_emulated, it might be a good
 idea to tell the guest about that and inject a #PF.
 
 We do the same already for write faults. I don't know why it was
 not implemented for reads.

Have you checked that the emulator will never ever do speculative reads?
This may be the reason why the fault was not injected here.

 
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  arch/x86/kvm/x86.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 5fcde2c..5aa1219 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr,
   goto mmio;
  
   if (kvm_read_guest_virt(addr, val, bytes, vcpu)
 - == X86EMUL_CONTINUE)
 + == X86EMUL_CONTINUE) {
   return X86EMUL_CONTINUE;
 - if (gpa == UNMAPPED_GVA)
 + }
 + if (gpa == UNMAPPED_GVA) {
 + kvm_inject_page_fault(vcpu, addr, 0);
   return X86EMUL_PROPAGATE_FAULT;
 + }
  
  mmio:
   /*
 -- 
 1.6.0.2
 
 

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: [PATCH 5/6] Nested SVM: Implement INVLPGA

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 10:22:19AM +0200, Alexander Graf wrote:
 SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
 so let's implement it!
 
 For now we just do the same thing invlpg does, as asid switching
 means we flush the mmu anyways. That might change one day though.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  arch/x86/kvm/svm.c |   14 +-
  1 files changed, 13 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 30e6b43..b2c6cf3 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
   return 1;
  }
  
 +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 +{
 + struct kvm_vcpu *vcpu = &svm->vcpu;
 + nsvm_printk("INVLPGA\n");
 + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
 + skip_emulated_instruction(&svm->vcpu);
 +
 + kvm_mmu_reset_context(vcpu);
 + kvm_mmu_load(vcpu);
 + return 1;
 +}
 +

Hmm, since we flush the TLB on every nested-guest entry I think we can
make this function a nop.

  static int invalid_op_interception(struct vcpu_svm *svm,
  struct kvm_run *kvm_run)
  {
 @@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
   [SVM_EXIT_INVD] = emulate_on_interception,
   [SVM_EXIT_HLT]  = halt_interception,
   [SVM_EXIT_INVLPG]   = invlpg_interception,
 - [SVM_EXIT_INVLPGA]  = invalid_op_interception,
 + [SVM_EXIT_INVLPGA]  = invlpga_interception,
   [SVM_EXIT_IOIO] = io_interception,
   [SVM_EXIT_MSR]  = msr_interception,
   [SVM_EXIT_TASK_SWITCH]  = task_switch_interception,
 -- 
 1.6.0.2
 
 

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: XP smp using a lot of CPU

2009-05-15 Thread Ross Boylan

On Fri, 2009-05-15 at 11:56 -0300, Marcelo Tosatti wrote:
 Ross,
 
 Can you confirm the qemu process CPU consumption is down to acceptable
 levels if you dont specify -no-acpi?
 
 Thanks
Simply starting without -no-acpi did not help.  I tried to do a Windows
XP repair, but seemed to end up basically doing a reinstall.  The system
now seems to be hung up.

I'm probably going to end up trying a fresh install; I'll report more
results when I have them.
 
 
 On Thu, May 14, 2009 at 01:01:11PM -0700, Ross Boylan wrote:
  On Wed, 2009-05-13 at 09:56 +0300, Avi Kivity wrote:
   Ross Boylan wrote:
I just installed XP into a new VM, specifying -smp 2 for the machine.
According to top, it's using nearly 200% of a cpu even when I'm not
doing anything.
   
Is this real CPU usage, or just a reporting problem (just as my disk
image is big according to ls, but isn't really)?
   
If it's real, is there anything I can do about it?
   
kvm 0.7.2 on Debian Lenny (but 2.6.29 kernel), amd64.  Xeon chips; 32
bit version of XP pro installed, now fully patched (including the
Windows Genuine Advantage stuff, though I cancelled it when it wanted to
run).  
   
Task manager in XP shows virtually no CPU usage.
   
Please cc me on responses.
   
  
   
   I'm guessing Windows uses a pio port to sleep, which kvm doesn't 
   support.  Can you provide kvm_stat output?
  markov:~# kvm_stat -1
  efer_reload                  0       0
  exits                  9921384     566
  fpu_reload              267970       0
  halt_exits                   1       0
  halt_wakeup                  3       0
  host_state_reload      4026050      17
  hypercalls                   0       0
  insn_emulation         1329455       0
  insn_emulation_fail        154       0
  invlpg                  176773       0
  io_exits               3818270       0
  irq_exits              1434046     566
  irq_injections          326730       0
  irq_window              164827       0
  largepages                   0       0
  mmio_exits               35892       0
  mmu_cache_miss           29760       0
  mmu_flooded              19908       0
  mmu_pde_zapped           15557       0
  mmu_pte_updated          82088       0
  mmu_pte_write            97990       0
  mmu_recycled                 0       0
  mmu_shadow_zapped        43276       0
  mmu_unsync                 891       0
  mmu_unsync_global            0       0
  nmi_injections               0       0
  nmi_window                   0       0
  pf_fixed               1231164       0
  pf_guest                276083       0
  remote_tlb_flush        115606       0
  request_irq                  0       0
  request_nmi                  0       0
  signal_exits                 5       0
  tlb_flush               960198       0
  
  This is with the VM displaying the XP "It is now safe to turn off your
  computer." screen.  CPU remains about 200% from kvm.  Invoked with
  sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
  -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
  -std-vga -hda XP.raw \
  -boot c \
  -soundhw es1370 -localtime -no-acpi  -m 1G -smp 2
  
  Next I'll try fiddling with acpi.
  
  -- 
  Ross Boylan  wk:  (415) 514-8146
  185 Berry St #5700   r...@biostat.ucsf.edu
  Dept of Epidemiology and Biostatistics   fax: (415) 514-8150
  University of California, San Francisco
  San Francisco, CA 94107-1739 hm:  (415) 550-1062
  


[PATCH] kmod: Update .gitignore

2009-05-15 Thread Jan Kiszka

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 .gitignore |  118 +++-
 1 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/.gitignore b/.gitignore
index 22a8200..bdebd0a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,64 +3,66 @@
 *~
 *.flat
 *.a
-config.mak
 .*.cmd
-qemu/config-host.h
-qemu/config-host.mak
-user/test/bootstrap
-user/kvmctl
-qemu/dyngen
-qemu/x86_64-softmmu
-qemu/qemu-img
-qemu/qemu-nbd
 *.ko
 *.mod.c
-bios/*.bin
-bios/*.sym
-bios/*.txt
-bios/acpi-dsdt.aml
-vgabios/*.bin
-vgabios/*.txt
-extboot/extboot.bin
-extboot/extboot.img
-extboot/signrom
-kernel/config.kbuild
-kernel/modules.order
-kernel/Module.symvers
-kernel/Modules.symvers
-kernel/Module.markers
-kernel/.tmp_versions
-kernel/include-compat/asm
-kernel/include-compat/asm-x86/asm-x86
-kernel/include
-kernel/x86/modules.order
-kernel/x86/i825[49].[ch]
-kernel/x86/kvm_main.c
-kernel/x86/kvm_svm.h
-kernel/x86/vmx.[ch]
-kernel/x86/svm.[ch]
-kernel/x86/mmu.[ch]
-kernel/x86/paging_tmpl.h
-kernel/x86/x86_emulate.[ch]
-kernel/x86/ioapic.[ch]
-kernel/x86/iodev.h
-kernel/x86/irq.[ch]
-kernel/x86/kvm_trace.c
-kernel/x86/lapic.[ch]
-kernel/x86/tss.h
-kernel/x86/x86.[ch]
-kernel/x86/coalesced_mmio.[ch]
-kernel/x86/kvm_cache_regs.h
-kernel/x86/vtd.c
-kernel/x86/irq_comm.c
-kernel/x86/timer.c
-kernel/x86/kvm_timer.h
-kernel/x86/iommu.c
-qemu/pc-bios/extboot.bin
-qemu/qemu-doc.html
-qemu/*.[18]
-qemu/*.pod
-qemu/qemu-tech.html
-qemu/qemu-options.texi
-user/kvmtrace
-user/test/x86/bootstrap
+config.kbuild
+config.mak
+modules.order
+Module.symvers
+Modules.symvers
+Module.markers
+.tmp_versions
+include-compat/asm
+include-compat/asm-x86/asm-x86
+include
+x86/modules.order
+x86/i825[49].[ch]
+x86/kvm_main.c
+x86/kvm_svm.h
+x86/vmx.[ch]
+x86/svm.[ch]
+x86/mmu.[ch]
+x86/paging_tmpl.h
+x86/x86_emulate.[ch]
+x86/ioapic.[ch]
+x86/iodev.h
+x86/irq.[ch]
+x86/kvm_trace.c
+x86/lapic.[ch]
+x86/tss.h
+x86/x86.[ch]
+x86/coalesced_mmio.[ch]
+x86/kvm_cache_regs.h
+x86/vtd.c
+x86/irq_comm.c
+x86/timer.c
+x86/kvm_timer.h
+x86/iommu.c
+ia64/asm-offsets.c
+ia64/coalesced_mmio.[ch]
+ia64/ioapic.[ch]
+ia64/iodev.h
+ia64/iommu.c
+ia64/irq.h
+ia64/irq_comm.c
+ia64/kvm-ia64.c
+ia64/kvm_fw.c
+ia64/kvm_lib.c
+ia64/kvm_main.c
+ia64/kvm_minstate.h
+ia64/kvm_trace.c
+ia64/lapic.h
+ia64/memcpy.S
+ia64/memset.S
+ia64/misc.h
+ia64/mmio.c
+ia64/optvfault.S
+ia64/process.c
+ia64/trampoline.S
+ia64/vcpu.[ch]
+ia64/vmm.c
+ia64/vmm_ivt.S
+ia64/vti.h
+ia64/vtlb.c
+.stgit-*

Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-15 Thread Marcelo Tosatti

Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
 Anthony Liguori wrote:
 Vincent Minet wrote:
 External ACPI tables are counted twice for the RSDT size and the load
 address for the first external table is in the MADT (interrupt override
 entries are overwritten).

 Signed-off-by: Vincent Minet vinc...@vincent-minet.net
   

 Beth,

 I think you had a patch attempting to address the same issue.  It was  
 a bit more involved though.

 Which is the proper fix and are they both to the same problem?
 They are for 2 different bases. My patch was for qemu's bochs bios and  
 this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in  
 this area of setting up the ACPI tables. My patch is still needed for  
 the qemu base. I hope we'll be getting to one base soon :-)

 Assuming the intent of the code was for MAX_RSDT_ENTRIES to include  
 external_tables, this patch looks correct. I think one additional check  
 would be needed (in my patch) to make sure that the code doesn't exceed  
 MAX_RSDT_ENTRIES when the external tables are being loaded.

 My patch also puts all the code that calculates madt_size in the same  
 place, at the beginning of the table layout. I believe this is neater  
 and will avoid problems like this one in the future. As much as  
 possible, I think it best to get all the tables laid out, then fill
 them in. If for some reason this is not acceptable, we need to add a big
 note that no tables should be laid out after the madt because the madt
 may grow further down in the code and overwrite the other table.

I like this better too, see questions/comments below.


 Regards,

 Anthony Liguori

 ---
  kvm/bios/rombios32.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)

 diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
 index cbd5f15..289361b 100755
 --- a/kvm/bios/rombios32.c
 +++ b/kvm/bios/rombios32.c
 @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
  addr = base_addr = ram_size - ACPI_DATA_SIZE;
  rsdt_addr = addr;
  rsdt = (void *)(addr);
 -rsdt_size = sizeof(*rsdt) + external_tables * 4;
 +rsdt_size = sizeof(*rsdt);
  addr += rsdt_size;
   fadt_addr = addr;
 @@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
  }
  int_override++;
  madt_size += sizeof(struct madt_int_override);
 +addr += sizeof(struct madt_int_override);
  }
  acpi_build_table_header((struct acpi_table_header *)madt,
  "APIC", madt_size, 1);
   


 diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
 index cbd5f15..23835b6 100755
 --- a/kvm/bios/rombios32.c
 +++ b/kvm/bios/rombios32.c
 @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
  addr = base_addr = ram_size - ACPI_DATA_SIZE;
  rsdt_addr = addr;
  rsdt = (void *)(addr);
 -rsdt_size = sizeof(*rsdt) + external_tables * 4;
 +rsdt_size = sizeof(*rsdt);
  addr += rsdt_size;
  
  fadt_addr = addr;
 @@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
  
  addr = (addr + 7) & ~7;
  madt_addr = addr;
 +madt = (void *)(addr);
  madt_size = sizeof(*madt) +
  sizeof(struct madt_processor_apic) * MAX_CPUS +
  #ifdef BX_QEMU
 @@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
  #else
  sizeof(struct madt_io_apic);
  #endif
 -madt = (void *)(addr);
 +for ( i = 0; i < 16; i++ ) {
 +if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
 +madt_size += sizeof(struct madt_int_override);
 +}
 +}
  addr += madt_size;

This bug could only affect the HPET descriptor, right?

  #ifdef BX_QEMU
 @@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
  continue;
  }
  int_override++;
 -madt_size += sizeof(struct madt_int_override);
  }
  acpi_build_table_header((struct acpi_table_header *)madt,
  "APIC", madt_size, 1);
 @@ -1868,17 +1872,6 @@ void acpi_bios_init(void)
  acpi_build_table_header((struct  acpi_table_header *)hpet,
   "HPET", sizeof(*hpet), 1);
  #endif
 -
 -acpi_additional_tables(); /* resets cfg to required entry */
 -for(i = 0; i < external_tables; i++) {
 -uint16_t len;
 -if(acpi_load_table(i, addr, &len) < 0)
 -BX_PANIC("Failed to load ACPI table from QEMU\n");
 -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
 -addr += len;
 -if(addr >= ram_size)
 -BX_PANIC("ACPI table overflow\n");
 -}

The external ACPI tables fix(es) are logically separate from the MADT
int override size calculation, so could they be separate patches?

  #endif
  
  /* RSDT */
 @@ -1891,6 +1884,16 @@ void acpi_bios_init(void)
  //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
  if (nb_numa_nodes > 0)
  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 +acpi_additional_tables(); /*
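
The "lay out first, fill in later" approach discussed above depends on knowing the final MADT size before any later table is placed. A minimal sketch of the override part of that calculation follows. The struct is a hypothetical stand-in sized at 10 bytes (the length of an ACPI Interrupt Source Override entry); the real struct madt_int_override is defined in rombios32.c, and the mask is passed in as a parameter rather than assuming a value for PCI_ISA_IRQ_MASK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct madt_int_override (10-byte ACPI entry). */
struct madt_int_override_sketch {
    uint8_t bytes[10];
};

/*
 * Bytes that the interrupt override entries add to the MADT, one entry per
 * ISA IRQ (0..15) whose bit is set in irq_mask. Computing this up front lets
 * addr be advanced past the complete MADT before later tables are placed.
 */
static size_t madt_override_bytes(uint16_t irq_mask)
{
    size_t total = 0;

    for (int i = 0; i < 16; i++) {
        if (irq_mask & (1U << i))
            total += sizeof(struct madt_int_override_sketch);
    }
    return total;
}
```

Adding this value to madt_size before "addr += madt_size" is what keeps the override entries from overwriting whichever table is placed next.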

[PATCH v2] kmod: Add distclean rule

2009-05-15 Thread Jan Kiszka

Jan Kiszka wrote:
 --- a/Makefile
 +++ b/Makefile
 @@ -68,3 +68,6 @@ rpm:all
  
  clean:
   $(MAKE) -C $(KERNELDIR) M=`pwd` $@
 +
 +distclean:
 + rm -f config.kbuild config.mak

This one is cleaner:

-

Remove the configure output config.kbuild and config.mak via distclean.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 Makefile |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..a4c59c9 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm:  all
 
 clean:
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+   rm -f config.kbuild config.mak

[KVM PATCH v2 0/4] iosignalfd

2009-05-15 Thread Gregory Haskins

[

Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:

http://lkml.org/lkml/2009/5/14/258

]

This is v2 of the series.  For more details, please see the header to
patch 4/4.

[
   Changelog:

  v2:
   *) added optional data-matching capability (via cookie field)
   *) changed name from iofd to iosignalfd
   *) added io_bus unregister function
   *) implemented deassign feature

  v1:
   *) original release (integrated into irqfd v7 series as iofd)
]

---

Gregory Haskins (4):
  kvm: add iosignalfd support
  kvm: add io_bus unregister function
  kvm: add return value to kvm_io_bus_register_dev
  eventfd: export eventfd interfaces for module use


 arch/x86/kvm/i8254.c  |7 +-
 arch/x86/kvm/i8259.c  |5 +
 fs/eventfd.c  |3 +
 include/linux/kvm.h   |   15 
 include/linux/kvm_host.h  |   10 ++-
 virt/kvm/coalesced_mmio.c |4 +
 virt/kvm/eventfd.c|  154 +
 virt/kvm/ioapic.c |4 +
 virt/kvm/kvm_main.c   |   62 --
 9 files changed, 249 insertions(+), 15 deletions(-)

-- 
Signature

[KVM PATCH v2 1/4] eventfd: export eventfd interfaces for module use

2009-05-15 Thread Gregory Haskins

We want to use eventfd from KVM which can be compiled as a module, so
export the interfaces.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 fs/eventfd.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>
 
 struct eventfd_ctx {
wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {


[KVM PATCH v2 2/4] kvm: add return value to kvm_io_bus_register_dev

2009-05-15 Thread Gregory Haskins

Today this function returns void and will internally BUG_ON if it fails.
We want to create dynamic MMIO/PIO entries driven from userspace later in
the series, so enhance this API to return an error code on failure.

We also fix up all the callsites to check the return code and BUG_ON if
it fails.

The net result should be identical behavior both before and after this
patch.  We are simply laying the groundwork for the dynamic usage

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/i8254.c  |7 +--
 arch/x86/kvm/i8259.c  |5 -
 include/linux/kvm_host.h  |4 ++--
 virt/kvm/coalesced_mmio.c |4 +++-
 virt/kvm/ioapic.c |4 +++-
 virt/kvm/kvm_main.c   |7 +--
 6 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4d6f0d2..cc274d6 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -564,6 +564,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm)
 {
struct kvm_pit *pit;
struct kvm_kpit_state *pit_state;
+   int ret;
 
pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
if (!pit)
@@ -584,13 +585,15 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm)
        pit->dev.write = pit_ioport_write;
        pit->dev.in_range = pit_in_range;
        pit->dev.private = pit;
-       kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+       ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+       BUG_ON(ret < 0);
 
        pit->speaker_dev.read = speaker_ioport_read;
        pit->speaker_dev.write = speaker_ioport_write;
        pit->speaker_dev.in_range = speaker_in_range;
        pit->speaker_dev.private = pit;
-       kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+       ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+       BUG_ON(ret < 0);
 
        kvm->arch.vpit = pit;
        pit->kvm = kvm;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..7d39b5b 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -519,6 +519,8 @@ static void pic_irq_request(void *opaque, int level)
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
struct kvm_pic *s;
+   int ret;
+
s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
if (!s)
return NULL;
@@ -538,6 +540,7 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
        s->dev.write = picdev_write;
        s->dev.in_range = picdev_in_range;
        s->dev.private = s;
-       kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+       ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+       BUG_ON(ret < 0);
return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dc91610..94c1a11 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -61,8 +61,8 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
 
 struct kvm_vcpu {
struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5ae620d..19945e1 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -86,6 +86,7 @@ static void coalesced_mmio_destructor(struct kvm_io_device *this)
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
struct kvm_coalesced_mmio_dev *dev;
+   int ret;
 
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
if (!dev)
@@ -96,7 +97,8 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
dev->dev.private  = dev;
dev->kvm = kvm;
kvm->coalesced_mmio_dev = dev;
-   kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+   ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+   BUG_ON(ret < 0);
 
return 0;
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 1eddae9..3eee4c9 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -317,6 +317,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
 int kvm_ioapic_init(struct kvm *kvm)
 {
struct kvm_ioapic *ioapic;
+   int ret;
 
ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL);
if (!ioapic)
@@ -328,7 +329,8 @@ int kvm_ioapic_init(struct kvm *kvm)
ioapic->dev.in_range = ioapic_in_range;
ioapic->dev.private = ioapic;
ioapic->kvm = kvm;
-   kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+   ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+   BUG_ON(ret < 0);
return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b2db766..60ba0cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2461,11 +2461,14 @@ struct kvm_io_device

[KVM PATCH v2 3/4] kvm: add io_bus unregister function

2009-05-15 Thread Gregory Haskins

We want to support the notion of dynamic MMIO/PIO registrations and
therefore will need to support both register as well as unregister.

However, the current io_bus code is structured as a linear array and
is not conducive to unregistering, so refactor to allow holes in the
array.  We then enhance the API with an unregister function.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm_host.h |4 +++-
 virt/kvm/kvm_main.c  |   48 ++
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 94c1a11..214089f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
  * in one place.
  */
 struct kvm_io_bus {
-   int   dev_count;
+   spinlock_t lock;
 #define NR_IOBUS_DEVS 6
struct kvm_io_device *devs[NR_IOBUS_DEVS];
 };
@@ -63,6 +63,8 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
  gpa_t addr, int len, int is_write);
 int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
 
 struct kvm_vcpu {
struct kvm *kvm;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60ba0cf..5f5e443 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2433,16 +2433,18 @@ static struct notifier_block kvm_reboot_notifier = {
 void kvm_io_bus_init(struct kvm_io_bus *bus)
 {
memset(bus, 0, sizeof(*bus));
+   spin_lock_init(&bus->lock);
 }
 
 void kvm_io_bus_destroy(struct kvm_io_bus *bus)
 {
int i;
 
-   for (i = 0; i < bus->dev_count; i++) {
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
struct kvm_io_device *pos = bus->devs[i];
 
-   kvm_iodevice_destructor(pos);
+   if (pos)
+   kvm_iodevice_destructor(pos);
}
 }
 
@@ -2451,10 +2453,10 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 {
int i;
 
-   for (i = 0; i < bus->dev_count; i++) {
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
struct kvm_io_device *pos = bus->devs[i];
 
-   if (pos->in_range(pos, addr, len, is_write))
+   if (pos && pos->in_range(pos, addr, len, is_write))
return pos;
}
 
@@ -2463,12 +2465,42 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 
 int kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
 {
-   if (bus->dev_count > (NR_IOBUS_DEVS-1))
-   return -ENOSPC;
+   int i;
 
-   bus->devs[bus->dev_count++] = dev;
+   spin_lock(&bus->lock);
 
-   return 0;
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
+   if (bus->devs[i])
+   continue;
+
+   bus->devs[i] = dev;
+   spin_unlock(&bus->lock);
+   return 0;
+   }
+
+   spin_unlock(&bus->lock);
+
+   return -ENOSPC;
+}
+
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+{
+   int i;
+
+   spin_lock(&bus->lock);
+
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
+
+   if (bus->devs[i] == dev) {
+   bus->devs[i] = NULL;
+   spin_unlock(&bus->lock);
+   return 0;
+   }
+   }
+
+   spin_unlock(&bus->lock);
+
+   return -ENOENT;
 }
 
 static struct notifier_block kvm_cpu_notifier = {

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
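
As a reading aid for patch 3/4 above, the following is a minimal user-space sketch of a fixed-size slot array that tolerates holes. It is not the kernel code: the names slot_bus, slot_register and slot_unregister are hypothetical, plain void pointers stand in for struct kvm_io_device, and the spinlock the patch takes around each scan is left out.

```c
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 6	/* mirrors NR_IOBUS_DEVS in the patch */

/* Hypothetical stand-in for struct kvm_io_bus: NULL marks a free slot. */
struct slot_bus {
	void *devs[NR_SLOTS];
};

/* Store dev in the first free slot; return -ENOSPC when no slot is free. */
static int slot_register(struct slot_bus *bus, void *dev)
{
	int i;

	for (i = 0; i < NR_SLOTS; i++) {
		if (bus->devs[i])
			continue;	/* slot in use, keep scanning */
		bus->devs[i] = dev;
		return 0;
	}
	return -ENOSPC;
}

/* Clear the slot holding dev, leaving a hole; -ENOENT if dev is not found. */
static int slot_unregister(struct slot_bus *bus, void *dev)
{
	int i;

	for (i = 0; i < NR_SLOTS; i++) {
		if (bus->devs[i] == dev) {
			bus->devs[i] = NULL;
			return 0;
		}
	}
	return -ENOENT;
}
```

Because a hole can appear anywhere, every walker has to scan all NR_SLOTS entries and skip NULL, which is why the patch adds the "if (pos)" checks to the destroy and find loops.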

[KVM PATCH v2 4/4] kvm: add iosignalfd support

2009-05-15 Thread Gregory Haskins

iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any arbitrary
IO address with a corresponding eventfd and then pass the eventfd to a
specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
heavy-weight exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to get our notification transmitted asynchronously and
return as quickly as possible.  All the synchronous infrastructure that
ensures proper data-dependencies are met in the normal IO case is just
unnecessary overhead for signalling.  This adds additional computational
load on the system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called doorbell.  This
module has a function called doorbell_ring() which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: one goes through QEMU, via a
registered IO region and the doorbell ioctl().  The other is direct, via
iosignalfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate from the NULLIO runs, which showed +2.56us for MMIO and
-350ns for HC.  That gives:

qemu-pio:  153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.
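
The arithmetic behind these figures can be checked with a small sketch. It assumes one operation in flight at a time, so that throughput is simply the reciprocal of the round-trip time; the helper names are hypothetical and the inputs are the rtt values quoted above.

```c
/* Operations per second for a serialized workload with the given rtt (in us). */
static double rtt_us_to_iops(double rtt_us)
{
	return 1e6 / rtt_us;
}

/* Time saved per operation when moving from the slow path to the fast path. */
static double rtt_saving_us(double slow_us, double fast_us)
{
	return slow_us - fast_us;
}
```

Under that assumption a 5.00us round trip gives 200000 operations per second and a 9.09us round trip gives about 110000, and the difference between the qemu-mmio and iosignalfd-mmio paths is 9.09 - 5.00 = 4.09us, which is the "about 4us" quoted above.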



Signed-off-by: Gregory Haskins ghask...@novell.com
---

 include/linux/kvm.h  |   15 
 include/linux/kvm_host.h |2 +
 virt/kvm/eventfd.c   |  154 ++
 virt/kvm/kvm_main.c  |   13 
 4 files changed, 184 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index a1ecc6a..9372b12 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -292,6 +292,19 @@ struct kvm_guest_debug {
struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1 << 0)
+#define KVM_IOSIGNALFD_FLAG_PIO   (1 << 1)
+#define KVM_IOSIGNALFD_FLAG_COOKIE(1 << 2)
+
+struct kvm_iosignalfd {
+   __u64 cookie;
+   __u64 addr;
+   __u32 len;
+   __u32 fd;
+   __u32 flags;
+   __u8  pad[12];
+};
+
 #define KVM_TRC_SHIFT   16
 /*
  * kvm trace categories
@@ -416,6 +429,7 @@ struct kvm_trace_rec {
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
 #define KVM_CAP_IRQFD 31
+#define KVM_CAP_IOSIGNALFD 32
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -509,6 +523,7 @@ struct kvm_irqfd {
_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
+#define KVM_IOSIGNALFD _IOW(KVMIO, 0x77, struct kvm_iosignalfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 214089f..4e4b174 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -137,6 +137,7 @@ struct kvm {
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
struct list_head irqfds;
+   struct list_head iosignalfds;
struct kvm_vm_stat stat;
struct kvm_arch arch;
atomic_t users_count;
@@ -530,5 +531,6 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi,

Re: [KVM PATCH v2 0/4] iosignalfd

2009-05-15 Thread Gregory Haskins

Gregory Haskins wrote:
> [
>
> Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:
>
> http://lkml.org/lkml/2009/5/14/258
>

I should also mention: NOT FOR INCLUSION

I am still testing this code, so this is an RFC for now.

> ]
>
> This is v2 of the series.  For more details, please see the header to
> patch 4/4.
>
> [
> Changelog:
>
>    v2:
>      *) added optional data-matching capability (via cookie field)
>      *) changed name from iofd to iosignalfd
>      *) added io_bus unregister function
>      *) implemented deassign feature
>
>    v1:
>      *) original release (integrated into irqfd v7 series as iofd)
> ]
>
> ---
>
> Gregory Haskins (4):
>       kvm: add iosignalfd support
>       kvm: add io_bus unregister function
>       kvm: add return value to kvm_io_bus_register_dev
>       eventfd: export eventfd interfaces for module use
>
>
>  arch/x86/kvm/i8254.c      |    7 +-
>  arch/x86/kvm/i8259.c      |    5 +
>  fs/eventfd.c              |    3 +
>  include/linux/kvm.h       |   15 
>  include/linux/kvm_host.h  |   10 ++-
>  virt/kvm/coalesced_mmio.c |    4 +
>  virt/kvm/eventfd.c        |  154 +
>  virt/kvm/ioapic.c         |    4 +
>  virt/kvm/kvm_main.c       |   62 --
>  9 files changed, 249 insertions(+), 15 deletions(-)





[PATCH v2] qemu-kvm: add iosignalfd support

2009-05-15 Thread Gregory Haskins

An iosignalfd allows an eventfd to attach to a specific PIO/MMIO region in the
guest.  Any guest-writes to that region will trigger an eventfd signal.
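
To illustrate what "an eventfd signal" means for the consumer, here is a small sketch using only the standard eventfd(2) interface. It does not touch KVM: a plain write() from userspace stands in for the guest write that the kernel side would turn into a signal, and the helper names efd_signal and efd_drain are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd once: adds 1 to its 64-bit counter. */
static int efd_signal(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Collect pending signals: a successful read returns the accumulated
 * count and resets the counter to zero.  Returns 0 when nothing is
 * pending on a non-blocking descriptor.
 */
static uint64_t efd_drain(int fd)
{
	uint64_t count = 0;

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

With a descriptor created by eventfd(0, EFD_NONBLOCK), three calls to efd_signal() followed by one efd_drain() yield a count of 3, so a consumer that wakes up late still learns how many times the doorbell rang.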

For more details, see the kernel side patches submitted here:

http://lkml.org/lkml/2009/5/15/303

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 kvm/libkvm/libkvm.c |   68 +++
 kvm/libkvm/libkvm.h |   39 +
 2 files changed, 107 insertions(+), 0 deletions(-)
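
The wrappers in the diff below translate library-level flags into kernel ABI flags rather than passing them through, because the two sets use different bit positions. The following sketch isolates that mapping. The flag definitions are copied from the two patches (with the shift operators assumed to be "<<", since the archive dropped them); the helper names assign_flags and deassign_flags are hypothetical.

```c
/* Library-level flags, as defined in the libkvm.h hunk of this patch. */
enum {
	iosignalfd_option_pio,
	iosignalfd_option_cookie,
};

#define IOSIGNALFD_FLAG_PIO    (1 << iosignalfd_option_pio)
#define IOSIGNALFD_FLAG_COOKIE (1 << iosignalfd_option_cookie)

/* Kernel ABI flags, as defined in the kernel-side include/linux/kvm.h hunk. */
#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 << 0)
#define KVM_IOSIGNALFD_FLAG_PIO      (1 << 1)
#define KVM_IOSIGNALFD_FLAG_COOKIE   (1 << 2)

/* Assign path: only the PIO/MMIO selection is forwarded to the kernel. */
static unsigned int assign_flags(int flags)
{
	return (flags & IOSIGNALFD_FLAG_PIO) ? KVM_IOSIGNALFD_FLAG_PIO : 0;
}

/* Deassign path: always sets DEASSIGN, then forwards PIO and COOKIE. */
static unsigned int deassign_flags(int flags)
{
	unsigned int kflags = KVM_IOSIGNALFD_FLAG_DEASSIGN;

	if (flags & IOSIGNALFD_FLAG_PIO)
		kflags |= KVM_IOSIGNALFD_FLAG_PIO;
	if (flags & IOSIGNALFD_FLAG_COOKIE)
		kflags |= KVM_IOSIGNALFD_FLAG_COOKIE;
	return kflags;
}
```

Note that, as posted, the assign wrapper never forwards the cookie flag; only deassign does.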

diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ccab985..dc3414f 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -1501,3 +1501,71 @@ int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags)
 }
 
 #endif /* KVM_CAP_IRQFD */
+
+#ifdef KVM_CAP_IOSIGNALFD
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags)
+{
+   int r;
+   int type = flags & IOSIGNALFD_FLAG_PIO;
+   struct kvm_iosignalfd data = {
+   .cookie = cookie,
+   .addr   = addr,
+   .len= len,
+   .fd = fd,
+   .flags  = type ? KVM_IOSIGNALFD_FLAG_PIO : 0,
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags)
+{
+   int r;
+   int type = flags & IOSIGNALFD_FLAG_PIO;
+   int cvalid = flags & IOSIGNALFD_FLAG_COOKIE;
+   struct kvm_iosignalfd data = {
+   .cookie  = cookie,
+   .addr= addr,
+   .flags   = KVM_IOSIGNALFD_FLAG_DEASSIGN |
+   (type ? KVM_IOSIGNALFD_FLAG_PIO : 0) |
+   (cvalid ? KVM_IOSIGNALFD_FLAG_COOKIE : 0),
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+#else /* KVM_CAP_IOSIGNALFD */
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags)
+{
+   return -ENOENT;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags)
+{
+   return -ENOENT;
+}
+
+#endif /* KVM_CAP_IOSIGNALFD */
+
+
+
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index 3ccbe3d..ea81e55 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -882,6 +882,45 @@ int kvm_create_irqfd(kvm_context_t kvm, int gsi, int flags);
  */
 int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags);
 
+enum {
+   iosignalfd_option_pio,
+   iosignalfd_option_cookie,
+};
+
+#define IOSIGNALFD_FLAG_PIO(1 << iosignalfd_option_pio)
+#define IOSIGNALFD_FLAG_COOKIE (1 << iosignalfd_option_cookie)
+
+/*!
+ * \brief Assign an eventfd to an IO port (PIO or MMIO)
+ *
+ * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
+ * address range.  Any guest writes to the specified range will generate
+ * an eventfd signal.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param cookie A user-assigned cookie for optional use in deassign
+ * \param addr The IO address
+ * \param len The length of the IO region at the address
+ * \param fd The eventfd file-descriptor
+ * \param flags FLAG_PIO: PIO, else MMIO
+ */
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags);
+
+/*!
+ * \brief Deassign an iosignalfd from a previously registered IO port
+ *
+ * Deassigns an iosignalfd previously registered with kvm_assign_iosignalfd()
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param cookie The cookie to (optionally) match (must specify FLAG_COOKIE)
+ * \param addr The IO address to deassign
+ * \param flags FLAG_PIO: PIO, else MMIO, FLAG_COOKIE: cookie is valid  
+ */
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
   struct kvm_assigned_msix_nr *msix_nr);


[PATCH v3] kmod: Add distclean rule

2009-05-15 Thread Jan Kiszka

The smaller the patch... sigh.



Remove the configure output config.kbuild, config.mak and arch links via
distclean.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 Makefile |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..cef121d 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm:  all
 
 clean:
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+   rm -f config.kbuild config.mak include/asm include-compat/asm

Re: RFC: convert KVMTRACE to event traces

2009-05-15 Thread Christoph Hellwig

On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote:
> + trace_kvm_cr_write(cr, val);
>   switch (cr) {
>   case 0:
> - kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
> + kvm_set_cr0(vcpu, val);
>   skip_emulated_instruction(vcpu);

Do we really need one trace point covering all cr writes, _and_ one for
each specific register?

>   if (!npt_enabled)
> - KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
> - (u32)fault_address, (u32)(fault_address >> 32),
> - handler);
> + trace_kvm_page_fault(fault_address, error_code);
>   else
> - KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
> - (u32)fault_address, (u32)(fault_address >> 32),
> - handler);
> + trace_kvm_tdp_page_fault(fault_address, error_code);

Again this seems a bit cumbersome.  Why not just one tracepoint for
page faults, with a flag if we're using npt or not?

> +ifeq ($(CONFIG_TRACEPOINTS),y)
> +trace-objs = kvm-traces.o
> +arch-trace-objs = kvm-traces-arch.o
> +endif
> +
>  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>  
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
> - i8254.o
> + i8254.o $(trace-objs)
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> -kvm-amd-objs = svm.o
> +kvm-amd-objs = svm.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o

The option to select event tracing bits is CONFIG_EVENT_TRACING and the
makefile syntax used here (both the original makefile and the additions)
is rather awkward.

A proper arch/x86/kvm/Makefile including tracing bits should look like
the following:

-- snip --
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm

kvm-y   += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
   coalesced_mmio.o irq_comm.o)
kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o)
kvm-y   += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
   i8254.o

kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o
kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o

kvm-intel-y += vmx.o $(kvm-arch-trace-y)
kvm-amd-y   += svm.o $(kvm-arch-trace-y)

obj-$(CONFIG_KVM)   += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD)   += kvm-amd.o
-- snip --

and do we actually still need kvm_trace.o after this?

Anyway, I'll send the upstream part of the makefile cleanup out ASAP,
then you can rebase later.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> @@ -0,0 +1,5 @@
> +#include <linux/sched.h>
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/kvm/x86.h>

Can't we just put this into some other common .c file?  That would also
reduce the amount of makefile magic required.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> @@ -0,0 +1,5 @@
> +#include <linux/sched.h>
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/kvm/x86-arch.h>

Same for this one, especially as the makefile hackery required for this
one is even worse..


Re: [PATCH v4 resend 0/6] ATS capability support for Intel IOMMU

2009-05-15 Thread Jesse Barnes

On Thu, 14 May 2009 10:32:05 +0800
Yu Zhao yu.z...@intel.com wrote:

> This patch series implements Address Translation Service support for
> the Intel IOMMU. The PCIe Endpoint that supports ATS capability can
> request the DMA address translation from the IOMMU and cache the
> translation itself. This can alleviate IOMMU TLB pressure and improve
> the hardware performance in the I/O virtualization environment.
>
> The ATS is one of PCI-SIG I/O Virtualization (IOV) Specifications. The
> spec can be found at: http://www.pcisig.com/specifications/iov/ats/
> (it requires membership).

These ones can go through David's tree.  You can add my:
Acked-by: Jesse Barnes jbar...@virtuousgeek.org

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

Re: RFC: convert KVMTRACE to event traces

2009-05-15 Thread Marcelo Tosatti

On Fri, May 15, 2009 at 01:10:34PM -0400, Christoph Hellwig wrote:
> On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote:
> > +   trace_kvm_cr_write(cr, val);
> >     switch (cr) {
> >     case 0:
> > -   kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
> > +   kvm_set_cr0(vcpu, val);
> >     skip_emulated_instruction(vcpu);
>
> Do we really need one trace point covering all cr writes, _and_ one for
> each specific register?

There is one tracepoint named kvm_cr that covers cr reads and writes.

kvm_trace_cr_read/kvm_trace_cr_write are macros that expand to
kvm_trace_cr(rw=1 or rw=0). Perhaps that is not a very good idea.
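
The pattern described here, two convenience macros over one event with a read/write flag, can be sketched as follows. This is not the patch's code: trace_cr() is a hypothetical stand-in that records its arguments instead of emitting a tracepoint, and the choice of rw=1 for writes and rw=0 for reads is an assumption, since the message does not say which value maps to which.

```c
/* Record of the last event, standing in for the trace ring buffer. */
struct cr_event {
	int rw;			/* assumed: 1 = write, 0 = read */
	int cr;			/* control register number */
	unsigned long val;	/* value read or written */
};

static struct cr_event last_cr_event;

/* Hypothetical stand-in for the single tracepoint covering reads and writes. */
static void trace_cr(int rw, int cr, unsigned long val)
{
	last_cr_event.rw = rw;
	last_cr_event.cr = cr;
	last_cr_event.val = val;
}

/* Thin wrappers: call sites stay readable, the event definition stays single. */
#define trace_cr_read(cr, val)  trace_cr(0, (cr), (val))
#define trace_cr_write(cr, val) trace_cr(1, (cr), (val))
```

The trade-off under discussion is that the wrappers hide the fact that only one event exists, which can surprise someone who looks for separate read and write events when enabling them.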

 
> >     if (!npt_enabled)
> > -   KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
> > -   (u32)fault_address, (u32)(fault_address >> 32),
> > -   handler);
> > +   trace_kvm_page_fault(fault_address, error_code);
> >     else
> > -   KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
> > -   (u32)fault_address, (u32)(fault_address >> 32),
> > -   handler);
> > +   trace_kvm_tdp_page_fault(fault_address, error_code);
>
> Again this seems a bit cumbersome.  Why not just one tracepoint for
> page faults, with a flag if we're using npt or not?

Issue is the meaning of these faults is different. With npt disabled the
fault is a guest fault (like a normal pagefault), but with npt enabled
the fault indicates the host pagetables the hardware uses to do the
translation are not set up correctly.

I did unify them as you suggest but reverted back to separate
tracepoints because the unification might be confusing.

Can be unified later if desirable.

> > +ifeq ($(CONFIG_TRACEPOINTS),y)
> > +trace-objs = kvm-traces.o
> > +arch-trace-objs = kvm-traces-arch.o
> > +endif
> > +
> >  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
> >  
> >  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
> > -   i8254.o
> > +   i8254.o $(trace-objs)
> >  obj-$(CONFIG_KVM) += kvm.o
> > -kvm-intel-objs = vmx.o
> > +kvm-intel-objs = vmx.o $(arch-trace-objs)
> >  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> > -kvm-amd-objs = svm.o
> > +kvm-amd-objs = svm.o $(arch-trace-objs)
> >  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
>
> The option to select event tracing bits is CONFIG_EVENT_TRACING and the
> makefile syntax used here (both the original makefile and the additions)
> is rather awkward.
>
> A proper arch/x86/kvm/Makefile including tracing bits should look like
> the following:
>
> -- snip --
> EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>
> kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
>  coalesced_mmio.o irq_comm.o)
> kvm-$(CONFIG_KVM_TRACE)   += $(addprefix ../../../virt/kvm/, kvm_trace.o)
> kvm-$(CONFIG_IOMMU_API)   += $(addprefix ../../../virt/kvm/, iommu.o)
> kvm-y += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
>  i8254.o
>
> kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o
> kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o
>
> kvm-intel-y   += vmx.o $(kvm-arch-trace-y)
> kvm-amd-y += svm.o $(kvm-arch-trace-y)
>
> obj-$(CONFIG_KVM) += kvm.o
> obj-$(CONFIG_KVM_INTEL)   += kvm-intel.o
> obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> -- snip --
>
> and do we actually still need kvm_trace.o after this?

Your version looks much nicer. kvm_trace.o can disappear as soon as 
this is in Avi's tree and a decent replacement for user/kvm_trace.c 
is in qemu-kvm.git.

> Anyway, I'll send the upstream part of the makefile cleanup out ASAP,
> then you can rebase later.

OK.

 
> > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> > @@ -0,0 +1,5 @@
> > +#include <linux/sched.h>
> > +
> > +
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/kvm/x86.h>
>
> Can't we just put this into some other common .c file?  That would also
> reduce the amount of makefile magic required.
>
> > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> > @@ -0,0 +1,5 @@
> > +#include <linux/sched.h>
> > +
> > +
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/kvm/x86-arch.h>
>
> Same for this one, especially as the makefile hackery required for this
> one is even worse..

Probably for both. Now that you mention it, I can't explain the reason for
the separate C files. Will put this up in a git tree in a couple of hours.


Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-15 Thread Beth Kon


Marcelo Tosatti wrote:
> Beth,
>
> On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
> > Anthony Liguori wrote:
> > > Vincent Minet wrote:
> > > > External ACPI tables are counted twice for the RSDT size and the load
> > > > address for the first external table is in the MADT (interrupt override
> > > > entries are overwritten).
> > > >
> > > > Signed-off-by: Vincent Minet vinc...@vincent-minet.net
> > >
> > > Beth,
> > >
> > > I think you had a patch attempting to address the same issue.  It was
> > > a bit more involved though.
> > >
> > > Which is the proper fix and are they both to the same problem?
> >
> > They are for 2 different bases. My patch was for qemu's bochs bios and
> > this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in
> > this area of setting up the ACPI tables. My patch is still needed for
> > the qemu base. I hope we'll be getting to one base soon :-)
> >
> > Assuming the intent of the code was for MAX_RSDT_ENTRIES to include
> > external_tables, this patch looks correct. I think one additional check
> > would be needed (in my patch) to make sure that the code doesn't exceed
> > MAX_RSDT_ENTRIES when the external tables are being loaded.
> >
> > My patch also puts all the code that calculates madt_size in the same
> > place, at the beginning of the table layout. I believe this is neater
> > and will avoid problems like this one in the future. As much as
> > possible, I think it best to get all the tables laid out, then fill
> > them in. If for some reason this is not acceptable, we need to add a big
> > note that no tables should be laid out after the madt because the madt
> > may grow further down in the code and overwrite the other table.
>
> I like this better too, see questions/comments below.
>
> > > Regards,
> > >
> > > Anthony Liguori
> > >
> > > > ---
> > > >  kvm/bios/rombios32.c |3 ++-
> > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
> > > > index cbd5f15..289361b 100755
> > > > --- a/kvm/bios/rombios32.c
> > > > +++ b/kvm/bios/rombios32.c
> > > > @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
> > > >  addr = base_addr = ram_size - ACPI_DATA_SIZE;
> > > >  rsdt_addr = addr;
> > > >  rsdt = (void *)(addr);
> > > > -rsdt_size = sizeof(*rsdt) + external_tables * 4;
> > > > +rsdt_size = sizeof(*rsdt);
> > > >  addr += rsdt_size;
> > > >
> > > >  fadt_addr = addr;
> > > > @@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
> > > >  }
> > > >  int_override++;
> > > >  madt_size += sizeof(struct madt_int_override);
> > > > +addr += sizeof(struct madt_int_override);
> > > >  }
> > > >  acpi_build_table_header((struct acpi_table_header *)madt,
> > > >  "APIC", madt_size, 1);
> >
> > diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
> > index cbd5f15..23835b6 100755
> > --- a/kvm/bios/rombios32.c
> > +++ b/kvm/bios/rombios32.c
> > @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
> >  addr = base_addr = ram_size - ACPI_DATA_SIZE;
> >  rsdt_addr = addr;
> >  rsdt = (void *)(addr);
> > -rsdt_size = sizeof(*rsdt) + external_tables * 4;
> > +rsdt_size = sizeof(*rsdt);
> >  addr += rsdt_size;
> >
> >  fadt_addr = addr;
> > @@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
> >
> >  addr = (addr + 7) & ~7;
> >  madt_addr = addr;
> > +madt = (void *)(addr);
> >  madt_size = sizeof(*madt) +
> >  sizeof(struct madt_processor_apic) * MAX_CPUS +
> >  #ifdef BX_QEMU
> > @@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
> >  #else
> >  sizeof(struct madt_io_apic);
> >  #endif
> > -madt = (void *)(addr);
> > +for ( i = 0; i < 16; i++ ) {
> > +if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
> > +madt_size += sizeof(struct madt_int_override);
> > +}
> > +}
> >  addr += madt_size;
>
> This bug could only affect the HPET descriptor right?

I'm not sure what you're asking. There were 2 bugs that Vincent pointed 
out. The first caused an incorrect rsdt_size to be reported, and the 
second (missing addr += sizeof(struct madt_int_override)) caused 
corruption of whatever came after the MADT. But even if his patch were 
applied, any future code that added a table and manipulated addr between 
the following points:


...
(about line 1676)
madt = (void *)(addr);
addr += madt_size;
...
(about line 1789)
madt_size += sizeof(struct madt_int_override);
addr += sizeof(struct madt_int_override);

would have wound up causing some kind of corruption, as happened with 
the HPET. Also the memset(madt, 0, madt_size) around line 1740 was not 
using the complete madt_size.


So this seems undesirable, and that's why I suggested moving all addr 
manipulation (with the exception of additional tables at the very end) 
to the same section of the table layout code. Seems best to manage 
madt_size all in one place.
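
The "size everything first, then advance addr once" approach argued for above can be sketched in isolation. This is not the rombios32.c code: every name carries an ex_ prefix to mark it as hypothetical, the structure sizes are placeholders chosen for illustration, and the IRQ mask value is an assumption standing in for PCI_ISA_IRQ_MASK.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS: illustrative values, not taken from the thread. */
#define EX_MAX_CPUS     16
#define EX_ISA_IRQ_MASK 0x0e20U	/* assumed mask: IRQs 5, 9, 10 and 11 */

/* Placeholder layouts; only their sizes matter for this sketch. */
struct ex_madt_header    { uint8_t bytes[44]; };
struct ex_processor_apic { uint8_t bytes[8]; };
struct ex_io_apic        { uint8_t bytes[12]; };
struct ex_int_override   { uint8_t bytes[10]; };

/* Count one interrupt override entry per IRQ (0..15) set in the mask. */
static unsigned int ex_count_overrides(unsigned int mask)
{
	unsigned int i, n = 0;

	for (i = 0; i < 16; i++)
		if (mask & (1U << i))
			n++;
	return n;
}

/* Full MADT size, overrides included, known before anything is written. */
static size_t ex_madt_size(unsigned int mask)
{
	return sizeof(struct ex_madt_header) +
	       sizeof(struct ex_processor_apic) * EX_MAX_CPUS +
	       sizeof(struct ex_io_apic) +
	       sizeof(struct ex_int_override) * ex_count_overrides(mask);
}

/* Round addr up to the next multiple of 8. */
static uintptr_t ex_align8(uintptr_t addr)
{
	return (addr + 7) & ~(uintptr_t)7;
}

/* Place the MADT at the aligned addr; return the first free byte after it. */
static uintptr_t ex_place_madt(uintptr_t addr, unsigned int mask,
			       uintptr_t *madt_addr)
{
	addr = ex_align8(addr);
	*madt_addr = addr;
	return addr + ex_madt_size(mask);
}
```

Since the returned address already accounts for the override entries, a table placed right after the MADT cannot be overwritten when those entries are filled in later, and a memset over ex_madt_size() bytes covers the whole table.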


  

> >  #ifdef BX_QEMU
> > @@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
> >  continue;
> >  }
> >  int_override++;
> > -madt_size += sizeof(struct madt_int_override);
> >  }
> >  acpi_build_table_header((struct acpi_table_header *)madt,
> >  "APIC", madt_size, 1);
> > @@ -1868,17 +1872,6 @@ void acpi_bios_init(void)

Re: XP smp using a lot of CPU [SOLVED]

2009-05-15 Thread Ross Boylan

Using ACPI fixes the problem; CPU usage is now quite low.  Start line
was
sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
-net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
-boot d -cdrom /usr/local/backup/XPProSP3.iso \
-std-vga -hda /dev/turtle/XP00 \
-soundhw es1370 -localtime -m 1G -smp 2
I switched to -boot c later.

I ended up doing a fresh install; my repair got mucked up and I got the
message "The requested lookup key was not found in any active activation
context" when I entered a location into MSIE, including when I tried to
run Windows Update.  Googling showed this might indicate some permission
or file corruption issues.  They may have happened during my earlier
(virtual) system hang.

My experience suggests a theory: if you use SMP with XP (i.e., more than
1 virtual processor) you should enable acpi, i.e., not say -no-acpi.  If
this is true, the advice to run windows with -no-acpi should probably be
updated.  It's possible single CPU systems are affected as well.

Ross


Re: [PATCH v4] qemu-kvm: Make PC speaker emulation aware of in-kernel PIT

2009-05-15 Thread Marcelo Tosatti

On Thu, May 14, 2009 at 10:43:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT2 where
> available. This unbreaks -soundhw pcspk in KVM mode.
>
> Changes in v4:
>  - preserve full PIT state across read-modify-write
>  - update kvm.h
>
> Changes in v3:
>  - re-added incorrectly dropped kvm_enabled checks
>
> Changes in v2:
>  - rebased over qemu-kvm and KVM_CREATE_PIT2
>  - refactored hooks in pcspk
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Jan,

You always attempt to use KVM_CREATE_PIT2, so if, say, the destination of
a migration does not support the new ioctl, you naturally fall back to the
in-kernel dummy. Seems the right thing to do.

It would be nice to avoid sprinkling KVM details inside hw/pcspk.c, though,
but that is another problem.

Looks good (and v3 kernel patch).


Re: XP smp using a lot of CPU [SOLVED]

2009-05-15 Thread Brian Jackson



On May 15, 2009, at 3:24 PM, Ross Boylan wrote:


Using ACPI fixes the problem; CPU usage is now quite low.  Start line
was
sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
   -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
   -boot d -cdrom /usr/local/backup/XPProSP3.iso \
   -std-vga -hda /dev/turtle/XP00 \
   -soundhw es1370 -localtime -m 1G -smp 2
I switched to -boot c later.

I ended up doing a fresh install; my repair got mucked up and I got the
message "The requested lookup key was not found in any active activation
context" when I entered a location into MSIE, including when I tried to
run Windows Update.  Googling showed this might indicate some permission
or file corruption issues.  They may have happened during my earlier
(virtual) system hang.

My experience suggests a theory: if you use SMP with XP (i.e., more than
1 virtual processor) you should enable ACPI, i.e., not say -no-acpi.  If
this is true, the advice to run Windows with -no-acpi should probably be
updated.  It's possible single-CPU systems are affected as well.



I removed the note about -no-acpi from the howto on the wiki. I don't
think that's been true for a long time.


--Iggy





Ross




Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon


This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;

RE: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Kumar, Venkat

Hi Cam, I have gone through your latest shared memory patch.
I have a few questions and comments.

Comment:-
+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+ram_size += ivshmem_get_size();
+}
+

In your initial patch this part of the patch is

+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+phys_ram_size += ivshmem_get_size();
+}

I think the phys_ram_size += ivshmem_get_size(); is correct.

Question:-
You are giving the desired virtual address for mmapping the shared memory object
as s->ivshmem_ptr, which is phys_ram_base + s->ivshmem_offset. This desired
virtual address is nothing but the base virtual address of the memory that you
are allocating after incrementing phys_ram_size. So now s->ivshmem_ptr would
point to a new set of memory, the shared memory region, instead of the
memory allocated through qemu_alloc_physram. That means that if pages are
allocated for the s->ivshmem_ptr virtual address range, those pages can never be
addressed again. Correct me if my understanding is wrong.

Thx,

Venkat


-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Cam Macdonell
Sent: Thursday, May 07, 2009 9:47 PM
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH v2] Shared memory device with interrupt support

Support an inter-VM shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte messages to
trigger interrupts.  A message is sent when the guest writes to the Doorbell
register on the shared memory PCI device.  The lower 8 bits of the value written
to this register are sent as the 1-byte message so different meanings of
interrupts can be supported.

Interrupts are only supported between 2 VMs currently.  One VM must act as the
server by adding "server" to the command-line argument.  Shared memory devices
are created with the following command-line:

-ivshmem <shm object>,<size in MB>,[unix:<path>][,server]

Interrupts can also be used between host and guest by implementing a
listener on the host.

Cam

---
 Makefile.target |3 +
 hw/ivshmem.c|  421 +++
 hw/pc.c |6 +
 hw/pc.h |3 +
 qemu-options.hx |   14 ++
 sysemu.h|8 +
 vl.c|   14 ++
 7 files changed, 469 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
 OBJS += rtl8139.o
 OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
 # Generic watchdog support and some watchdog devices
 OBJS += watchdog.o
 OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include <sys/mman.h>
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShmemState {
+PCIDevice dev;
+IVShmemState ivshmem_state;
+} PCI_IVShmemState;
+
+typedef struct IVShmemDesc {
+char name[1024];
+char * chrdev;
+int size;
+} IVShmemDesc;
+
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 16,
+Doorbell = 32
+};
+
+static int num_ivshmem_devices = 0;
+static IVShmemDesc ivshmem_desc;
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+uint32_t addr, uint32_t size, int type)
+{
+PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+IVShmemState *s = &d->ivshmem_state;
+
+IVSHMEM_DPRINTF(addr = %u size =

Re: Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon


Beth Kon wrote:

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within

MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon e...@us.ibm.com

  
This should have been patch 2/2. I think git-send-email didn't like that 
I didn't have a space after Subject: . Let me try to resend with the 
space added.


[PATCH 2/2] Clean up RSDT Table Creation

2009-05-15 Thread Beth Kon

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.
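
As a rough illustration of the overflow check described above (not the
rombios32.c code itself), a bounded append to a fixed-size entry array
could look like the sketch below. The names rsdt_sketch, rsdt_add_entry
and rsdt_entries_size, and the value 16 for MAX_RSDT_ENTRIES, are
assumptions made for this example only. Unlike the patch, which checks
the count after storing the entry, this sketch checks before the store.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RSDT_ENTRIES 16   /* illustrative value, not taken from the patch */

/* Hypothetical stand-in for the entry area of the RSDT. */
struct rsdt_sketch {
    uint32_t table_offset_entry[MAX_RSDT_ENTRIES];
    unsigned nb_entries;
};

/* Append one table address; return 0 on success, -1 if the table is full. */
static int rsdt_add_entry(struct rsdt_sketch *r, uint32_t addr)
{
    if (r->nb_entries >= MAX_RSDT_ENTRIES)
        return -1;               /* refuse before writing past the array */
    r->table_offset_entry[r->nb_entries++] = addr;
    return 0;
}

/* Bytes used by the entries actually present: 4 bytes per 32-bit pointer. */
static size_t rsdt_entries_size(const struct rsdt_sketch *r)
{
    return (size_t)r->nb_entries * sizeof(uint32_t);
}
```

The size helper mirrors the idea at the end of the diff: the final table
size depends on the number of entries really stored, not on the maximum.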

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;

Re: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Cam Macdonell



On 15-May-09, at 8:45 PM, Kumar, Venkat wrote:


Hi Cam, I have gone through your latest shared memory patch.
I have a few questions and comments.

Comment:-
+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+ram_size += ivshmem_get_size();
+}
+

In your initial patch this part of the patch is

+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+phys_ram_size += ivshmem_get_size();
+}

I think the phys_ram_size += ivshmem_get_size(); is correct.


Hi Venkat,

Not with the newer qemu that qemu-kvm uses.   The newer patch is for  
qemu-kvm, not kvm-userspace.  There is no longer a variable named  
phys_ram_size in pc.c in qemu-kvm.




Question:-
You are giving the desired virtual address for mmapping the shared
memory object as s->ivshmem_ptr, which is phys_ram_base +
s->ivshmem_offset. This desired virtual address is nothing but the
base virtual address of the memory that you are allocating after
incrementing phys_ram_size. So now s->ivshmem_ptr would point to a
new set of memory, the shared memory region, instead of the
memory allocated through qemu_alloc_physram. That means that if pages
are allocated for the s->ivshmem_ptr virtual address range, those
pages can never be addressed again. Correct me if my understanding
is wrong.


I don't think so.  With the mmap call, I specify MAP_FIXED, which
requires that the memory in the shared memory object be mapped to the
address given in the first parameter (s->ivshmem_ptr).  If MAP_FIXED
is not specified, mmap is free to choose the address itself, but with
MAP_FIXED it maps onto the already reserved space that ivshmem_ptr
points to, which was allocated with qemu_ram_alloc().
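
A minimal sketch of that behaviour, outside of qemu, might look like
the following. It is only an illustration under stated assumptions: the
function name map_fixed_over_reservation is hypothetical, an anonymous
mapping stands in for the region returned by qemu_ram_alloc(), and an
unlinked temporary file stands in for the shared memory object. On
Linux, MAP_FIXED discards whatever was previously mapped in the target
range, so the old anonymous pages are released rather than leaked.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve `size` bytes of anonymous memory, then map a file-backed shared
 * object over the same address range with MAP_FIXED.
 * Returns 0 if the shared mapping landed exactly on the reserved address.
 */
static int map_fixed_over_reservation(size_t size)
{
    /* Stand-in for the region that qemu_ram_alloc() would hand back. */
    void *reserved = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (reserved == MAP_FAILED)
        return -1;

    /* Stand-in for the shared memory object (assumption of this sketch). */
    char path[] = "/tmp/ivshmem-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        munmap(reserved, size);
        return -1;
    }
    unlink(path);
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        munmap(reserved, size);
        return -1;
    }

    /* MAP_FIXED replaces whatever was mapped at `reserved`. */
    void *shared = mmap(reserved, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    if (shared == MAP_FAILED) {
        munmap(reserved, size);
        return -1;
    }

    /* The range stays usable through the original pointer. */
    int ok = (shared == reserved);
    memcpy(shared, "ping", 5);
    ok = ok && memcmp(reserved, "ping", 5) == 0;

    munmap(shared, size);
    return ok ? 0 : -1;
}
```

Calling it with a few pages, e.g. 4 * sysconf(_SC_PAGESIZE), should
return 0 on a typical Linux host.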


I hope that answers your question,

Cam



-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]  
On Behalf Of Cam Macdonell

Sent: Thursday, May 07, 2009 9:47 PM
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH v2] Shared memory device with interrupt support

Support an inter-VM shared memory device that maps a shared-memory
object as a PCI device in the guest.  This patch also supports
interrupts between guests by communicating over a unix domain socket.
This patch applies to the qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte
messages to trigger interrupts.  A message is sent when the guest
writes to the Doorbell register on the shared memory PCI device.  The
lower 8 bits of the value written to this register are sent as the
1-byte message so different meanings of interrupts can be supported.

Interrupts are only supported between 2 VMs currently.  One VM must
act as the server by adding "server" to the command-line argument.
Shared memory devices are created with the following command-line:

-ivshmem <shm object>,<size in MB>,[unix:<path>][,server]

Interrupts can also be used between host and guest by implementing a
listener on the host.


Cam

---
Makefile.target |3 +
hw/ivshmem.c|  421 +++

hw/pc.c |6 +
hw/pc.h |3 +
qemu-options.hx |   14 ++
sysemu.h|8 +
vl.c|   14 ++
7 files changed, 469 insertions(+), 0 deletions(-)
create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
OBJS += rtl8139.o
OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
# Generic watchdog support and some watchdog devices
OBJS += watchdog.o
OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include <sys/mman.h>
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+

Re: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Cam Macdonell



On 15-May-09, at 8:54 PM, Kumar, Venkat wrote:


Cam,

A question on interrupts as well.
What is the unix:path that needs to be passed in the argument list?
Can it be any string?


It has to be a valid path on the host.  It will create a unix domain  
socket on that path.
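
For readers unfamiliar with unix domain sockets, the sketch below shows
roughly what "create a socket on that path" means on the host. It is
not the qemu character device code; the function name
unix_listen_sketch is hypothetical, and the choice of SOCK_STREAM with a
backlog of 1 is an assumption that matches the two-VM limit mentioned
in the patch description.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening unix domain socket at `path`; returns the fd or -1. */
static int unix_listen_sketch(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                       /* path too long for sockaddr_un */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);         /* length was checked above */

    unlink(path);                        /* remove a stale socket file, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The path therefore has to be somewhere the qemu process can create a
file; the socket file appears there once bind() succeeds.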




If my understanding is correct, both VMs that want to
communicate would give this path on the command line, with one of
them specified as the server.


Exactly, the one with "server" in the parameter list will wait for
a connection before booting.


Cam



Thx,
Venkat


Support an inter-VM shared memory device that maps a shared-memory
object as a PCI device in the guest.  This patch also supports
interrupts between guests by communicating over a unix domain socket.
This patch applies to the qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte
messages to trigger interrupts.  A message is sent when the guest
writes to the Doorbell register on the shared memory PCI device.  The
lower 8 bits of the value written to this register are sent as the
1-byte message so different meanings of interrupts can be supported.

Interrupts are only supported between 2 VMs currently.  One VM must
act as the server by adding "server" to the command-line argument.
Shared memory devices are created with the following command-line:

-ivshmem <shm object>,<size in MB>,[unix:<path>][,server]

Interrupts can also be used between host and guest by implementing a
listener on the host.

Cam

---
Makefile.target |3 +
hw/ivshmem.c|  421 +++

hw/pc.c |6 +
hw/pc.h |3 +
qemu-options.hx |   14 ++
sysemu.h|8 +
vl.c|   14 ++
7 files changed, 469 insertions(+), 0 deletions(-)
create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
OBJS += rtl8139.o
OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
# Generic watchdog support and some watchdog devices
OBJS += watchdog.o
OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell c...@cs.ualberta.ca
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include <sys/mman.h>
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShmemState {
+PCIDevice dev;
+IVShmemState ivshmem_state;
+} PCI_IVShmemState;
+
+typedef struct IVShmemDesc {
+char name[1024];
+char * chrdev;
+int size;
+} IVShmemDesc;
+
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 16,
+Doorbell = 32
+};
+
+static int num_ivshmem_devices = 0;
+static IVShmemDesc ivshmem_desc;
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+uint32_t addr, uint32_t size, int type)
+{
+PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+IVShmemState *s = &d->ivshmem_state;
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", addr, size);
+cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+void ivshmem_init(const char * optarg) {
+
+char * temp;
+char * ivshmem_sz;
+int size;
+
+num_ivshmem_devices++;
+
+/* currently we only support 1 device */
+if (num_ivshmem_devices > MAX_IVSHMEM_DEVICES) {
+return;
+}
+
+temp = strdup(optarg);
+snprintf(ivshmem_desc.name, 1024, "/%s", strsep(&temp,","));
+ivshmem_sz=strsep(&temp,",");
+if (ivshmem_sz != NULL){
+size = atol(ivshmem_sz);
+} else {
+size = -1;
+}
+
+ivshmem_desc.chrdev = strsep(&temp,"\0");
+
+if ( size == -1) {
+ivshmem_desc.size = TARGET_PAGE_SIZE;
+} else {
+ivshmem_desc.size = size*1024*1024;
+}
+

[PATCH 2/5] kvm/e500: Add shadow ID mapping support

2009-05-15 Thread Liu Yu

Based on Hollis's idea,
this patch maps each (vcpu, as, pid) tuple to an individual shadow ID.

Every vcpu has a mapping table,
which keeps the mapping from guest (as, id) to shadow ID.

Every hardware core has a shadow ID reference table,
which keeps the mapping from shadow ID to (vcpu, as, pid).

When a mapping is created, both the vcpu and the core need to update
their tables, but either side can destroy the mapping on its own.

When the shadow IDs are exhausted,
the shadow ID reference table needs to be flushed.
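
The two-table scheme can be modelled in plain user-space C roughly as
follows. This is a simplified sketch, not the kernel code in the diff:
the type and function names (id_mapping_sketch, shadow_id_ref_sketch,
sid_find, sid_create, sid_get) are hypothetical, per-CPU data and
preemption handling are left out, and the separate `last` counter is an
assumption of the sketch. A mapping counts as valid only while both
tables agree, which is what allows either side to drop it unilaterally.

```c
#include <string.h>

#define NUM_SID 256   /* shadow IDs 1..255 are usable; 0 means "no mapping" */

/* Per-vcpu table: guest (as, id) -> shadow id. */
struct id_mapping_sketch {
    unsigned char id[2][256];
};

/* Per-core table: shadow id -> owning slot in some vcpu's table. */
struct shadow_id_ref_sketch {
    unsigned char *entry[NUM_SID];
    unsigned int last;               /* last shadow id handed out */
};

/* Return the shadow id if both tables still agree, else -1. */
static int sid_find(struct shadow_id_ref_sketch *core, unsigned char *slot)
{
    if (*slot && core->entry[*slot] == slot)
        return *slot;
    return -1;
}

/* Hand out the next free shadow id, or -1 when they are exhausted. */
static int sid_create(struct shadow_id_ref_sketch *core, unsigned char *slot)
{
    if (core->last + 1 >= NUM_SID)
        return -1;
    core->last++;
    *slot = (unsigned char)core->last;
    core->entry[core->last] = slot;
    return (int)core->last;
}

/* Look up or create; on exhaustion reset the core table and retry. */
static int sid_get(struct shadow_id_ref_sketch *core,
                   struct id_mapping_sketch *vcpu, int as, int gid)
{
    unsigned char *slot = &vcpu->id[as][gid];
    int sid = sid_find(core, slot);

    if (sid < 0) {
        sid = sid_create(core, slot);
        if (sid < 0) {
            /* Real hardware would also need its TLB flushed here. */
            memset(core, 0, sizeof(*core));
            sid = sid_create(core, slot);
        }
    }
    return sid;
}
```

Clearing a vcpu's id_mapping_sketch with memset() models the one-sided
destroy: the stale pointers left in the core table no longer match, so
the next lookup simply allocates a fresh shadow ID.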

Signed-off-by: Liu Yu yu@freescale.com
---
 arch/powerpc/include/asm/kvm_e500.h |3 +
 arch/powerpc/kvm/e500_tlb.c |   96 +++
 arch/powerpc/kvm/e500_tlb.h |4 +-
 3 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_e500.h b/arch/powerpc/include/asm/kvm_e500.h
index b248f31..bc73abe 100644
--- a/arch/powerpc/include/asm/kvm_e500.h
+++ b/arch/powerpc/include/asm/kvm_e500.h
@@ -42,6 +42,9 @@ struct kvmppc_vcpu_e500 {
/* Pages which are referenced in the shadow TLB. */
struct kvmppc_e500_shadow_ref *shadow_refs[E500_TLB_NUM];
 
+   /* MMU id mapping */
+   void *id_mapping;
+
unsigned int guest_tlb_size[E500_TLB_NUM];
unsigned int shadow_tlb_size[E500_TLB_NUM];
unsigned int guest_tlb_nv[E500_TLB_NUM];
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 4952dba..e5c9211 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -27,6 +27,96 @@
 
 static unsigned int tlb1_entry_num;
 
+struct id_mapping {
+   unsigned char id[2][256];
+};
+
+struct shadow_id_ref {
+   void *entry[256];
+};
+
+static DEFINE_PER_CPU(struct shadow_id_ref, host_sid);
+
+static inline int e500_id_create_mapping(unsigned char *entry)
+{
+   unsigned long sid;
+   int ret = -1;
+
+   preempt_disable();
+   sid = (unsigned long)++(__get_cpu_var(host_sid).entry[0]);
+   if (sid < 256) {
+   *entry = (unsigned char)sid;
+   __get_cpu_var(host_sid).entry[sid] = entry;
+   ret = sid;
+   }
+   preempt_enable();
+
+   return ret;
+}
+
+static inline void e500_id_destroy_all(void)
+{
+   preempt_disable();
+   memset(&__get_cpu_var(host_sid), 0, sizeof(__get_cpu_var(host_sid)));
+   preempt_enable();
+}
+
+static inline int e500_id_find_mapping(unsigned char *entry)
+{
+   if (*entry && __get_cpu_var(host_sid).entry[*entry] == entry)
+   return *entry;
+   return -1;
+}
+
+static void *kvmppc_e500_alloc_idm(void)
+{
+   return kzalloc(sizeof(struct id_mapping), GFP_KERNEL);
+}
+
+static void kvmppc_e500_free_idm(void *idm)
+{
+   kfree(idm);
+   return;
+}
+
+static inline void kvmppc_e500_reset_idm(void *idm)
+{
+   memset(idm, 0, sizeof(struct id_mapping));
+}
+
+static void inline kvmppc_e500_update_spid(struct kvmppc_vcpu_e500 *vcpu_e500)
+{
+   unsigned int as = !!(vcpu_e500->vcpu.arch.msr & (MSR_IS | MSR_DS));
+
+   vcpu_e500->vcpu.arch.shadow_pid = kvmppc_e500_get_sid(vcpu_e500, as,
+   get_cur_pid(&vcpu_e500->vcpu));
+   vcpu_e500->vcpu.arch.swap_pid = kvmppc_e500_get_sid(vcpu_e500, as, 0);
+}
+
+/*
+ * Map guest (vcpu,as,id) to individual shadow id.
+ */
+unsigned int kvmppc_e500_get_sid(struct kvmppc_vcpu_e500 *vcpu_e500,
+ int as, int gid)
+{
+   struct id_mapping *idm = vcpu_e500->id_mapping;
+   int sid;
+
+   sid = e500_id_find_mapping(&idm->id[as][gid]);
+
+   while (sid <= 0) {
+   /* None mapping yet */
+   sid = e500_id_create_mapping(&idm->id[as][gid]);
+   if(sid <= 0) {
+   BUG_ON(sid == 0);
+   e500_id_destroy_all();
+   kvmppc_e500_update_spid(vcpu_e500);
+   }
+   }
+
+   return sid;
+}
+
 void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
@@ -653,8 +743,13 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
if (vcpu_e500->shadow_refs[1] == NULL)
goto err_out_ref0;
 
+   if((vcpu_e500->id_mapping = kvmppc_e500_alloc_idm()) == NULL)
+   goto err_out_ref1;
+
return 0;
 
+err_out_ref1:
+   kfree(vcpu_e500->shadow_refs[1]);
 err_out_ref0:
kfree(vcpu_e500->shadow_refs[0]);
 err_out_guest1:
@@ -667,6 +762,7 @@ err_out:
 
 void kvmppc_e500_tlb_uninit(struct kvmppc_vcpu_e500 *vcpu_e500)
 {
+   kvmppc_e500_free_idm(vcpu_e500->id_mapping);
kfree(vcpu_e500->shadow_refs[1]);
kfree(vcpu_e500->shadow_refs[0]);
kfree(vcpu_e500->guest_tlb[1]);
diff --git a/arch/powerpc/kvm/e500_tlb.h b/arch/powerpc/kvm/e500_tlb.h
index 45b064b..eb36514 100644
--- a/arch/powerpc/kvm/e500_tlb.h
+++ b/arch/powerpc/kvm/e500_tlb.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
+ * Copyright (C) 2008 -

[PATCH 1/5] kvm/e500: Discard shadow TLB

2009-05-15 Thread Liu Yu

There are several reasons to discard the shadow TLB.

1. Once shadow ID support is implemented for E500,
keeping the shadow TLB may cause coherence problems
(if shadow ID mappings change, the shadow TLB cannot be updated in time).

2. We use the shadow TLB to restore the hardware TLB in vcpu_load().
However, with shadow IDs there is no tlbia() in vcpu_put() any more,
so there is no need to restore the hardware TLB.

3. Discarding the shadow TLB saves a lot of memory.

Signed-off-by: Liu Yu yu@freescale.com
---
 arch/powerpc/include/asm/kvm_e500.h |9 +-
 arch/powerpc/kvm/e500_tlb.c |  275 ---
 2 files changed, 103 insertions(+), 181 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_e500.h b/arch/powerpc/include/asm/kvm_e500.h
index 9d497ce..b248f31 100644
--- a/arch/powerpc/include/asm/kvm_e500.h
+++ b/arch/powerpc/include/asm/kvm_e500.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
+ * Copyright (C) 2008 - 2009 Freescale Semiconductor, Inc. All rights reserved.
  *
  * Author: Yu Liu, yu@freescale.com
  *
@@ -29,13 +29,18 @@ struct tlbe{
u32 mas7;
 };
 
+struct kvmppc_e500_shadow_ref {
+   struct page *page;
+   struct tlbe *gtlbe;
+};
+
 struct kvmppc_vcpu_e500 {
/* Unmodified copy of the guest's TLB. */
struct tlbe *guest_tlb[E500_TLB_NUM];
/* TLB that's actually used when the guest is running. */
struct tlbe *shadow_tlb[E500_TLB_NUM];
/* Pages which are referenced in the shadow TLB. */
-   struct page **shadow_pages[E500_TLB_NUM];
+   struct kvmppc_e500_shadow_ref *shadow_refs[E500_TLB_NUM];
 
unsigned int guest_tlb_size[E500_TLB_NUM];
unsigned int shadow_tlb_size[E500_TLB_NUM];
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 0e773fc..4952dba 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
+ * Copyright (C) 2008 - 2009 Freescale Semiconductor, Inc. All rights reserved.
  *
  * Author: Yu Liu, yu@freescale.com
  *
@@ -46,17 +46,6 @@ void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
tlbe->mas3, tlbe->mas7);
}
}
-
-   for (tlbsel = 0; tlbsel < 2; tlbsel++) {
-   printk("Shadow TLB%d:\n", tlbsel);
-   for (i = 0; i < vcpu_e500->shadow_tlb_size[tlbsel]; i++) {
-   tlbe = &vcpu_e500->shadow_tlb[tlbsel][i];
-   if (tlbe->mas1 & MAS1_VALID)
-   printk(" S[%d][%3d] |  %08X | %08X | %08X | %08X |\n",
-   tlbsel, i, tlbe->mas1, tlbe->mas2,
-   tlbe->mas3, tlbe->mas7);
-   }
-   }
 }
 
 static inline unsigned int tlb0_get_next_victim(
@@ -119,10 +108,8 @@ static inline void __write_host_tlbe(struct tlbe *stlbe)
 }
 
 static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
-   int tlbsel, int esel)
+   int tlbsel, int esel, struct tlbe *stlbe)
 {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel];
-
local_irq_disable();
if (tlbsel == 0) {
__write_host_tlbe(stlbe);
@@ -137,28 +124,13 @@ static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
mtspr(SPRN_MAS0, mas0);
}
local_irq_enable();
+   KVMTRACE_5D(STLB_WRITE, &vcpu_e500->vcpu, index_of(tlbsel, esel),
+   stlbe->mas1, stlbe->mas2, stlbe->mas3, stlbe->mas7,
+   handler);
 }
 
 void kvmppc_e500_tlb_load(struct kvm_vcpu *vcpu, int cpu)
 {
-   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
-   int i;
-   unsigned register mas0;
-
-   /* Load all valid TLB1 entries to reduce guest tlb miss fault */
-   local_irq_disable();
-   mas0 = mfspr(SPRN_MAS0);
-   for (i = 0; i < tlb1_max_shadow_size(); i++) {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[1][i];
-
-   if (get_tlb_v(stlbe)) {
-   mtspr(SPRN_MAS0, MAS0_TLBSEL(1)
-   | MAS0_ESEL(to_htlb1_esel(i)));
-   __write_host_tlbe(stlbe);
-   }
-   }
-   mtspr(SPRN_MAS0, mas0);
-   local_irq_enable();
 }
 
 void kvmppc_e500_tlb_put(struct kvm_vcpu *vcpu)
@@ -200,16 +172,19 @@ static int kvmppc_e500_tlb_index(struct kvmppc_vcpu_e500 *vcpu_e500,
 }
 
 static void kvmppc_e500_shadow_release(struct kvmppc_vcpu_e500 *vcpu_e500,
-   int tlbsel, int esel)
+   int stlbsel, int sesel)
 {
-   struct tlbe *stlbe = &vcpu_e500->shadow_tlb[tlbsel][esel];
-   struct page *page = vcpu_e500->shadow_pages[tlbsel][esel];
+   struct kvmppc_e500_shadow_ref *ref;
+   struct page *page;
+
+   ref = &vcpu_e500->shadow_refs[stlbsel][sesel];
+   page =

[PATCH 4/5] kvm/e500: minmize the TLB flush

2009-05-15 Thread Liu Yu

Only flush the TLB when the shadow mappings need to be reset.

For a guest tlbia, we can reset the vcpu's mappings instead of doing a
real flush. And because different vcpus map to different IDs,
no flush is needed at vcpu_put() and mmu_destroy().

Signed-off-by: Liu Yu yu@freescale.com
---
 arch/powerpc/kvm/e500_tlb.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index d090d97..570185c 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -109,6 +109,7 @@ unsigned int kvmppc_e500_get_sid(struct kvmppc_vcpu_e500 *vcpu_e500,
sid = e500_id_create_mapping(&idm->id[as][gid]);
if(sid <= 0) {
BUG_ON(sid == 0);
+   _tlbil_all();
e500_id_destroy_all();
kvmppc_e500_update_spid(vcpu_e500);
}
@@ -229,7 +230,6 @@ void kvmppc_e500_tlb_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvmppc_e500_tlb_put(struct kvm_vcpu *vcpu)
 {
-   _tlbil_all();
 }
 
 /* Search the guest TLB for a matching entry. */
@@ -412,7 +412,6 @@ void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
 
if (usermode) {
-   _tlbil_all();
/* clear PID for guest kernel mapping */
vcpu->arch.swap_pid = 0;
} else {
@@ -458,7 +457,9 @@ int kvmppc_e500_emul_mt_mmucsr0(struct kvmppc_vcpu_e500 *vcpu_e500, ulong value)
for (esel = 0; esel < vcpu_e500->guest_tlb_size[1]; esel++)
kvmppc_e500_gtlbe_invalidate(vcpu_e500, 1, esel);
 
-   _tlbil_all();
+   /* Reset vcpu shadow id mapping */
+   kvmppc_e500_reset_idm(vcpu_e500->id_mapping);
+   kvmppc_e500_update_spid(vcpu_e500);
 
return EMULATE_DONE;
 }
@@ -489,7 +490,9 @@ int kvmppc_e500_emul_tlbivax(struct kvm_vcpu *vcpu, int ra, int rb)
kvmppc_e500_gtlbe_invalidate(vcpu_e500, tlbsel, esel);
}
 
-   _tlbil_all();
+   /* Reset vcpu shadow id mapping */
+   kvmppc_e500_reset_idm(vcpu_e500->id_mapping);
+   kvmppc_e500_update_spid(vcpu_e500);
 
return EMULATE_DONE;
 }
@@ -668,9 +671,6 @@ void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
for (stlbsel = 0; stlbsel < 2; stlbsel++)
for (i = 0; i < vcpu_e500->guest_tlb_size[stlbsel]; i++)
kvmppc_e500_shadow_release(vcpu_e500, stlbsel, i);
-
-   /* discard all guest mapping */
-   _tlbil_all();
 }
 
 void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
-- 
1.5.4
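
The idea in the commit message above (reset a vcpu's shadow ID mapping
instead of flushing, and flush only when the ID space runs out) can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not the code from this series: every name below
(sid_pool, vcpu_idm, get_sid, idm_reset, NUM_SIDS) is hypothetical, the
"flush" is just a counter, and a generation number stands in for
e500_id_destroy_all().

```c
#include <assert.h>
#include <string.h>

#define NUM_SIDS 4 /* hypothetical ID space size; ID 0 is reserved */
#define NUM_AS   2 /* guest address spaces */
#define NUM_GIDS 4 /* guest PIDs tracked per vcpu (toy value) */

/* Host-wide allocator state (hypothetical model). */
struct sid_pool {
    int next;            /* next unused shadow ID */
    unsigned generation; /* bumped when all mappings are discarded */
    unsigned flushes;    /* number of simulated full TLB flushes */
};

/* One guest (as, gid) -> shadow ID slot. */
struct sid_slot {
    int sid;             /* 0 means unmapped */
    unsigned generation; /* pool generation the ID was handed out in */
};

struct vcpu_idm {
    struct sid_slot slot[NUM_AS][NUM_GIDS];
};

static void sid_pool_init(struct sid_pool *pool)
{
    pool->next = 1;
    pool->generation = 1;
    pool->flushes = 0;
}

/* Models a guest tlbia: forget this vcpu's mappings, no flush. */
static void idm_reset(struct vcpu_idm *idm)
{
    memset(idm, 0, sizeof(*idm));
}

/* Return the shadow ID for (as, gid), allocating one if needed. */
static int get_sid(struct sid_pool *pool, struct vcpu_idm *idm,
                   int as, int gid)
{
    struct sid_slot *s;

    assert(as >= 0 && as < NUM_AS && gid >= 0 && gid < NUM_GIDS);
    s = &idm->slot[as][gid];

    /* A slot is live only if it was filled in the current generation. */
    if (s->sid != 0 && s->generation == pool->generation)
        return s->sid;

    if (pool->next >= NUM_SIDS) {
        /* Out of IDs: the only place a full flush is required. */
        pool->flushes++;
        pool->generation++; /* invalidates every vcpu's old mappings */
        pool->next = 1;
    }

    s->sid = pool->next++;
    s->generation = pool->generation;
    return s->sid;
}
```

In this model, stale translations left behind after idm_reset() are
harmless because they are tagged with an ID that is not handed out again
until the pool wraps, and wrapping is exactly when the flush happens.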

[PATCH 5/5] kvm/e500: Add tlb0 entry invalidate

2009-05-15 Thread Liu Yu

Invalidate the TLB0 hardware entry when
the related guest TLB entry is invalidated.

Not doing this before was a bug.

It did not cause problems because
we flushed the TLB every time we entered the guest's userspace.

Signed-off-by: Liu Yu yu@freescale.com
---
 arch/powerpc/kvm/e500_tlb.c |   32 ++++++++++++++++++++++++++++++--
 1 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index 570185c..ce4a379 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -286,6 +286,30 @@ static void kvmppc_e500_shadow_release(struct kvmppc_vcpu_e500 *vcpu_e500,
}
 }
 
+static void kvmppc_e500_tlb0_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
+   int esel)
+{
+   struct tlbe *gtlbe = &vcpu_e500->guest_tlb[0][esel];
+   u32 eaddr, pid, val;
+
+   eaddr = get_tlb_eaddr(gtlbe);
+   pid = kvmppc_e500_get_sid(vcpu_e500, get_tlb_ts(gtlbe),
+   get_tlb_tid(gtlbe));
+   val = (pid << 16) | MAS6_SAS;
+
+   local_irq_disable();
+
+   mtspr(SPRN_MAS6, val);
+   asm volatile ("tlbsx 0, %[eaddr]\n" : : [eaddr] "a" (eaddr));
+   val = mfspr(SPRN_MAS1);
+   if (val & MAS1_VALID) {
+   mtspr(SPRN_MAS1, val & ~MAS1_VALID);
+   asm volatile ("tlbwe\n" : : );
+   }
+
+   local_irq_enable();
+}
+
 static void kvmppc_e500_tlb1_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
int esel)
 {
@@ -575,8 +599,12 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
 
gtlbe = &vcpu_e500->guest_tlb[tlbsel][esel];
 
-   if (get_tlb_v(gtlbe) && tlbsel == 1)
-   kvmppc_e500_tlb1_invalidate(vcpu_e500, esel);
+   if (get_tlb_v(gtlbe)) {
+   if (tlbsel == 0)
+   kvmppc_e500_tlb0_invalidate(vcpu_e500, esel);
+   else
+   kvmppc_e500_tlb1_invalidate(vcpu_e500, esel);
+   }
 
gtlbe->mas1 = vcpu_e500->mas1;
gtlbe->mas2 = vcpu_e500->mas2;
-- 
1.5.4
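
The search-then-invalidate pattern used by kvmppc_e500_tlb0_invalidate()
above (look up the entry for a given ID and address, and clear only its
valid bit if it is present) can be sketched as a user-space toy. This is
an illustration under assumptions, not the hardware interface: the
struct hw_tlbe layout and the hw_tlb_search() and
hw_tlb_invalidate_one() helpers are hypothetical stand-ins for the
tlbsx / MAS1 / tlbwe sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one hardware TLB entry. */
struct hw_tlbe {
    uint32_t valid; /* plays the role of MAS1_VALID */
    uint32_t pid;   /* shadow ID the entry is tagged with */
    uint32_t epn;   /* effective page number */
};

/* Stand-in for tlbsx: find the valid entry matching (pid, epn). */
static struct hw_tlbe *hw_tlb_search(struct hw_tlbe *tlb, size_t n,
                                     uint32_t pid, uint32_t epn)
{
    size_t i;

    assert(tlb != NULL);
    for (i = 0; i < n; i++)
        if (tlb[i].valid && tlb[i].pid == pid && tlb[i].epn == epn)
            return &tlb[i];
    return NULL;
}

/*
 * Stand-in for "clear the valid bit, then tlbwe": invalidate a single
 * entry and leave every other translation in place.
 * Returns 1 if an entry was invalidated, 0 if none matched.
 */
static int hw_tlb_invalidate_one(struct hw_tlbe *tlb, size_t n,
                                 uint32_t pid, uint32_t epn)
{
    struct hw_tlbe *e = hw_tlb_search(tlb, n, pid, epn);

    if (e == NULL)
        return 0;
    e->valid = 0;
    return 1;
}
```

The point of the per-entry approach is that unrelated entries, including
those of other shadow IDs, survive the guest's tlbwe, which matters once
the unconditional flush on entering guest userspace is gone.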
