Re: [RFC PATCH 2/2] KVM: x86 emulator: Cleanup emulate_push() writebacks

2011-03-30 Thread Takuya Yoshikawa
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:
 @@ -1265,22 +1263,19 @@ int emulate_int_real(struct x86_emulate_ctxt *ctxt,
  
   /* TODO: Add limit checks */
   c-src.val = ctxt-eflags;
 - emulate_push(ctxt, ops);
 - rc = writeback(ctxt, ops);
 + rc = emulate_push(ctxt, ops);
   if (rc != X86EMUL_CONTINUE)
   return rc;
  
   ctxt-eflags = ~(EFLG_IF | EFLG_TF | EFLG_AC);
  
   c-src.val = ops-get_segment_selector(VCPU_SREG_CS, ctxt-vcpu);
 - emulate_push(ctxt, ops);
 - rc = writeback(ctxt, ops);
 + rc = emulate_push(ctxt, ops);
   if (rc != X86EMUL_CONTINUE)
   return rc;
  
   c-src.val = c-eip;
 - emulate_push(ctxt, ops);
 - rc = writeback(ctxt, ops);
 + rc = emulate_push(ctxt, ops);
   if (rc != X86EMUL_CONTINUE)
   return rc;

I could also delete c-dst.type = OP_NONE; line here.

...

  
  static int em_push(struct x86_emulate_ctxt *ctxt)
  {
 - emulate_push(ctxt, ctxt-ops);
 - return X86EMUL_CONTINUE;
 + return emulate_push(ctxt, ctxt-ops);
  }

em_push() can be used instead of emulate_push() if we rearrange
the place to put em_push().


There seems to be some conflicting patches around.  So If I get
some other comments, I'll rebase and resend later!

Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


virtio-blk.c handling of i/o which is not a 512 multiple

2011-03-30 Thread Conor Murphy
Hi,

I'm trying to write a virtio-blk driver for Solaris. I've gotten it to the point
where Solaris can see the device and create a ZFS file system on it.

However when I try and create a UFS filesystem on the device, the VM crashed
with the error
*** glibc detected *** /usr/bin/qemu-kvm: double free or corruption (!prev):
0x7f2d38000a00 ***

I can reproduce the problem with a simple dd, i.e.
dd if=/dev/zero of=/dev/rdsk/c2d10p0 bs=5000 count=1

My driver will create a virtio-blk request with two elements in the sg list, one
for the first 4096 byes and the other for the remaining 904.

From stepping through with gdb, virtio_blk_handle_write will sets n_sectors to 
9
(5000 / 512). Later on the code, n_sectors is used the calculate the size of the
buffer required but 9 * 512 is too small and so when the request is process it
ends up writing past the end of the buffer and I guest this triggers the glibc
error.

Is there a requirement for virtio-blk guest drivers that all i/o requests are
sized in multiples of 512 bytes?



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-blk.c handling of i/o which is not a 512 multiple

2011-03-30 Thread Avi Kivity

On 03/30/2011 10:41 AM, Alexander Graf wrote:

On 30.03.2011, at 10:15, Conor Murphy wrote:

  Hi,

  I'm trying to write a virtio-blk driver for Solaris. I've gotten it to the 
point
  where Solaris can see the device and create a ZFS file system on it.

  However when I try and create a UFS filesystem on the device, the VM crashed
  with the error
  *** glibc detected *** /usr/bin/qemu-kvm: double free or corruption (!prev):
  0x7f2d38000a00 ***

Ouch.


  I can reproduce the problem with a simple dd, i.e.
  dd if=/dev/zero of=/dev/rdsk/c2d10p0 bs=5000 count=1

  My driver will create a virtio-blk request with two elements in the sg list, 
one
  for the first 4096 byes and the other for the remaining 904.

   From stepping through with gdb, virtio_blk_handle_write will sets n_sectors 
to 9
  (5000 / 512). Later on the code, n_sectors is used the calculate the size of 
the
  buffer required but 9 * 512 is too small and so when the request is process 
it
  ends up writing past the end of the buffer and I guest this triggers the 
glibc
  error.

  Is there a requirement for virtio-blk guest drivers that all i/o requests are
  sized in multiples of 512 bytes?

There should be some documentation on virtio-blk, but I can't seem to find it 
atm. Either way, you're posting this on the wrong mailing list. The correct 
ones are:

   Qemu: QEMU-devel Developersqemu-de...@nongnu.org
   Virtio: virtualizat...@lists.linux-foundation.org


Please repost it there and hope for someone to reply. I'd personally find it 
very odd to see that any HBA would accept data that's not aligned to sectors, 
but let's get some comments from the inventors of the protocol :)



The data should be aligned; but of course a misalignment shouldn't kill 
qemu, only fail the request.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio-blk.c handling of i/o which is not a 512 multiple

2011-03-30 Thread Stefan Hajnoczi
On Wed, Mar 30, 2011 at 9:15 AM, Conor Murphy
conor_murphy_v...@hotmail.com wrote:
 I'm trying to write a virtio-blk driver for Solaris. I've gotten it to the 
 point
 where Solaris can see the device and create a ZFS file system on it.

 However when I try and create a UFS filesystem on the device, the VM crashed
 with the error
 *** glibc detected *** /usr/bin/qemu-kvm: double free or corruption (!prev):
 0x7f2d38000a00 ***

This is a bug in QEMU.  A guest must not be able to trigger a crash.

 I can reproduce the problem with a simple dd, i.e.
 dd if=/dev/zero of=/dev/rdsk/c2d10p0 bs=5000 count=1

I think this a raw character device, which is why you're even able to
perform non-blocksize accesses?  Have you looked at how other drivers
(like the Xen pv blkfront) handle this?

 My driver will create a virtio-blk request with two elements in the sg list, 
 one
 for the first 4096 byes and the other for the remaining 904.

 From stepping through with gdb, virtio_blk_handle_write will sets n_sectors 
 to 9
 (5000 / 512). Later on the code, n_sectors is used the calculate the size of 
 the
 buffer required but 9 * 512 is too small and so when the request is process it
 ends up writing past the end of the buffer and I guest this triggers the glibc
 error.

We need to validate that (qiov-size % BDRV_SECTOR_SIZE) == 0 and
reject invalid requests.

 Is there a requirement for virtio-blk guest drivers that all i/o requests are
 sized in multiples of 512 bytes?

There is no strict requirement according to the virtio specification,
but maybe there should be:

http://ozlabs.org/~rusty/virtio-spec/virtio-spec-0.8.9.pdf

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/29/2011 02:08 PM, Gleb Natapov wrote:

Currently we sync registers back and forth before/after exiting
to userspace for IO, but during IO device model shouldn't need to
read/write the registers, so we can as well skip those sync points. The
only exaception is broken vmware backdor interface. The new code sync
registers content during IO only if registers are read from/written to
by userspace in the middle of the IO operation and this almost never
happens in practise.


While this is a nice idea, how much does it save in practice?  It does 
introduce more complexity.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] numa: Don't limit node count by smp count

2011-03-30 Thread Avi Kivity

On 03/29/2011 02:56 AM, Sasha Levin wrote:

It is possible to create CPU-less NUMA nodes, node amount shouldn't be
limited by amount of CPUs.


Patch seems fine; but please send to qemu-de...@nongnu.org.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
 On 03/29/2011 02:08 PM, Gleb Natapov wrote:
 Currently we sync registers back and forth before/after exiting
 to userspace for IO, but during IO device model shouldn't need to
 read/write the registers, so we can as well skip those sync points. The
 only exaception is broken vmware backdor interface. The new code sync
 registers content during IO only if registers are read from/written to
 by userspace in the middle of the IO operation and this almost never
 happens in practise.
 
 While this is a nice idea, how much does it save in practice?  It
 does introduce more complexity.
 

I haven't measured, but can try to do so. It eliminates two copies of
all registers on each MMIO/PIO read. I expect this to be measurable in
workloads that do many such reads.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 12:47 PM, Gleb Natapov wrote:

On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
  On 03/29/2011 02:08 PM, Gleb Natapov wrote:
  Currently we sync registers back and forth before/after exiting
  to userspace for IO, but during IO device model shouldn't need to
  read/write the registers, so we can as well skip those sync points. The
  only exaception is broken vmware backdor interface. The new code sync
  registers content during IO only if registers are read from/written to
  by userspace in the middle of the IO operation and this almost never
  happens in practise.

  While this is a nice idea, how much does it save in practice?  It
  does introduce more complexity.


I haven't measured, but can try to do so. It eliminates two copies of
all registers on each MMIO/PIO read. I expect this to be measurable in
workloads that do many such reads.



I don't, especially if these are mmios to userspace.  Perhaps it's 
better to remove the copy on kernel mmio, since it's much faster, if the 
result is simpler (there can be no KVM_SET_REGS in that context).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 12:50:53PM +0200, Avi Kivity wrote:
 On 03/30/2011 12:47 PM, Gleb Natapov wrote:
 On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
   On 03/29/2011 02:08 PM, Gleb Natapov wrote:
   Currently we sync registers back and forth before/after exiting
   to userspace for IO, but during IO device model shouldn't need to
   read/write the registers, so we can as well skip those sync points. The
   only exaception is broken vmware backdor interface. The new code sync
   registers content during IO only if registers are read from/written to
   by userspace in the middle of the IO operation and this almost never
   happens in practise.
 
   While this is a nice idea, how much does it save in practice?  It
   does introduce more complexity.
 
 
 I haven't measured, but can try to do so. It eliminates two copies of
 all registers on each MMIO/PIO read. I expect this to be measurable in
 workloads that do many such reads.
 
 
 I don't, especially if these are mmios to userspace.  Perhaps it's
 better to remove the copy on kernel mmio, since it's much faster, if
 the result is simpler (there can be no KVM_SET_REGS in that
 context).
 

The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
save a lot comparing to time it takes to do one MMIO to userspace, but
do 1 million of those and you saved a lot of CPU cycles. I do not think
we should abandon the optimization so easily. Unfortunately I can't run
perf on 2.6.38 kernel right now. It gives me strange errors and when it
doesn't it makes kernel OOPS.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 01:22:43PM +0200, Gleb Natapov wrote:
 On Wed, Mar 30, 2011 at 12:50:53PM +0200, Avi Kivity wrote:
  On 03/30/2011 12:47 PM, Gleb Natapov wrote:
  On Wed, Mar 30, 2011 at 12:38:53PM +0200, Avi Kivity wrote:
On 03/29/2011 02:08 PM, Gleb Natapov wrote:
Currently we sync registers back and forth before/after exiting
to userspace for IO, but during IO device model shouldn't need to
read/write the registers, so we can as well skip those sync points. The
only exaception is broken vmware backdor interface. The new code sync
registers content during IO only if registers are read from/written to
by userspace in the middle of the IO operation and this almost never
happens in practise.
  
While this is a nice idea, how much does it save in practice?  It
does introduce more complexity.
  
  
  I haven't measured, but can try to do so. It eliminates two copies of
  all registers on each MMIO/PIO read. I expect this to be measurable in
  workloads that do many such reads.
  
  
  I don't, especially if these are mmios to userspace.  Perhaps it's
  better to remove the copy on kernel mmio, since it's much faster, if
  the result is simpler (there can be no KVM_SET_REGS in that
  context).
  
 
 The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
 save a lot comparing to time it takes to do one MMIO to userspace, but
 do 1 million of those and you saved a lot of CPU cycles. I do not think
 we should abandon the optimization so easily. Unfortunately I can't run
 perf on 2.6.38 kernel right now. It gives me strange errors and when it
 doesn't it makes kernel OOPS.
 
After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

With patch:
0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 01:43 PM, Gleb Natapov wrote:


  The patch saves copying of 256 bytes on each MMIO/PIO read. It may not
  save a lot comparing to time it takes to do one MMIO to userspace, but
  do 1 million of those and you saved a lot of CPU cycles. I do not think
  we should abandon the optimization so easily. Unfortunately I can't run
  perf on 2.6.38 kernel right now. It gives me strange errors and when it
  doesn't it makes kernel OOPS.

After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

With patch:
0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction



Well, that is probably significant.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 02:12 PM, Avi Kivity wrote:

On 03/30/2011 01:43 PM, Gleb Natapov wrote:


  The patch saves copying of 256 bytes on each MMIO/PIO read. It may 
not
  save a lot comparing to time it takes to do one MMIO to userspace, 
but
  do 1 million of those and you saved a lot of CPU cycles. I do not 
think
  we should abandon the optimization so easily. Unfortunately I 
can't run
  perf on 2.6.38 kernel right now. It gives me strange errors and 
when it

  doesn't it makes kernel OOPS.

After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction
1.51%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction
1.68%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction


With patch:
0.84%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction
0.96%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction
0.89%  qemu-system-x86  [kvm] [k] 
x86_emulate_instruction




Well, that is probably significant.



Please rebase the patch after sse-mmio goes in.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 01:43 PM, Gleb Natapov wrote:

After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

With patch:
0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction



The cause might be kvm_rip_write() using vmwrite.  Can you use perf to 
see where the hits are in x86_emulate_instruction?


If that's the case, we may be able to do local optimizations to 
kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity() instead 
of this global change.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
 On 03/30/2011 01:43 PM, Gleb Natapov wrote:
 After reboot perf started to work. I ran modified emulator.flat unit
 test. It was modified to run test_cmps() in an endless loop.
 
 Without patch:
 1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 
 With patch:
 0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 
 
 The cause might be kvm_rip_write() using vmwrite.  Can you use perf
 to see where the hits are in x86_emulate_instruction?
 
 If that's the case, we may be able to do local optimizations to
 kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
 instead of this global change.
 
I can leave copying there and eliminate only kvm_rip_write and see
perf data.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] kvm/x86: fix XSAVE bit scanning

2011-03-30 Thread Andre Przywara
When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area
leaves, it assumes that the leaves are contigious and stops at the
first zero one. On AMD hardware there is a gap, though, as LWP uses
leaf 62 to announce it's state save area.
So lets iterate through all 64 possible leaves and simply skip zero
ones to also cover later features.

CC: sta...@kernel.org [2.6.38]
Signed-off-by: Andre Przywara andre.przyw...@amd.com
---
 arch/x86/kvm/x86.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfd7763..6e86cec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2395,9 +2395,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
int i;
 
entry-flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-   for (i = 1; *nent  maxnent; ++i) {
-   if (entry[i - 1].eax == 0  i != 2)
-   break;
+   for (i = 1; *nent  maxnent  i  64; ++i) {
+   if (entry[i].eax == 0)
+   continue;
do_cpuid_1_ent(entry[i], function, i);
entry[i].flags |=
   KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-- 
1.6.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] kvm/x86: remove unneeded substitute search for missing CPUID entries

2011-03-30 Thread Andre Przywara
If KVM cannot find an exact match for a requested CPUID leaf, the
code will try to find the closest match instead of simply confessing
it's failure. The heuristic is on one hand wrong nowadays,
since it does not take the KVM CPUID leaves (0x40xx) into
account. On the other hand the callers of this function can all deal
with the no-match situation. So lets remove this code, as it serves
no purpose.
This fixes a crash of newer Linux kernels as KVM guests on
AMD Bulldozer CPUs, where bogus values were returned in response to
a CPUID intercept.

CC: sta...@kernel.org [2.6.38]
Signed-off-by: Andre Przywara andre.przyw...@amd.com
---
 arch/x86/kvm/x86.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e86cec..625143f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4959,12 +4959,6 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct 
kvm_vcpu *vcpu,
best = e;
break;
}
-   /*
-* Both basic or both extended?
-*/
-   if (((e-function ^ function)  0x8000) == 0)
-   if (!best || e-function  best-function)
-   best = e;
}
return best;
 }
-- 
1.6.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm/x86: remove unneeded substitute search for missing CPUID entries

2011-03-30 Thread Avi Kivity

On 03/30/2011 03:01 PM, Andre Przywara wrote:

If KVM cannot find an exact match for a requested CPUID leaf, the
code will try to find the closest match instead of simply confessing
it's failure. The heuristic is on one hand wrong nowadays,
since it does not take the KVM CPUID leaves (0x40xx) into
account. On the other hand the callers of this function can all deal
with the no-match situation. So lets remove this code, as it serves
no purpose.
This fixes a crash of newer Linux kernels as KVM guests on
AMD Bulldozer CPUs, where bogus values were returned in response to
a CPUID intercept.


@@ -4959,12 +4959,6 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct 
kvm_vcpu *vcpu,
best = e;
break;
}
-   /*
-* Both basic or both extended?
-*/
-   if (((e-function ^ function)  0x8000) == 0)
-   if (!best || e-function  best-function)
-   best = e;
}
return best;
  }



This behaviour is mandated by the spec (looking at the Intel one), 
though it is implemented incorrectly - should always return largest 
basic leaf, and ignore the kvm leaves.


I think the correct behaviour is:

   if (e-function  1  (!best || e-function  best-function))
best = e;

We probably need a find_exact_cpuid_entry() that returns NULL if it 
doesn't find a match, for internal use.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
 On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
  On 03/30/2011 01:43 PM, Gleb Natapov wrote:
  After reboot perf started to work. I ran modified emulator.flat unit
  test. It was modified to run test_cmps() in an endless loop.
  
  Without patch:
  1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  
  With patch:
  0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
  
  
  The cause might be kvm_rip_write() using vmwrite.  Can you use perf
  to see where the hits are in x86_emulate_instruction?
  
  If that's the case, we may be able to do local optimizations to
  kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
  instead of this global change.
  
 I can leave copying there and eliminate only kvm_rip_write and see
 perf data.
 

1.75%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.60%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.42%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

This is with copy in place, but those are under if (writeback):
toggle_interruptibility(vcpu,
vcpu-arch.emulate_ctxt.interruptibility);
kvm_set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu-arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip);

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 03:26 PM, Gleb Natapov wrote:

On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
  On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
On 03/30/2011 01:43 PM, Gleb Natapov wrote:
After reboot perf started to work. I ran modified emulator.flat unit
test. It was modified to run test_cmps() in an endless loop.

Without patch:
1.71%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.51%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.68%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

With patch:
0.84%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.96%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
0.89%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

  
The cause might be kvm_rip_write() using vmwrite.  Can you use perf
to see where the hits are in x86_emulate_instruction?
  
If that's the case, we may be able to do local optimizations to
kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
instead of this global change.
  
  I can leave copying there and eliminate only kvm_rip_write and see
  perf data.


1.75%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.60%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
1.42%  qemu-system-x86  [kvm] [k] x86_emulate_instruction

This is with copy in place, but those are under if (writeback):
 toggle_interruptibility(vcpu,
 vcpu-arch.emulate_ctxt.interruptibility);
 kvm_set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags);
 kvm_make_request(KVM_REQ_EVENT, vcpu);
 vcpu-arch.emulate_regs_need_sync_to_vcpu = false;
 kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip);



It's wierd.  Do you get perf hits in the copying?

Copying a couple of hot cache lines shouldn't take any measurable time 
compared to a heavyweight exit.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm/x86: remove unneeded substitute search for missing CPUID entries

2011-03-30 Thread Avi Kivity

On 03/30/2011 03:26 PM, Avi Kivity wrote:


This behaviour is mandated by the spec (looking at the Intel one), 
though it is implemented incorrectly - should always return largest 
basic leaf, and ignore the kvm leaves.


I think the correct behaviour is:

   if (e-function  1  (!best || e-function  best-function))
best = e;



Oh, and it should honor ecx.. what a great interface.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 03:29:02PM +0200, Avi Kivity wrote:
 On 03/30/2011 03:26 PM, Gleb Natapov wrote:
 On Wed, Mar 30, 2011 at 02:48:28PM +0200, Gleb Natapov wrote:
   On Wed, Mar 30, 2011 at 02:17:55PM +0200, Avi Kivity wrote:
 On 03/30/2011 01:43 PM, Gleb Natapov wrote:
 After reboot perf started to work. I ran modified emulator.flat unit
 test. It was modified to run test_cmps() in an endless loop.
 
 Without patch:
 1.71%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 1.51%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 1.68%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 
 With patch:
 0.84%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 0.96%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 0.89%  qemu-system-x86  [kvm] [k] 
  x86_emulate_instruction
 
   
 The cause might be kvm_rip_write() using vmwrite.  Can you use perf
 to see where the hits are in x86_emulate_instruction?
   
 If that's the case, we may be able to do local optimizations to
 kvm_rip_write(), kvm_set_rflags(), and toggle_interruptiblity()
 instead of this global change.
   
   I can leave copying there and eliminate only kvm_rip_write and see
   perf data.
 
 
 1.75%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 1.60%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 1.42%  qemu-system-x86  [kvm] [k] x86_emulate_instruction
 
 This is with copy in place, but those are under if (writeback):
  toggle_interruptibility(vcpu,
  vcpu-arch.emulate_ctxt.interruptibility);
  kvm_set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags);
  kvm_make_request(KVM_REQ_EVENT, vcpu);
  vcpu-arch.emulate_regs_need_sync_to_vcpu = false;
  kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip);
 
 
 It's wierd.  Do you get perf hits in the copying?
 
How can I check. The memcpy is inlined.

 Copying a couple of hot cache lines shouldn't take any measurable
 time compared to a heavyweight exit.
 
The whole function takes only 1.5% CPU. Perf measures how much this
function become faster and heavyweight exit is not part of the function.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 03:36 PM, Gleb Natapov wrote:


  It's wierd.  Do you get perf hits in the copying?

How can I check. The memcpy is inlined.



perf annotate x86_emulate_instruction

(newer perf allows you to get there interactively from 'perf report')


  Copying a couple of hot cache lines shouldn't take any measurable
  time compared to a heavyweight exit.

The whole function takes only 1.5% CPU. Perf measures how much this
function become faster and heavyweight exit is not part of the function.


It's still relative to exit cost.  If the total exit was 2 us, then a 1% 
decrease in cost translates to 40 ns.


(well, that's not unlikely for a 256 byte memcpy, but let's be sure).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 03:41:09PM +0200, Avi Kivity wrote:
 On 03/30/2011 03:36 PM, Gleb Natapov wrote:
 
   It's wierd.  Do you get perf hits in the copying?
 
 How can I check. The memcpy is inlined.
 
 
 perf annotate x86_emulate_instruction
 
 (newer perf allows you to get there interactively from 'perf report')
 
   Copying a couple of hot cache lines shouldn't take any measurable
Ah, forgot about it:

First one:
27.71 :   1179f:   f3 a5   rep movsl 
%ds:(%rsi),%es:(%rdi)

Second one:
32.68 :   11888:   f3 a5   rep movsl 
%ds:(%rsi),%es:(%rdi)

   time compared to a heavyweight exit.
 
 The whole function takes only 1.5% CPU. Perf measures how much this
 function become faster and heavyweight exit is not part of the function.
 
 It's still relative to exit cost.  If the total exit was 2 us, then
 a 1% decrease in cost translates to 40 ns.
 
 (well, that's not unlikely for a 256 byte memcpy, but let's be sure).
 
 -- 
 error compiling committee.c: too many arguments to function

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu

2011-03-30 Thread Avi Kivity

On 03/30/2011 03:43 PM, Gleb Natapov wrote:

On Wed, Mar 30, 2011 at 03:41:09PM +0200, Avi Kivity wrote:
  On 03/30/2011 03:36 PM, Gleb Natapov wrote:
  
 It's wierd.  Do you get perf hits in the copying?
  
  How can I check. The memcpy is inlined.
  

  perf annotate x86_emulate_instruction

  (newer perf allows you to get there interactively from 'perf report')

 Copying a couple of hot cache lines shouldn't take any measurable
Ah, forgot about it:

First one:
27.71 :   1179f:   f3 a5   rep movsl 
%ds:(%rsi),%es:(%rdi)

Second one:
32.68 :   11888:   f3 a5   rep movsl 
%ds:(%rsi),%es:(%rdi)



Okay, I'm convinced.

(looks like a missed optimization, should use rep movsq there)

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAT networking from guest not working

2011-03-30 Thread Marc Boorshtein
I apologize if this is the wrong list.  I have just installed Fedora
14 and gotten KVM up and running.  I installed a Windows 7 guest
without issue using NAT for networking.  The guest can ping the
default gateway, but can't reach the internet or the rest of the
network.  Here's the really odd thing, DNS resolution works.  I've had
no issues with VMWare images so I don't think its an issue with my
networking infrastructure.  Here's my uname:

Linux r2d2.tremolo.lan 2.6.38-1.fc15.x86_64 #1 SMP Tue Mar 15 05:29:00
UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

libvirtd and libvirtd-guests are both running.  Any help (including
where I should take this query if this isn't the right place) would be
greatly appreciated.

Thanks!
Marc
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Justin P. Mattock
The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattock justinmatt...@gmail.com

---
 arch/x86/kvm/i8254.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
 };
 
 struct kvm_pit {
-   unsigned long base_addresss;
+   unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;
-- 
1.7.4.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Avi Kivity

On 03/30/2011 06:19 PM, Justin P. Mattock wrote:

The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
  arch/x86/kvm/i8254.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
  };

  struct kvm_pit {
-   unsigned long base_addresss;
+   unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;


Why not remove the variable completely?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM: x86: better fix for race between nmi injection and enabling nmi window

2011-03-30 Thread Marcelo Tosatti

Based on Gleb's idea, fix race between nmi injection and enabling 
nmi window in a simpler way.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6a129f..9a7cc1be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5152,6 +5152,7 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
int r;
+   int nmi_pending;
bool req_int_win = !irqchip_in_kernel(vcpu-kvm) 
vcpu-run-request_interrupt_window;
 
@@ -5195,11 +5196,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;
 
+   nmi_pending = ACCESS_ONCE(vcpu-arch.nmi_pending);
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
inject_pending_event(vcpu);
 
/* enable NMI/IRQ window open exits if needed */
-   if (vcpu-arch.nmi_pending)
+   if (nmi_pending)
kvm_x86_ops-enable_nmi_window(vcpu);
else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
kvm_x86_ops-enable_irq_window(vcpu);

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Justin P. Mattock

On 03/30/2011 09:26 AM, Avi Kivity wrote:

On 03/30/2011 06:19 PM, Justin P. Mattock wrote:

The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
arch/x86/kvm/i8254.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
};

struct kvm_pit {
- unsigned long base_addresss;
+ unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;


Why not remove the variable completely?



didnt even think to completely remove the variable(figured it was used 
somewhere).I will look at that and resend with removal of the variable 
for you..


Justin P. Mattock
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: better fix for race between nmi injection and enabling nmi window

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 01:30:28PM -0300, Marcelo Tosatti wrote:
 
 Based on Gleb's idea, fix race between nmi injection and enabling 
 nmi window in a simpler way.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
But we need to revert the patch that introduced use of request for NMI
first.  Otherwise NMI will not be delivered, no?

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index a6a129f..9a7cc1be 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5152,6 +5152,7 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  {
   int r;
 + int nmi_pending;
   bool req_int_win = !irqchip_in_kernel(vcpu-kvm) 
   vcpu-run-request_interrupt_window;
  
 @@ -5195,11 +5196,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   if (unlikely(r))
   goto out;
  
 + nmi_pending = ACCESS_ONCE(vcpu-arch.nmi_pending);
 +
   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
   inject_pending_event(vcpu);
  
   /* enable NMI/IRQ window open exits if needed */
 - if (vcpu-arch.nmi_pending)
 + if (nmi_pending)
   kvm_x86_ops-enable_nmi_window(vcpu);
   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
   kvm_x86_ops-enable_irq_window(vcpu);

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAT networking from guest not working

2011-03-30 Thread David Mair

Hi Marc,

On 03/30/2011 08:46 AM, Marc Boorshtein wrote:

I apologize if this is the wrong list.  I have just installed Fedora
14 and gotten KVM up and running.  I installed a Windows 7 guest
without issue using NAT for networking.  The guest can ping the
default gateway, but can't reach the internet or the rest of the
network.  Here's the really odd thing, DNS resolution works.  I've had
no issues with VMWare images so I don't think its an issue with my
networking infrastructure.  Here's my uname:

Linux r2d2.tremolo.lan 2.6.38-1.fc15.x86_64 #1 SMP Tue Mar 15 05:29:00
UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

libvirtd and libvirtd-guests are both running.  Any help (including
where I should take this query if this isn't the right place) would be
greatly appreciated.


You'd need to provide more information but to be fair kvm provides no 
part of any nat implementation I'm aware of that can be used by kvm 
hosted vms. That's usually done by qemu or iptables on the host. Which 
case is defined by qemu command line so at the very least, your problem 
probably also exists with a qemu command line that includes -no-kvm and 
in that case you'd be better asking on a qemu list.


--
David Mair.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2]arch:x86:kvm:i8254.h Remove base_addresss in kvm_pit since it is unused.

2011-03-30 Thread Justin P. Mattock
The patch below removes unsigned long base_addresss; in i8254.h 
since it is unused.

Signed-off-by: Justin P. Mattock justinmatt...@gmail.com

---
 arch/x86/kvm/i8254.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..b681a9f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,6 @@ struct kvm_kpit_state {
 };
 
 struct kvm_pit {
-   unsigned long base_addresss;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;
-- 
1.7.4.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: better fix for race between nmi injection and enabling nmi window

2011-03-30 Thread Marcelo Tosatti
On Wed, Mar 30, 2011 at 06:33:22PM +0200, Gleb Natapov wrote:
 On Wed, Mar 30, 2011 at 01:30:28PM -0300, Marcelo Tosatti wrote:
  
  Based on Gleb's idea, fix race between nmi injection and enabling 
  nmi window in a simpler way.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
 But we need to revert the patch that introduced use of request for NMI
 first.  Otherwise NMI will not be delivered, no?

Yes, pushed, can you verify please?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: better fix for race between nmi injection and enabling nmi window

2011-03-30 Thread Avi Kivity

On 03/30/2011 06:30 PM, Marcelo Tosatti wrote:

Based on Gleb's idea, fix race between nmi injection and enabling
nmi window in a simpler way.

Signed-off-by: Marcelo Tosattimtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6a129f..9a7cc1be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5152,6 +5152,7 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  {
int r;
+   int nmi_pending;
bool req_int_win = !irqchip_in_kernel(vcpu-kvm)
vcpu-run-request_interrupt_window;

@@ -5195,11 +5196,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(r))
goto out;

+   nmi_pending = ACCESS_ONCE(vcpu-arch.nmi_pending);
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
inject_pending_event(vcpu);

/* enable NMI/IRQ window open exits if needed */
-   if (vcpu-arch.nmi_pending)
+   if (nmi_pending)
kvm_x86_ops-enable_nmi_window(vcpu);
else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
kvm_x86_ops-enable_irq_window(vcpu);



What about the check in inject_pending_events()?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAT networking from guest not working

2011-03-30 Thread Marc Boorshtein
OK, I''ll go ask them.

Thanks!

Marc

On Wed, Mar 30, 2011 at 12:44 PM, David Mair dm...@mair-family.org wrote:
 Hi Marc,

 On 03/30/2011 08:46 AM, Marc Boorshtein wrote:

 I apologize if this is the wrong list.  I have just installed Fedora
 14 and gotten KVM up and running.  I installed a Windows 7 guest
 without issue using NAT for networking.  The guest can ping the
 default gateway, but can't reach the internet or the rest of the
 network.  Here's the really odd thing, DNS resolution works.  I've had
 no issues with VMWare images so I don't think its an issue with my
 networking infrastructure.  Here's my uname:

 Linux r2d2.tremolo.lan 2.6.38-1.fc15.x86_64 #1 SMP Tue Mar 15 05:29:00
 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

 libvirtd and libvirtd-guests are both running.  Any help (including
 where I should take this query if this isn't the right place) would be
 greatly appreciated.

 You'd need to provide more information but to be fair kvm provides no part
 of any nat implementation I'm aware of that can be used by kvm hosted vms.
 That's usually done by qemu or iptables on the host. Which case is defined
 by qemu command line so at the very least, your problem probably also exists
 with a qemu command line that includes -no-kvm and in that case you'd be
 better asking on a qemu list.

 --
 David Mair.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Avi Kivity

On 03/30/2011 06:30 PM, Justin P. Mattock wrote:

On 03/30/2011 09:26 AM, Avi Kivity wrote:

On 03/30/2011 06:19 PM, Justin P. Mattock wrote:

The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
arch/x86/kvm/i8254.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
};

struct kvm_pit {
- unsigned long base_addresss;
+ unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;


Why not remove the variable completely?



didnt even think to completely remove the variable(figured it was used 
somewhere).I will look at that and resend with removal of the variable 
for you..


Well if it was used, you ought to have changed all of the users, no?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Justin P. Mattock

On 03/30/2011 10:17 AM, Avi Kivity wrote:

On 03/30/2011 06:30 PM, Justin P. Mattock wrote:

On 03/30/2011 09:26 AM, Avi Kivity wrote:

On 03/30/2011 06:19 PM, Justin P. Mattock wrote:

The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
arch/x86/kvm/i8254.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
};

struct kvm_pit {
- unsigned long base_addresss;
+ unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;


Why not remove the variable completely?



didnt even think to completely remove the variable(figured it was used
somewhere).I will look at that and resend with removal of the variable
for you..


Well if it was used, you ought to have changed all of the users, no?



at the moment I see:
(keep in mind my reading skills only go so far!)

 grep -Re base_address kvm/* -n
kvm/ioapic.c:276:   return ((addr = ioapic-base_address 
kvm/ioapic.c:277:(addr  ioapic-base_address + 
IOAPIC_MEM_LENGTH)));

kvm/ioapic.c:371:   ioapic-base_address = IOAPIC_DEFAULT_BASE_ADDRESS;
kvm/ioapic.h:38:u64 base_address;

so changing base_addresss; to base_address; gets kvm_ioapic_reset to 
function correctly as well as ioapic_in_range?

(but could be wrong)

Justin P. Mattock






--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: better fix for race between nmi injection and enabling nmi window

2011-03-30 Thread Gleb Natapov
On Wed, Mar 30, 2011 at 07:16:34PM +0200, Avi Kivity wrote:
 On 03/30/2011 06:30 PM, Marcelo Tosatti wrote:
 Based on Gleb's idea, fix race between nmi injection and enabling
 nmi window in a simpler way.
 
 Signed-off-by: Marcelo Tosattimtosa...@redhat.com
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index a6a129f..9a7cc1be 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5152,6 +5152,7 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
   static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   {
  int r;
 +int nmi_pending;
  bool req_int_win = !irqchip_in_kernel(vcpu-kvm)
  vcpu-run-request_interrupt_window;
 
 @@ -5195,11 +5196,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  if (unlikely(r))
  goto out;
 
 +nmi_pending = ACCESS_ONCE(vcpu-arch.nmi_pending);
 +
  if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
  inject_pending_event(vcpu);
 
  /* enable NMI/IRQ window open exits if needed */
 -if (vcpu-arch.nmi_pending)
 +if (nmi_pending)
  kvm_x86_ops-enable_nmi_window(vcpu);
  else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
  kvm_x86_ops-enable_irq_window(vcpu);
 
 
 What about the check in inject_pending_events()?
 
Didn't we decide that this check is not a problem? Worst that can happen
is NMI injection will be delayed till next exit.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Jiri Kosina
On Wed, 30 Mar 2011, Avi Kivity wrote:

The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
arch/x86/kvm/i8254.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
};

struct kvm_pit {
- unsigned long base_addresss;
+ unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;
   
   Why not remove the variable completely?
   
  
  didnt even think to completely remove the variable(figured it was used
  somewhere).I will look at that and resend with removal of the variable for
  you..
 
 Well if it was used, you ought to have changed all of the users, no?

I am afraid Justin is not trying to compile-test his patches (I got this 
suspicion after last patchset trying to remove all the includes of 
version.h).

-- 
Jiri Kosina
SUSE Labs, Novell Inc.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]arch:x86:kvm:i8254.h Fix typo in kvm_pit

2011-03-30 Thread Justin P. Mattock

On 03/30/2011 03:21 PM, Jiri Kosina wrote:

On Wed, 30 Mar 2011, Avi Kivity wrote:


The below patch changes base_addresss to base_address.
Note: I have grepped for base_addresss and nothing shows up,
grepping for base_address gets me lots of output, telling me that
this is a typo, but could be wrong.

Signed-off-by: Justin P. Mattockjustinmatt...@gmail.com

---
arch/x86/kvm/i8254.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 46d08ca..c2fa48b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,7 +33,7 @@ struct kvm_kpit_state {
};

struct kvm_pit {
- unsigned long base_addresss;
+ unsigned long base_address;
struct kvm_io_device dev;
struct kvm_io_device speaker_dev;
struct kvm *kvm;


Why not remove the variable completely?



didnt even think to completely remove the variable(figured it was used
somewhere).I will look at that and resend with removal of the variable for
you..


Well if it was used, you ought to have changed all of the users, no?


I am afraid Justin is not trying to compile-test his patches (I got this
suspicion after last patchset trying to remove all the includes of
version.h).




I do remember to do that, but I will be honest there are ones that I 
totally forgot, then remembered after sending out the patch(I admit it I 
am guilty of that)


Think having a checklist is the best thing to follow when doing a patch
(telling yourself yeah Ill remember to do that, never is the best way.

Justin P. Mattock
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-30 Thread Andrew Morton
On Wed, 30 Mar 2011 11:00:26 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 Data from the previous patchsets can be found at
 https://lkml.org/lkml/2010/11/30/79

It would be nice if the data for the current patchset was present in
the current patchset's changelog!

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Provide control over unmapped pages (v5)

2011-03-30 Thread Andrew Morton
On Wed, 30 Mar 2011 11:02:38 +0530
Balbir Singh bal...@linux.vnet.ibm.com wrote:

 Changelog v4
 1. Added documentation for max_unmapped_pages
 2. Better #ifdef'ing of max_unmapped_pages and min_unmapped_pages
 
 Changelog v2
 1. Use a config option to enable the code (Andrew Morton)
 2. Explain the magic tunables in the code or at-least attempt
to explain them (General comment)
 3. Hint uses of the boot parameter with unlikely (Andrew Morton)
 4. Use better names (balanced is not a good naming convention)
 
 Provide control using zone_reclaim() and a boot parameter. The
 code reuses functionality from zone_reclaim() to isolate unmapped
 pages and reclaim them as a priority, ahead of other mapped pages.
 

This:

akpm:/usr/src/25 grep '^+#' 
patches/provide-control-over-unmapped-pages-v5.patch 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#else
+#endif
+#ifdef CONFIG_NUMA
+#else
+#define zone_reclaim_mode 0
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+#endif
+#endif
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)

is getting out of control.  What happens if we just make the feature
non-configurable?

 +static int __init unmapped_page_control_parm(char *str)
 +{
 + unmapped_page_control = 1;
 + /*
 +  * XXX: Should we tweak swappiness here?
 +  */
 + return 1;
 +}
 +__setup(unmapped_page_control, unmapped_page_control_parm);

That looks like a pain - it requires a reboot to change the option,
which makes testing harder and slower.  Methinks you're being a bit
virtualization-centric here!

 +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
 +static inline void reclaim_unmapped_pages(int priority,
 + struct zone *zone, struct scan_control *sc)
 +{
 + return 0;
 +}
 +#endif
 +
  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 struct scan_control *sc)
  {
 @@ -2371,6 +2394,12 @@ loop_again:
   shrink_active_list(SWAP_CLUSTER_MAX, zone,
   sc, priority, 0);
  
 + /*
 +  * We do unmapped page reclaim once here and once
 +  * below, so that we don't lose out
 +  */
 + reclaim_unmapped_pages(priority, zone, sc);

Doing this here seems wrong.  balance_pgdat() does two passes across
the zones.  The first pass is a read-only work-out-what-to-do pass and
the second pass is a now-reclaim-some-stuff pass.  But here we've stuck
a do-some-reclaiming operation inside the first, work-out-what-to-do pass.


 @@ -2408,6 +2437,11 @@ loop_again:
   continue;
  
   sc.nr_scanned = 0;
 + /*
 +  * Reclaim unmapped pages upfront, this should be
 +  * really cheap

Comment is mysterious.  Why is it cheap?

 +  */
 + reclaim_unmapped_pages(priority, zone, sc);


I dunno, the whole thing seems rather nasty to me.

It sticks a magical reclaim-unmapped-pages operation right in the
middle of regular page reclaim.  This means that reclaim will walk the
LRU looking at mapped and unmapped pages.  Then it will walk some more,
looking at only unmapped pages and moving the mapped ones to the head
of the LRU.  Then it goes back to looking at mapped and unmapped pages.
So it rather screws up the LRU ordering and page aging, does it not?

Also, the special-case handling sticks out like a sore thumb.  Would it
not be better to manage the mapped/unmapped bias within the core of the
regular scanning?  ie: in shrink_page_list().
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-30 Thread Balbir Singh
* Andrew Morton a...@linux-foundation.org [2011-03-30 16:36:07]:

 On Wed, 30 Mar 2011 11:00:26 +0530
 Balbir Singh bal...@linux.vnet.ibm.com wrote:
 
  Data from the previous patchsets can be found at
  https://lkml.org/lkml/2010/11/30/79
 
 It would be nice if the data for the current patchset was present in
 the current patchset's changelog!


Sure, since there were no major changes, I put in a URL. The main
change was the documentation update. 

-- 
Three Cheers,
Balbir
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-30 Thread Andrew Morton
On Thu, 31 Mar 2011 10:57:03 +0530 Balbir Singh bal...@linux.vnet.ibm.com 
wrote:

 * Andrew Morton a...@linux-foundation.org [2011-03-30 16:36:07]:
 
  On Wed, 30 Mar 2011 11:00:26 +0530
  Balbir Singh bal...@linux.vnet.ibm.com wrote:
  
   Data from the previous patchsets can be found at
   https://lkml.org/lkml/2010/11/30/79
  
  It would be nice if the data for the current patchset was present in
  the current patchset's changelog!
 
 
 Sure, since there were no major changes, I put in a URL. The main
 change was the documentation update. 

Well some poor schmuck has to copy and paste the data into the
changelog so it's still there in five years time.  It's better to carry
this info around in the patch's own metedata, and to maintain
and update it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Unmapped page cache control (v5)

2011-03-30 Thread KOSAKI Motohiro
 
 The following series implements page cache control,
 this is a split out version of patch 1 of version 3 of the
 page cache optimization patches posted earlier at
 Previous posting http://lwn.net/Articles/425851/ and analysis
 at http://lwn.net/Articles/419713/
 
 Detailed Description
 
 This patch implements unmapped page cache control via preferred
 page cache reclaim. The current patch hooks into kswapd and reclaims
 page cache if the user has requested for unmapped page control.
 This is useful in the following scenario
 - In a virtualized environment with cache=writethrough, we see
   double caching - (one in the host and one in the guest). As
   we try to scale guests, cache usage across the system grows.
   The goal of this patch is to reclaim page cache when Linux is running
   as a guest and get the host to hold the page cache and manage it.
   There might be temporary duplication, but in the long run, memory
   in the guests would be used for mapped pages.

 - The option is controlled via a boot option and the administrator
   can selectively turn it on, on a need to use basis.
 
 A lot of the code is borrowed from zone_reclaim_mode logic for
 __zone_reclaim(). One might argue that the with ballooning and
 KSM this feature is not very useful, but even with ballooning,
 we need extra logic to balloon multiple VM machines and it is hard
 to figure out the correct amount of memory to balloon. With these
 patches applied, each guest has a sufficient amount of free memory
 available, that can be easily seen and reclaimed by the balloon driver.
 The additional memory in the guest can be reused for additional
 applications or used to start additional guests/balance memory in
 the host.

If anyone think this series works, They are just crazy. This patch reintroduce
two old issues.

1) zone reclaim doesn't work if the system has multiple node and the
   workload is file cache oriented (eg file server, web server, mail server, et 
al). 
   because zone recliam make some much free pages than zone-pages_min and
   then new page cache request consume nearest node memory and then it
   bring next zone reclaim. Then, memory utilization is reduced and
   unnecessary LRU discard is increased dramatically.

   SGI folks added CPUSET specific solution in past. (cpuset.memory_spread_page)
   But global recliam still have its issue. zone recliam is HPC workload 
specific 
   feature and HPC folks has no motivation to don't use CPUSET.

2) Before 2.6.27, VM has only one LRU and calc_reclaim_mapped() is used to
   decide to filter out mapped pages. It made a lot of problems for DB servers
   and large application servers. Because, if the system has a lot of mapped
   pages, 1) LRU was churned and then reclaim algorithm become lotree one. 2)
   reclaim latency become terribly slow and hangup detectors misdetect its
   state and start to force reboot. That was big problem of RHEL5 based banking
   system.
   So, sc-may_unmap should be killed in future. Don't increase uses.

And, this patch introduce new allocator fast path overhead. I haven't seen
any justification for it.

In other words, you have to kill following three for getting ack 1) zone 
reclaim oriented reclaim 2) filter based LRU scanning (eg sc-may_unmap)
3) fastpath overhead. In other words, If you want a feature for vm guest,
Any hardcoded machine configration assumption and/or workload assumption 
are wrong.




But, I agree that now we have to concern slightly large VM change parhaps
(or parhaps not). Ok, it's good opportunity to fill out some thing.
Historically, Linux MM has free memory are waste memory policy, and It
worked completely fine. But now we have a few exceptions.

1) RT, embedded and finance systems. They really hope to avoid reclaim
   latency (ie avoid foreground reclaim completely) and they can accept 
   to make slightly much free pages before memory shortage.

2) VM guest
   VM host and VM guest naturally makes two level page cache model. and
   Linux page cache + two level don't work fine. It has two issues
   1) hard to visualize real memory consumption. That makes harder to 
  works baloon fine. And google want to visualize memory utilization
  to pack in more jobs.
   2) hard to make in kernel memory utilization improvement mechanism.


And, now we have four proposal of utilization related issues.

1) cleancache (from Oracle)
2) VirtFS (from IBM)
3) kstaled (from Google)
4) unmapped page reclaim (from you)

Probably, we can't merge all of them and we need to consolidate some 
requirement and implementations.


cleancache seems most straight forward two level cache handling for
virtalization. but it has soem xen specific mess and, currently, don't fit RT
usage. VirtFS has another interesting de-duplication idea. But filesystem based
implemenation naturally inherit some vfs interface limitations.
Google approach is more unique. memcg don't have double cache
issue, therefore they only want to visualize 

Re: [PATCH 3/3] Provide control over unmapped pages (v5)

2011-03-30 Thread Balbir Singh
* Andrew Morton a...@linux-foundation.org [2011-03-30 16:35:45]:

 On Wed, 30 Mar 2011 11:02:38 +0530
 Balbir Singh bal...@linux.vnet.ibm.com wrote:
 
  Changelog v4
  1. Added documentation for max_unmapped_pages
  2. Better #ifdef'ing of max_unmapped_pages and min_unmapped_pages
  
  Changelog v2
  1. Use a config option to enable the code (Andrew Morton)
  2. Explain the magic tunables in the code or at-least attempt
 to explain them (General comment)
  3. Hint uses of the boot parameter with unlikely (Andrew Morton)
  4. Use better names (balanced is not a good naming convention)
  
  Provide control using zone_reclaim() and a boot parameter. The
  code reuses functionality from zone_reclaim() to isolate unmapped
  pages and reclaim them as a priority, ahead of other mapped pages.
  
 
 This:
 
 akpm:/usr/src/25 grep '^+#' 
 patches/provide-control-over-unmapped-pages-v5.patch 
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#else
 +#endif
 +#ifdef CONFIG_NUMA
 +#else
 +#define zone_reclaim_mode 0
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 +#endif
 +#endif
 +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
 
 is getting out of control.  What happens if we just make the feature
 non-configurable?


I added the configuration based on review comments I received. If the
feature is made non-configurable, it should be easy to remove them or
just set the default value to y in the config.
 
  +static int __init unmapped_page_control_parm(char *str)
  +{
  +   unmapped_page_control = 1;
  +   /*
  +* XXX: Should we tweak swappiness here?
  +*/
  +   return 1;
  +}
  +__setup(unmapped_page_control, unmapped_page_control_parm);
 
 That looks like a pain - it requires a reboot to change the option,
 which makes testing harder and slower.  Methinks you're being a bit
 virtualization-centric here!

:-) The reason for the boot parameter is to ensure that people know
what they are doing.

 
  +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
  +static inline void reclaim_unmapped_pages(int priority,
  +   struct zone *zone, struct scan_control *sc)
  +{
  +   return 0;
  +}
  +#endif
  +
   static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
struct scan_control *sc)
   {
  @@ -2371,6 +2394,12 @@ loop_again:
  shrink_active_list(SWAP_CLUSTER_MAX, zone,
  sc, priority, 0);
   
  +   /*
  +* We do unmapped page reclaim once here and once
  +* below, so that we don't lose out
  +*/
  +   reclaim_unmapped_pages(priority, zone, sc);
 
 Doing this here seems wrong.  balance_pgdat() does two passes across
 the zones.  The first pass is a read-only work-out-what-to-do pass and
 the second pass is a now-reclaim-some-stuff pass.  But here we've stuck
 a do-some-reclaiming operation inside the first, work-out-what-to-do pass.


The reason is primarily for balancing, zone_watermark's do not give us
a good idea of whether unmapped pages are balanced, hence the code.
 
 
  @@ -2408,6 +2437,11 @@ loop_again:
  continue;
   
  sc.nr_scanned = 0;
  +   /*
  +* Reclaim unmapped pages upfront, this should be
  +* really cheap
 
 Comment is mysterious.  Why is it cheap?

Cheap because we do a quick check to see if unmapped pages exceed a
threshold. If selective users enable this functionality (which is
expected), the use case is primarily for embedded and virtualization
folks, this should be a simple check.

 
  +*/
  +   reclaim_unmapped_pages(priority, zone, sc);
 
 
 I dunno, the whole thing seems rather nasty to me.
 
 It sticks a magical reclaim-unmapped-pages operation right in the
 middle of regular page reclaim.  This means that reclaim will walk the
 LRU looking at mapped and unmapped pages.  Then it will walk some more,
 looking at only unmapped pages 

Re: [PATCH v2] KVM: PPC: e500: emulate SVR

2011-03-30 Thread Alexander Graf

On 29.03.2011, at 23:49, Scott Wood wrote:

 Return the actual host SVR for now, as we already do for PVR.  Eventually
 we may support Qemu overriding PVR/SVR if the situation is appropriate,
 once we implement KVM_SET_SREGS on e500.

Haha - maybe I should have read your v2 before replying to the other mail :). 
Thanks!
I'll take this patch into my staging tree.

Avi, please ack.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: PPC: e500: emulate SVR

2011-03-30 Thread Avi Kivity

On 03/30/2011 09:28 AM, Alexander Graf wrote:

On 29.03.2011, at 23:49, Scott Wood wrote:

  Return the actual host SVR for now, as we already do for PVR.  Eventually
  we may support Qemu overriding PVR/SVR if the situation is appropriate,
  once we implement KVM_SET_SREGS on e500.

Haha - maybe I should have read your v2 before replying to the other mail :). 
Thanks!
I'll take this patch into my staging tree.

Avi, please ack.



Looks fine.  As I mentioned on IRC, I'm primarily interested in 
guest/host and kernel/user interfaces.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: PPC: e500: emulate SVR

2011-03-30 Thread Alexander Graf

On 30.03.2011, at 09:56, Avi Kivity wrote:

 On 03/30/2011 09:28 AM, Alexander Graf wrote:
 On 29.03.2011, at 23:49, Scott Wood wrote:
 
   Return the actual host SVR for now, as we already do for PVR.  Eventually
   we may support Qemu overriding PVR/SVR if the situation is appropriate,
   once we implement KVM_SET_SREGS on e500.
 
 Haha - maybe I should have read your v2 before replying to the other mail 
 :). Thanks!
 I'll take this patch into my staging tree.
 
 Avi, please ack.
 
 
 Looks fine.  As I mentioned on IRC, I'm primarily interested in guest/host 
 and kernel/user interfaces.

Ah, that part flew in after I left that client already O_o. Either way, thanks!


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] KVM: PPC: e500: Save/restore SPE state

2011-03-30 Thread Scott Wood
On Wed, 30 Mar 2011 10:17:55 +0200
Alexander Graf ag...@suse.de wrote:

 On 30.03.2011, at 01:43, Scott Wood wrote:
 
  +   case BOOKE_INTERRUPT_SPE_UNAVAIL: {
  +   extern void kvmppc_vcpu_spe_load(struct kvm_vcpu *vcpu);
  +
  +   /* reload the SPE env if guest first use SPE since last save */
  +   if (vcpu-arch.msr_block  MSR_SPE)
  +   kvmppc_vcpu_spe_load(vcpu);
  +
  +   if (!(vcpu-arch.shared-msr  MSR_SPE))
  +   kvmppc_booke_queue_irqprio(vcpu, 
  BOOKE_IRQPRIO_SPE_UNAVAIL);
 
 These are in the wrong order, no? If the guest doesn't do SPE, we can
 save ourselves the loading.

The most likely thing that's going to happen is that the guest turns on
SPE shortly after it receives the SPE unavailable exception.

We could skip it here if guest MSR[SPE] is not set, but then we have to
do it in kvmppc_set_msr() once guest MSR[SPE] gets set.  Otherwise we take an
additional trap in the normal case.

 I was wondering why we need to take the guest's MSR value into account
 in the first place though.  Are there any bits in there that we
 wouldn't catch through a lazy trap?  At the end of the day, the MSR
 will be shared between the guest and the host without traps, so we will
 depend on trapping for events anyways.  Why can't we just keep a guest
 mode MSR field in the vcpu that contains the bits activated inside the
 guest?  This way no bit just accidently slips through and we save a
 good bunch of instructions in the lightweight exit path.

So update things like the real guest-mode MSR[SPE] in kvmppc_set_msr()?

In that case, doing the state transfer there would be cleaner.

  +#ifdef CONFIG_SPE
  +#define KVMPPC_SAVE_EVR(n,s,base)  evmergehi s,s,n; stw s,(4*(n))(base)
 
 Can't we somehow combine those with the normal Linux SAVE_EVR macros?

 All we need to do is define a new set of macros that take a generic
 offset and make the normal macros use that with offset=THREAD_EVR0
 while we use offset=0.  This means we could probably even specify the
 offset directly inside the kvm code.  And maybe even the offset in the
 code that currently uses SAVE_EVR.

Yeah, we could.  I wasn't sure whether it was worth hacking up the
existing macros to break the dependency on THREAD_EVR0, but it shouldn't
be too bad.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/4] KVM: PPC: booke: use shadow_msr

2011-03-30 Thread Scott Wood
Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.

Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu-arch.shared-msr to vcpu-arch.shadow_msr in
kvmppc_set_msr().

While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.

Signed-off-by: Scott Wood scottw...@freescale.com
---
v4 of patchset, first version of this patch

 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/kernel/asm-offsets.c   |2 +-
 arch/powerpc/kvm/booke.c|2 ++
 arch/powerpc/kvm/booke_interrupts.S |   17 ++---
 4 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index bba3b9b..072ec7b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -219,11 +219,11 @@ struct kvm_vcpu_arch {
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S
-   ulong shadow_msr;
ulong hflags;
ulong guest_owned_ext;
 #endif
u32 mmucr;
+   ulong shadow_msr;
ulong sprg4;
ulong sprg5;
ulong sprg6;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 23e6a93..5120a63 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -402,12 +402,12 @@ int main(void)
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid));
DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));
DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
+   DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
 
/* book3s */
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
DEFINE(VCPU_HOST_MSR, offsetof(struct kvm_vcpu, arch.host_msr));
-   DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, 
arch.trampoline_lowmem));
DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, 
arch.trampoline_enter));
DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, 
arch.highmem_handler));
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ef76acb..1204e1d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -514,6 +514,8 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 
vcpu-arch.pc = 0;
vcpu-arch.shared-msr = 0;
+   vcpu-arch.shadow_msr = MSR_CE | MSR_EE | MSR_PR | MSR_DE |
+   MSR_ME | MSR_IS | MSR_DS;
kvmppc_set_gpr(vcpu, 1, (1620) - 8); /* -8 for the callee-save LR 
slot */
 
vcpu-arch.shadow_pid = 1;
diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index 1cc471f..307771c 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -24,8 +24,6 @@
 #include asm/page.h
 #include asm/asm-offsets.h
 
-#define KVMPPC_MSR_MASK (MSR_CE|MSR_EE|MSR_PR|MSR_DE|MSR_ME|MSR_IS|MSR_DS)
-
 #define VCPU_GPR(n) (VCPU_GPRS + (n * 4))
 
 /* The host stack layout: */
@@ -406,20 +404,17 @@ lightweight_exit:
 
/* Finish loading guest volatiles and jump to guest. */
lwz r3, VCPU_CTR(r4)
+   lwz r5, VCPU_CR(r4)
+   lwz r6, VCPU_PC(r4)
+   lwz r7, VCPU_SHADOW_MSR(r4)
mtctr   r3
-   lwz r3, VCPU_CR(r4)
-   mtcrr3
+   mtcrr5
+   mtsrr0  r6
+   mtsrr1  r7
lwz r5, VCPU_GPR(r5)(r4)
lwz r6, VCPU_GPR(r6)(r4)
lwz r7, VCPU_GPR(r7)(r4)
lwz r8, VCPU_GPR(r8)(r4)
-   lwz r3, VCPU_PC(r4)
-   mtsrr0  r3
-   lwz r3, VCPU_SHARED(r4)
-   lwz r3, (VCPU_SHARED_MSR + 4)(r3)
-   orisr3, r3, KVMPPC_MSR_MASK@h
-   ori r3, r3, KVMPPC_MSR_MASK@l
-   mtsrr1  r3
 
/* Clear any debug events which occurred since we disabled MSR[DE].
 * XXX This gives us a 3-instruction window in which a breakpoint
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/4] powerpc/e500: SPE register saving: take arbitrary struct offset

2011-03-30 Thread Scott Wood
This allows reuse for saving/restoring KVM SPE state.

Signed-off-by: Scott Wood scottw...@freescale.com
---
v4 of patchset, first version of this patch

Kumar, please ack (or comment).

 arch/powerpc/include/asm/ppc_asm.h   |   28 
 arch/powerpc/kernel/head_fsl_booke.S |6 +++---
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 9821006..ba0cd33 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -150,18 +150,22 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
 #define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); 
REST_16VSRSU(n+16,b,base)
 
-#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
-#define SAVE_2EVRS(n,s,base)   SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
-#define SAVE_4EVRS(n,s,base)   SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
-#define SAVE_8EVRS(n,s,base)   SAVE_4EVRS(n,s,base); SAVE_4EVRS(n+4,s,base)
-#define SAVE_16EVRS(n,s,base)  SAVE_8EVRS(n,s,base); SAVE_8EVRS(n+8,s,base)
-#define SAVE_32EVRS(n,s,base)  SAVE_16EVRS(n,s,base); SAVE_16EVRS(n+16,s,base)
-#define REST_EVR(n,s,base) lwz s,THREAD_EVR0+4*(n)(base); evmergelo n,s,n
-#define REST_2EVRS(n,s,base)   REST_EVR(n,s,base); REST_EVR(n+1,s,base)
-#define REST_4EVRS(n,s,base)   REST_2EVRS(n,s,base); REST_2EVRS(n+2,s,base)
-#define REST_8EVRS(n,s,base)   REST_4EVRS(n,s,base); REST_4EVRS(n+4,s,base)
-#define REST_16EVRS(n,s,base)  REST_8EVRS(n,s,base); REST_8EVRS(n+8,s,base)
-#define REST_32EVRS(n,s,base)  REST_16EVRS(n,s,base); REST_16EVRS(n+16,s,base)
+/*
+ * b = base register for addressing, o = base offset from register of 1st EVR
+ * n = first EVR, s = scratch
+ */
+#define SAVE_EVR(n,s,b,o)  evmergehi s,s,n; stw s,o+4*(n)(b)
+#define SAVE_2EVRS(n,s,b,o)SAVE_EVR(n,s,b,o); SAVE_EVR(n+1,s,b,o)
+#define SAVE_4EVRS(n,s,b,o)SAVE_2EVRS(n,s,b,o); SAVE_2EVRS(n+2,s,b,o)
+#define SAVE_8EVRS(n,s,b,o)SAVE_4EVRS(n,s,b,o); SAVE_4EVRS(n+4,s,b,o)
+#define SAVE_16EVRS(n,s,b,o)   SAVE_8EVRS(n,s,b,o); SAVE_8EVRS(n+8,s,b,o)
+#define SAVE_32EVRS(n,s,b,o)   SAVE_16EVRS(n,s,b,o); SAVE_16EVRS(n+16,s,b,o)
+#define REST_EVR(n,s,b,o)  lwz s,o+4*(n)(b); evmergelo n,s,n
+#define REST_2EVRS(n,s,b,o)REST_EVR(n,s,b,o); REST_EVR(n+1,s,b,o)
+#define REST_4EVRS(n,s,b,o)REST_2EVRS(n,s,b,o); REST_2EVRS(n+2,s,b,o)
+#define REST_8EVRS(n,s,b,o)REST_4EVRS(n,s,b,o); REST_4EVRS(n+4,s,b,o)
+#define REST_16EVRS(n,s,b,o)   REST_8EVRS(n,s,b,o); REST_8EVRS(n+8,s,b,o)
+#define REST_32EVRS(n,s,b,o)   REST_16EVRS(n,s,b,o); REST_16EVRS(n+16,s,b,o)
 
 /* Macros to adjust thread priority for hardware multithreading */
 #define HMT_VERY_LOW   or  31,31,31# very low priority
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index b84fc5e..e234153 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -656,7 +656,7 @@ load_up_spe:
cmpi0,r4,0
beq 1f
addir4,r4,THREAD/* want THREAD of last_task_used_spe */
-   SAVE_32EVRS(0,r10,r4)
+   SAVE_32EVRS(0,r10,r4,THREAD_EVR0)
evxor   evr10, evr10, evr10 /* clear out evr10 */
evmwumiaa evr10, evr10, evr10   /* evr10 - ACC = 0 * 0 + ACC */
li  r5,THREAD_ACC
@@ -676,7 +676,7 @@ load_up_spe:
stw r4,THREAD_USED_SPE(r5)
evlddx  evr4,r10,r5
evmra   evr4,evr4
-   REST_32EVRS(0,r10,r5)
+   REST_32EVRS(0,r10,r5,THREAD_EVR0)
 #ifndef CONFIG_SMP
subir4,r5,THREAD
stw r4,last_task_used_spe@l(r3)
@@ -787,7 +787,7 @@ _GLOBAL(giveup_spe)
addir3,r3,THREAD/* want THREAD of task */
lwz r5,PT_REGS(r3)
cmpi0,r5,0
-   SAVE_32EVRS(0, r4, r3)
+   SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
evxor   evr6, evr6, evr6/* clear out evr6 */
evmwumiaa evr6, evr6, evr6  /* evr6 - ACC = 0 * 0 + ACC */
li  r4,THREAD_ACC
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/4] powerpc/e500: Save SPEFCSR in flush_spe_to_thread()

2011-03-30 Thread yu liu
giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().

This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.

Signed-off-by: Liu Yu yu@freescale.com
Signed-off-by: Scott Wood scottw...@freescale.com
---
v4: no change (other than subject prefix)

Kumar, please ack (or comment).

 arch/powerpc/kernel/head_fsl_booke.S |2 --
 arch/powerpc/kernel/process.c|1 +
 arch/powerpc/kernel/traps.c  |5 +
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 3e02710..b84fc5e 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -792,8 +792,6 @@ _GLOBAL(giveup_spe)
evmwumiaa evr6, evr6, evr6  /* evr6 - ACC = 0 * 0 + ACC */
li  r4,THREAD_ACC
evstddx evr6, r4, r3/* save off accumulator */
-   mfspr   r6,SPRN_SPEFSCR
-   stw r6,THREAD_SPEFSCR(r3)   /* save spefscr register value */
beq 1f
lwz r4,_MSR-STACK_FRAME_OVERHEAD(r5)
lis r3,MSR_SPE@h
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index f74f355..138e7dd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -213,6 +213,7 @@ void flush_spe_to_thread(struct task_struct *tsk)
 #ifdef CONFIG_SMP
BUG_ON(tsk != current);
 #endif
+   tsk-thread.spefscr = mfspr(SPRN_SPEFSCR);
giveup_spe(tsk);
}
preempt_enable();
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index bd74fac..0ed23d1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1356,10 +1356,7 @@ void SPEFloatingPointException(struct pt_regs *regs)
int code = 0;
int err;
 
-   preempt_disable();
-   if (regs-msr  MSR_SPE)
-   giveup_spe(current);
-   preempt_enable();
+   flush_spe_to_thread(current);
 
spefscr = current-thread.spefscr;
fpexc_mode = current-thread.fpexc_mode;
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] KVM: PPC: e500: Save/restore SPE state

2011-03-30 Thread Scott Wood
On Thu, 31 Mar 2011 00:43:27 +0200
Alexander Graf ag...@suse.de wrote:

  @@ -78,6 +79,43 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
  }
  }
  
  +#ifdef CONFIG_SPE
  +static void kvmppc_vcpu_enable_spe(struct kvm_vcpu *vcpu)
  +{
  +   enable_kernel_spe();
  +   kvmppc_load_guest_spe(vcpu);
 
 Are you sure this is only ever called from !preempt code?

Hmm, I guess not.  We should always have the preempt notifier in place,
though, so it should suffice to stick preempt_disable/enable() at the
beginning/end of this function.  If we get preempted after that the
notifier will clean up.

-Scott

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/4] KVM: PPC: e500: Save/restore SPE state

2011-03-30 Thread Alexander Graf

On 31.03.2011, at 00:56, Scott Wood wrote:

 On Thu, 31 Mar 2011 00:43:27 +0200
 Alexander Graf ag...@suse.de wrote:
 
 @@ -78,6 +79,43 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
 }
 }
 
 +#ifdef CONFIG_SPE
 +static void kvmppc_vcpu_enable_spe(struct kvm_vcpu *vcpu)
 +{
 +   enable_kernel_spe();
 +   kvmppc_load_guest_spe(vcpu);
 
 Are you sure this is only ever called from !preempt code?
 
 Hmm, I guess not.  We should always have the preempt notifier in place,
 though, so it should suffice to stick preempt_disable/enable() at the
 beginning/end of this function.  If we get preempted after that the
 notifier will clean up.

If we stick preempt_enable()/preempt_disable in the function, preemption will 
be enabled when being disabled before, right? So we wouldn't be able to call 
this from vcpu_load for example, as that occurs within the preempt notifier 
callback.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 4/4] KVM: PPC: e500: Save/restore SPE state

2011-03-30 Thread Scott Wood
This is done lazily.  The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit.  Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.

For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.

Signed-off-by: Liu Yu yu@freescale.com
Signed-off-by: Scott Wood scottw...@freescale.com
---
v5: disable preemption when restoring SPE state

Saving SPE state is only done from a preempt notifier or vcpu_put(),
where preemption is already disabled.

The other patches in this series are the same as v4.

 arch/powerpc/include/asm/kvm_host.h  |6 +++
 arch/powerpc/include/asm/reg_booke.h |1 +
 arch/powerpc/kernel/asm-offsets.c|6 +++
 arch/powerpc/kvm/booke.c |   72 +-
 arch/powerpc/kvm/booke.h |   18 ++---
 arch/powerpc/kvm/booke_interrupts.S  |   40 +++
 arch/powerpc/kvm/e500.c  |   19 -
 7 files changed, 145 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 072ec7b..a3810ab 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -195,6 +195,12 @@ struct kvm_vcpu_arch {
u64 fpr[32];
u64 fpscr;
 
+#ifdef CONFIG_SPE
+   ulong evr[32];
+   ulong spefscr;
+   ulong host_spefscr;
+   u64 acc;
+#endif
 #ifdef CONFIG_ALTIVEC
vector128 vr[32];
vector128 vscr;
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 3b1a9b7..2705f9a 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -312,6 +312,7 @@
 #define ESR_ILK0x0010  /* Instr. Cache Locking */
 #define ESR_PUO0x0004  /* Unimplemented Operation 
exception */
 #define ESR_BO 0x0002  /* Byte Ordering */
+#define ESR_SPV0x0080  /* Signal Processing operation 
*/
 
 /* Bit definitions related to the DBCR0. */
 #if defined(CONFIG_40x)
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 5120a63..4d39f2d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -494,6 +494,12 @@ int main(void)
DEFINE(TLBCAM_MAS3, offsetof(struct tlbcam, MAS3));
DEFINE(TLBCAM_MAS7, offsetof(struct tlbcam, MAS7));
 #endif
+#ifdef CONFIG_SPE
+   DEFINE(VCPU_EVR, offsetof(struct kvm_vcpu, arch.evr[0]));
+   DEFINE(VCPU_ACC, offsetof(struct kvm_vcpu, arch.acc));
+   DEFINE(VCPU_SPEFSCR, offsetof(struct kvm_vcpu, arch.spefscr));
+   DEFINE(VCPU_HOST_SPEFSCR, offsetof(struct kvm_vcpu, arch.host_spefscr));
+#endif /* CONFIG_SPE */
 
 #ifdef CONFIG_KVM_EXIT_TIMING
DEFINE(VCPU_TIMING_EXIT_TBU, offsetof(struct kvm_vcpu,
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1204e1d..0cab0b7 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -13,6 +13,7 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  *
  * Copyright IBM Corp. 2007
+ * Copyright 2010-2011 Freescale Semiconductor, Inc.
  *
  * Authors: Hollis Blanchard holl...@us.ibm.com
  *  Christian Ehrhardt ehrha...@linux.vnet.ibm.com
@@ -78,6 +79,45 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
}
 }
 
+#ifdef CONFIG_SPE
+static void kvmppc_vcpu_enable_spe(struct kvm_vcpu *vcpu)
+{
+   preempt_disable();
+   enable_kernel_spe();
+   kvmppc_load_guest_spe(vcpu);
+   vcpu-arch.shadow_msr |= MSR_SPE;
+   preempt_enable();
+}
+
+static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
+{
+   if (!(vcpu-arch.shadow_msr  MSR_SPE) 
+   vcpu-arch.shared-msr  MSR_SPE)
+   kvmppc_vcpu_enable_spe(vcpu);
+}
+#else
+static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
+{
+}
+#endif
+
+/* Helper function for full MSR writes. No need to call this if only EE is
+ * changing. */
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
+{
+   if ((new_msr  MSR_PR) != (vcpu-arch.shared-msr  MSR_PR))
+   kvmppc_mmu_priv_switch(vcpu, new_msr  MSR_PR);
+
+   vcpu-arch.shared-msr = new_msr;
+
+   if (vcpu-arch.shared-msr  MSR_WE) {
+   kvm_vcpu_block(vcpu);
+   kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
+   };
+
+   kvmppc_vcpu_sync_spe(vcpu);
+}
+
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
unsigned int priority)
 {
@@ -344,10 +384,16 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
 
-   case BOOKE_INTERRUPT_SPE_UNAVAIL:
-   kvmppc_booke_queue_irqprio(vcpu, 

RE: [PATCH v5 4/4] KVM: PPC: e500: Save/restore SPE state

2011-03-30 Thread Liu Yu-B13201
 

 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org 
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Scott Wood
 Sent: Thursday, March 31, 2011 7:35 AM
 To: ag...@suse.de
 Cc: kvm-ppc@vger.kernel.org
 Subject: [PATCH v5 4/4] KVM: PPC: e500: Save/restore SPE state
 
 This is done lazily.  The SPE save will be done only if the guest has
 used SPE since the last preemption or heavyweight exit.  
 Restore will be
 done only on demand, when enabling MSR_SPE in the shadow MSR, 
 in response
 to an SPE fault or mtmsr emulation.
 
 For SPEFSCR, Linux already switches it on context switch 
 (non-lazily), so
 the only remaining bit is to save it between qemu and the guest.
 
 Signed-off-by: Liu Yu yu@freescale.com
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
 v5: disable preemption when restoring SPE state
 
 Saving SPE state is only done from a preempt notifier or vcpu_put(),
 where preemption is already disabled.
 
 The other patches in this series are the same as v4.
 
  arch/powerpc/include/asm/kvm_host.h  |6 +++
  arch/powerpc/include/asm/reg_booke.h |1 +
  arch/powerpc/kernel/asm-offsets.c|6 +++
  arch/powerpc/kvm/booke.c |   72 
 +-
  arch/powerpc/kvm/booke.h |   18 ++---
  arch/powerpc/kvm/booke_interrupts.S  |   40 +++
  arch/powerpc/kvm/e500.c  |   19 -
  7 files changed, 145 insertions(+), 17 deletions(-)
 

I think the patch miss the bit to handle the case that
if guest clear the MSR_SPE.

Thanks,
Yu

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html