RE: Installation of Windows 8 hangs with KVM

2013-01-07 Thread Ren, Yongjie
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
 On Behalf Of Stefan Pietsch
 Sent: Monday, January 07, 2013 2:25 AM
 To: Gleb Natapov
 Cc: kvm@vger.kernel.org
 Subject: Re: Installation of Windows 8 hangs with KVM
 
 * Gleb Natapov g...@redhat.com [2013-01-06 11:11]:
  On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote:
   Hi all,
  
   when I run KVM with this command the Windows 8 installation stops with
   error code 0x005D:
   kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso
  
   After adding the option -cpu host the installation proceeds to a black
   screen and hangs.
  
   With Virtualbox the installation succeeds.
   The host CPU is an Intel Core Duo L2400.
  
   Do you have any suggestions?
  
  
  What is your kernel/qemu version?
 
 I'm using Debian unstable.
 
 qemu-kvm 1.1.2+dfsg-3
 Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc
 version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2
 
You hit the issue only with 32-bit Win8 (not 64-bit Win8), right?
I think it's the same issue as this bug I reported:
https://bugs.launchpad.net/qemu/+bug/1007269
You can try adding '-cpu coreduo' or '-cpu core2duo' to the qemu-kvm command line.

This is probably a known issue caused by the missing 'SEP' CPU flag.
See also this bug in the Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=821741
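For reference, SEP is the SYSENTER/SYSEXIT feature bit, reported in CPUID leaf 1, register EDX, bit 11 (the bit QEMU names `sep` in its `-cpu` flag list). A minimal sketch for checking it, assuming GCC or Clang on x86 for `<cpuid.h>` and `__get_cpuid()`; the helper names `edx_has_sep()` and `cpu_has_sep()` are hypothetical. Run on the host it reports the host CPU; run inside a guest it reports what the guest's virtual CPU advertises.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang wrapper for the CPUID instruction */
#endif

/* CPUID.01H:EDX bit 11 = SEP (SYSENTER/SYSEXIT). */
#define CPUID_01_EDX_SEP (UINT32_C(1) << 11)

/* Pure helper: does an EDX value from CPUID leaf 1 advertise SEP? */
bool edx_has_sep(uint32_t edx)
{
    return (edx & CPUID_01_EDX_SEP) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
/* Execute CPUID leaf 1 on the current CPU; false if leaf 1 is unavailable. */
bool cpu_has_sep(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_sep(edx);
}
#endif
```

On Linux, the same bit also shows up as `sep` in the `flags` line of `/proc/cpuinfo`.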


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Installation of Windows 8 hangs with KVM

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 08:38:59AM +0000, Ren, Yongjie wrote:
  -Original Message-
  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
  On Behalf Of Stefan Pietsch
  Sent: Monday, January 07, 2013 2:25 AM
  To: Gleb Natapov
  Cc: kvm@vger.kernel.org
  Subject: Re: Installation of Windows 8 hangs with KVM
  
  * Gleb Natapov g...@redhat.com [2013-01-06 11:11]:
   On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote:
Hi all,
   
when I run KVM with this command the Windows 8 installation stops with
error code 0x005D:
kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso
   
After adding the option -cpu host the installation proceeds to a black
screen and hangs.
   
With Virtualbox the installation succeeds.
The host CPU is an Intel Core Duo L2400.
   
Do you have any suggestions?
   
   
   What is your kernel/qemu version?
  
  I'm using Debian unstable.
  
  qemu-kvm 1.1.2+dfsg-3
  Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc
  version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2
  
 You hit the issue only with 32-bit Win8 (not 64-bit Win8), right?
 I think it's the same issue as this bug I reported:
 https://bugs.launchpad.net/qemu/+bug/1007269
 You can try adding '-cpu coreduo' or '-cpu core2duo' to the qemu-kvm command line.
 
 This is probably a known issue caused by the missing 'SEP' CPU flag.
 See also this bug in the Red Hat Bugzilla:
 https://bugzilla.redhat.com/show_bug.cgi?id=821741
 
That was a RHEL kernel bug. I doubt the Debian one has it.

--
Gleb.


Re: what's the difference between qemu --enable-kvm, accel=kvm and plain qemu (when kvm kmod is loaded)

2013-01-07 Thread Stefan Hajnoczi
On Sun, Jan 6, 2013 at 12:27 PM, lei yang yanglei.f...@gmail.com wrote:
 What's the difference between the combinations below?

The difference is historical; it's just how the command-line options
evolved over time.

 1)qemu --enable-kvm

The old way.  Still useful because it's slightly easier to type than
--machine accel=kvm.

 2)qemu accel=kvm

The modern way.

 3)qemu without the above parameters when the kvm kmod has been loaded

There is a difference in behavior between QEMU and qemu-kvm here:

QEMU uses TCG and not KVM by default, regardless of whether the kvm.ko
module has been loaded or not.  qemu-kvm uses KVM by default, if
available.

The qemu-kvm fork has been retired, so it's best not to rely on this
behavior.  Future distro packages will be built from QEMU, and unless a
code change is made, the default accelerator will be TCG.
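The three cases above can be summarized as a small decision function. This is a hypothetical model written for illustration, not QEMU code: `enum accel`, `struct launch` and `pick_accel()` are invented names, and both "an explicit KVM request on a host without usable KVM is an error" and "qemu-kvm falls back to TCG when KVM is unavailable" are assumptions of the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the rules above; not QEMU code. */
enum accel { ACCEL_TCG, ACCEL_KVM, ACCEL_ERROR };

struct launch {
    bool enable_kvm;        /* --enable-kvm given */
    bool machine_accel_kvm; /* --machine accel=kvm given */
    bool qemu_kvm_fork;     /* binary built from the qemu-kvm fork */
    bool kvm_available;     /* kvm.ko loaded and /dev/kvm usable */
};

enum accel pick_accel(const struct launch *l)
{
    /* Cases 1 and 2: KVM requested explicitly. Treating an unavailable
     * KVM as an error is an assumption of this sketch. */
    if (l->enable_kvm || l->machine_accel_kvm)
        return l->kvm_available ? ACCEL_KVM : ACCEL_ERROR;

    /* Case 3, qemu-kvm fork: KVM by default, if available. */
    if (l->qemu_kvm_fork && l->kvm_available)
        return ACCEL_KVM;

    /* Otherwise TCG: always for upstream QEMU, whether or not kvm.ko is
     * loaded, and (assumed) for qemu-kvm when KVM is unavailable. */
    return ACCEL_TCG;
}
```

In this model, loading kvm.ko changes nothing for upstream QEMU unless KVM is requested explicitly, which is point 3 above.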

Stefan


Re: what's the difference between qemu --enable-kvm, accel=kvm and plain qemu (when kvm kmod is loaded)

2013-01-07 Thread lei yang
On Mon, Jan 7, 2013 at 4:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sun, Jan 6, 2013 at 12:27 PM, lei yang yanglei.f...@gmail.com wrote:
 What's the difference between the combinations below?

 The difference is historical; it's just how the command-line options
 evolved over time.

 1)qemu --enable-kvm

 The old way.  Still useful because it's slightly easier to type than
 --machine accel=kvm.

 2)qemu accel=kvm

 The modern way.

 3)qemu without the above parameters when the kvm kmod has been loaded

 There is a difference in behavior between QEMU and qemu-kvm here:

 QEMU uses TCG and not KVM by default, regardless of whether the kvm.ko
 module has been loaded or not.  qemu-kvm uses KVM by default, if
 available.

 The qemu-kvm fork has been retired, so it's best not to rely on this
 behavior.  Future distro packages will be built from QEMU, and unless a
 code change is made, the default accelerator will be TCG.


Thanks for the explanation.

So if we want to use KVM, we need to explicitly add --enable-kvm or accel=kvm,
regardless of whether kvm.ko is loaded or not.

How can we check whether we are using TCG or KVM? Can we check this in the
guest OS, or with the monitor?
Can you show me the exact command?

Lei


 Stefan


RE: Installation of Windows 8 hangs with KVM

2013-01-07 Thread Ren, Yongjie
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
 On Behalf Of Gleb Natapov
 Sent: Monday, January 07, 2013 4:54 PM
 To: Ren, Yongjie
 Cc: Stefan Pietsch; kvm@vger.kernel.org
 Subject: Re: Installation of Windows 8 hangs with KVM
 
 On Mon, Jan 07, 2013 at 08:38:59AM +0000, Ren, Yongjie wrote:
   -Original Message-
   From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
   On Behalf Of Stefan Pietsch
   Sent: Monday, January 07, 2013 2:25 AM
   To: Gleb Natapov
   Cc: kvm@vger.kernel.org
   Subject: Re: Installation of Windows 8 hangs with KVM
  
   * Gleb Natapov g...@redhat.com [2013-01-06 11:11]:
On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote:
 Hi all,

 when I run KVM with this command the Windows 8 installation stops with
 error code 0x005D:
 kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso

 After adding the option -cpu host the installation proceeds to a black
 screen and hangs.

 With Virtualbox the installation succeeds.
 The host CPU is an Intel Core Duo L2400.

 Do you have any suggestions?


What is your kernel/qemu version?
  
   I'm using Debian unstable.
  
   qemu-kvm 1.1.2+dfsg-3
   Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc
   version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2
  
  You hit the issue only with 32-bit Win8 (not 64-bit Win8), right?
  I think it's the same issue as this bug I reported:
  https://bugs.launchpad.net/qemu/+bug/1007269
  You can try adding '-cpu coreduo' or '-cpu core2duo' to the qemu-kvm command line.
 
  This is probably a known issue caused by the missing 'SEP' CPU flag.
  See also this bug in the Red Hat Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=821741
 
 That was a RHEL kernel bug. I doubt the Debian one has it.
 
I don't think so. It should be a qemu bug (also described in that RHEL
Bugzilla entry).
On my Sandy Bridge platform, a 32-bit Win8 guest can boot with
'-cpu SandyBridge,+sep' on the qemu-kvm command line,
but not with '-cpu SandyBridge'.
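To illustrate what the `+sep` suffix does: a `-cpu model,+flag,-flag` string starts from the model's feature bits and then sets or clears individual bits by name. A toy sketch of that idea for the CPUID.01H:EDX word only; the trimmed name table and the `apply_edx1_mods()` helper are hypothetical and much simpler than QEMU's own parser, though the bit positions are the standard CPUID ones (`sep` is bit 11).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed name table for CPUID.01H:EDX,
 * indexed by bit number; unnamed bits are NULL. */
static const char *const edx1_names[32] = {
    [0] = "fpu", [4] = "tsc", [11] = "sep", [15] = "cmov",
    [23] = "mmx", [25] = "sse", [26] = "sse2",
};

/* Return the bit index of the flag named by name[0..len), or -1. */
static int edx1_bit(const char *name, size_t len)
{
    for (int i = 0; i < 32; i++) {
        const char *n = edx1_names[i];
        if (n != NULL && strlen(n) == len && strncmp(n, name, len) == 0)
            return i;
    }
    return -1;
}

/* Apply comma-separated "+flag"/"-flag" tokens (e.g. "+sep,-mmx") to the
 * model's base EDX bits. Returns 0 and stores the result in *out, or -1
 * on an unknown flag or malformed token. */
int apply_edx1_mods(uint32_t base, const char *mods, uint32_t *out)
{
    uint32_t plus = 0, minus = 0;

    while (*mods != '\0') {
        size_t len = strcspn(mods, ",");
        if (len < 2 || (mods[0] != '+' && mods[0] != '-'))
            return -1;
        int bit = edx1_bit(mods + 1, len - 1);
        if (bit < 0)
            return -1;
        if (mods[0] == '+')
            plus |= UINT32_C(1) << bit;
        else
            minus |= UINT32_C(1) << bit;
        mods += len;
        if (*mods == ',')
            mods++;
    }
    /* Additions first, then removals, so "-flag" wins over "+flag". */
    *out = (base | plus) & ~minus;
    return 0;
}
```

For a model whose definition lacks SEP, `apply_edx1_mods(base, "+sep", &edx)` sets bit 11, which is the difference between the two command lines above.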

 --
   Gleb.


Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

2013-01-07 Thread Gleb Natapov
On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
 Reading the spec, it is clear that most modes normally leave the IRQ
 output line high, and only pulse it low to generate a leading edge,
 especially the most commonly used mode 2.
 
 The KVM i8254 model does not try to emulate the duration of the pulse at
 all, so just swap the high/low settings to leave it high most of
 the time.
 
 This fix is a prerequisite to improving the i8259 model to handle
 the trailing edge of an interrupt request as indicated in its spec:
 If it gets a trailing edge of an IRQ line before it starts to service
 the interrupt, the request should be canceled.
 
 See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
 or search the net for 23124406.pdf.
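Why the order of the two `kvm_set_irq()` calls matters can be sketched with a toy model of a single i8259 input that behaves as this message proposes: a rising edge latches a request, and a trailing edge before the request is serviced cancels it. The `struct irq_input` type and its functions are hypothetical and ignore everything else the PIC does (masking, priorities, level-triggered mode).

```c
#include <stdbool.h>

/* Hypothetical model of one edge-triggered i8259 input, following the
 * behavior proposed above: a rising edge latches a request, and a
 * trailing edge before service cancels it. */
struct irq_input {
    int level;     /* current line level, 0 or 1 */
    bool pending;  /* request latched but not yet serviced */
};

void irq_set_level(struct irq_input *in, int level)
{
    if (!in->level && level)
        in->pending = true;   /* rising edge: request an interrupt */
    else if (in->level && !level)
        in->pending = false;  /* trailing edge before service: cancel */
    in->level = level;
}

/* The CPU acknowledges the interrupt: the latched request is consumed. */
void irq_ack(struct irq_input *in)
{
    in->pending = false;
}
```

With the old order (raise, then lower) the request is cancelled immediately; with the patched order (lower, then raise) it stays latched until it is acknowledged, and the line is left high in between.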
 
 Risks:
 
 There is a risk that migrating a running guest between versions
 with and without this patch will lose or gain a single timer
 interrupt during the migration process.  The only case where
Can you elaborate on how exactly this can happen? I do not see it.

 this is likely to be serious is probably losing a single-shot (mode 4)
 interrupt, but if my understanding of how things work is good, then
 that should only be possible if a whole slew of conditions are
 all met:
 
  1. The guest is configured to run in a tickless mode (like
 modern Linux).
  2. The guest is for some reason still using the i8254 rather
 than something more modern like an HPET.  (The combination
 of 1 and 2 should be rare.)
This is not so rare. For performance reasons it is better not to have an
HPET at all.  In fact, -no-hpet is how I would advise anyone to run qemu.

  3. The migration is going from a fixed version back to the
 old version.  (Not sure how common this is, but it should
 be rarer than migrating from old to new.)
  4. There are not going to be any timely events/interrupts
 (keyboard, network, process sleeps, etc) that cause the guest
 to reset the PIT mode 4 one-shot counter soon enough.
 
 This combination should be rare enough that more complicated
 solutions are not worth the effort.
 
 Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
 ---
  arch/x86/kvm/i8254.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index c1d30b2..cd4ec60 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
   }
   spin_unlock(&ps->inject_lock);
   if (inject) {
 - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 + /* Clear previous interrupt, then create a rising
 +  * edge to request another interrupt, and leave it at
 +  * level=1 until time to inject another one.
 +  */
   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
 + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
  
   /*
* Provides NMI watchdog support via Virtual Wire mode.
 -- 
 1.7.10.2.484.gcd07cc5

--
Gleb.


Re: Installation of Windows 8 hangs with KVM

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 09:13:37AM +0000, Ren, Yongjie wrote:
  -Original Message-
  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
  On Behalf Of Gleb Natapov
  Sent: Monday, January 07, 2013 4:54 PM
  To: Ren, Yongjie
  Cc: Stefan Pietsch; kvm@vger.kernel.org
  Subject: Re: Installation of Windows 8 hangs with KVM
  
  On Mon, Jan 07, 2013 at 08:38:59AM +0000, Ren, Yongjie wrote:
-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
On Behalf Of Stefan Pietsch
Sent: Monday, January 07, 2013 2:25 AM
To: Gleb Natapov
Cc: kvm@vger.kernel.org
Subject: Re: Installation of Windows 8 hangs with KVM
   
* Gleb Natapov g...@redhat.com [2013-01-06 11:11]:
 On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote:
  Hi all,
 
  when I run KVM with this command the Windows 8 installation stops with
  error code 0x005D:
  kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso
 
  After adding the option -cpu host the installation proceeds to a black
  screen and hangs.
 
  With Virtualbox the installation succeeds.
  The host CPU is an Intel Core Duo L2400.
 
  Do you have any suggestions?
 
 
 What is your kernel/qemu version?
   
I'm using Debian unstable.
   
qemu-kvm 1.1.2+dfsg-3
Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2
   
   You hit the issue only with 32-bit Win8 (not 64-bit Win8), right?
   I think it's the same issue as this bug I reported:
   https://bugs.launchpad.net/qemu/+bug/1007269
   You can try adding '-cpu coreduo' or '-cpu core2duo' to the qemu-kvm command line.
  
   This is probably a known issue caused by the missing 'SEP' CPU flag.
   See also this bug in the Red Hat Bugzilla:
   https://bugzilla.redhat.com/show_bug.cgi?id=821741
  
  That was a RHEL kernel bug. I doubt the Debian one has it.
  
 I don't think so. It should be a qemu bug (also described in that RHEL
 Bugzilla entry).
https://bugzilla.redhat.com/show_bug.cgi?id=821463 is the kernel one.

 On my Sandy Bridge platform, a 32-bit Win8 guest can boot with
 '-cpu SandyBridge,+sep' on the qemu-kvm command line,
 but not with '-cpu SandyBridge'.
 
Which qemu version? Master has sep in the SandyBridge definition. In any
case, -cpu host should have sep enabled.

--
Gleb.


Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Eduardo Habkost
On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
  This is a cleanup that tries to solve two small issues:
  
   - We don't need a separate kvm_pv_eoi_features variable just to keep a
 constant calculated at compile-time, and this style would require
 adding a separate variable (that's declared twice because of the
 CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
 by machine-type compat code.
   - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
 even when KVM is disabled at runtime. This small inconsistency in
 the cpuid_kvm_features field isn't a problem today because
 cpuid_kvm_features is ignored by the TCG code, but it may cause
 unexpected problems later when refactoring the CPUID handling code.
  
  This patch eliminates the kvm_pv_eoi_features variable and simply uses
  CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
  function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
  this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
  understand.
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Cc: kvm@vger.kernel.org
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  
  Changes v2:
   - Coding style fix
  ---
   target-i386/cpu.c | 8 +---
   1 file changed, 5 insertions(+), 3 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 82685dc..e6435da 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
   (1 << KVM_FEATURE_ASYNC_PF) |
   (1 << KVM_FEATURE_STEAL_TIME) |
   (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
  -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
   #else
   static uint32_t kvm_default_features = 0;
  -static const uint32_t kvm_pv_eoi_features = 0;
   #endif
   
   void enable_kvm_pv_eoi(void)
   {
  -kvm_default_features |= kvm_pv_eoi_features;
  +#ifdef CONFIG_KVM
 You do not need ifdef here.

We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
set.

I could also write it as:

if (kvm_enabled()) {
#ifdef CONFIG_KVM
kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
#endif
}

But I find it less readable.


 
  +if (kvm_enabled()) {
  +kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
  +}
  +#endif
   }
   
   void host_cpuid(uint32_t function, uint32_t count,
  -- 
  1.7.11.7
 
 --
   Gleb.

-- 
Eduardo


Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
 On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
  On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
   This is a cleanup that tries to solve two small issues:
   
- We don't need a separate kvm_pv_eoi_features variable just to keep a
  constant calculated at compile-time, and this style would require
  adding a separate variable (that's declared twice because of the
  CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
  by machine-type compat code.
- The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
  even when KVM is disabled at runtime. This small inconsistency in
  the cpuid_kvm_features field isn't a problem today because
  cpuid_kvm_features is ignored by the TCG code, but it may cause
  unexpected problems later when refactoring the CPUID handling code.
   
   This patch eliminates the kvm_pv_eoi_features variable and simply uses
   CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
   function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
   this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
   understand.
   
   Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   ---
   Cc: kvm@vger.kernel.org
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Gleb Natapov g...@redhat.com
   Cc: Marcelo Tosatti mtosa...@redhat.com
   
   Changes v2:
- Coding style fix
   ---
target-i386/cpu.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
   
   diff --git a/target-i386/cpu.c b/target-i386/cpu.c
   index 82685dc..e6435da 100644
   --- a/target-i386/cpu.c
   +++ b/target-i386/cpu.c
   @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
(1 << KVM_FEATURE_ASYNC_PF) |
(1 << KVM_FEATURE_STEAL_TIME) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
   -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
#else
static uint32_t kvm_default_features = 0;
   -static const uint32_t kvm_pv_eoi_features = 0;
#endif

void enable_kvm_pv_eoi(void)
{
   -kvm_default_features |= kvm_pv_eoi_features;
   +#ifdef CONFIG_KVM
  You do not need ifdef here.
 
 We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
 set.
 
 I could also write it as:
 
 if (kvm_enabled()) {
 #ifdef CONFIG_KVM
 kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
 #endif
 }
 
 But I find it less readable.
 
 
Why not define KVM_FEATURE_PV_EOI unconditionally?
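A minimal sketch of the alternative Gleb suggests, assuming `KVM_FEATURE_PV_EOI` is made visible in every build so that only the runtime check remains. The value 6 matches the definition in Linux's `kvm_para.h`; the `kvm_enabled()` stand-in and the `kvm_is_enabled` variable are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to be visible in every build, with or without CONFIG_KVM;
 * 6 is the value in Linux's kvm_para.h. */
#define KVM_FEATURE_PV_EOI 6

/* Hypothetical stand-in for QEMU's kvm_enabled(). */
static bool kvm_is_enabled;
static bool kvm_enabled(void)
{
    return kvm_is_enabled;
}

static uint32_t kvm_default_features;

/* With the constant always defined, no #ifdef is needed here. */
void enable_kvm_pv_eoi(void)
{
    if (kvm_enabled()) {
        kvm_default_features |= (UINT32_C(1) << KVM_FEATURE_PV_EOI);
    }
}
```

The trade-off is one constant that TCG-only builds never use, in exchange for a function body that reads the same in every configuration.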

  
   +if (kvm_enabled()) {
   +kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
   +}
   +#endif
}

void host_cpuid(uint32_t function, uint32_t count,
   -- 
   1.7.11.7
  
  --
  Gleb.
 
 -- 
 Eduardo

--
Gleb.


Re: [PATCH qom-cpu 02/11] target-i386: Disable kvm_mmu_op by default on pc-1.4

2013-01-07 Thread Eduardo Habkost
On Sun, Jan 06, 2013 at 03:38:28PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:03PM -0200, Eduardo Habkost wrote:
  The kvm_mmu_op feature was removed from the kernel in v3.3 (released
  in March 2012); it had been marked for removal since January 2011, and it's
  slower than shadow or hardware-assisted paging (see kernel commit
  fb92045843). It doesn't make sense to keep it enabled by default.
  
 Actually, it was effectively removed on Oct 1, 2009 by commit a68a6a7282373.
 After three and a half years without it, I think we can safely drop it
 without trying to preserve it in older machine types.

Agreed, especially considering that the check/enforce code for KVM flags
is currently broken. So people using pc-1.0, pc-1.1 and pc-1.2 are
probably _not_ getting the kvm_mmu feature exposed to the guest.

 
  Also, keeping it enabled by default would cause unnecessary hassle when
  libvirt starts using the enforce option.
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Cc: kvm@vger.kernel.org
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  Cc: libvir-l...@redhat.com
  Cc: Jiri Denemark jdene...@redhat.com
  
  I was planning to reverse the logic of the compat init functions and
  make pc_init_pci_1_3() enable kvm_mmu_op and then call pc_init_pci_1_4()
  instead. But that would require changing pc_init_pci_no_kvmclock() and
  pc_init_isa() as well. So to keep the changes simple, I am keeping the
  pattern used when pc_init_pci_1_3() was introduced, making
  pc_init_pci_1_4() disable kvm_mmu_op and then call pc_init_pci_1_3().
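The compat pattern described above (the newer machine type adjusts the global CPU defaults, then reuses the older init function) can be sketched in isolation. All bodies here are hypothetical stand-ins and only the function names follow the patch; the defaults are assumed to start with kvm_mmu_op enabled, as the commit message implies, and `KVM_FEATURE_MMU_OP` is 2 as in Linux's `kvm_para.h`.

```c
#include <stdint.h>

/* 2 is the value of KVM_FEATURE_MMU_OP in Linux's kvm_para.h. */
#define KVM_FEATURE_MMU_OP 2

/* Hypothetical global defaults, starting with kvm_mmu_op enabled. */
static uint32_t kvm_default_features = UINT32_C(1) << KVM_FEATURE_MMU_OP;

/* Records what the stand-in board init saw, for checking. */
static uint32_t features_seen_by_board;

void disable_kvm_mmu_op(void)
{
    kvm_default_features &= ~(UINT32_C(1) << KVM_FEATURE_MMU_OP);
}

/* Stand-in for the shared board setup that consumes the defaults. */
static void pc_init_pci(void)
{
    features_seen_by_board = kvm_default_features;
}

void pc_init_pci_1_3(void)
{
    pc_init_pci();
}

/* pc-1.4: adjust the global defaults first, then reuse the 1.3 path. */
void pc_init_pci_1_4(void)
{
    disable_kvm_mmu_op();
    pc_init_pci_1_3();
}
```

Older machine types never call `disable_kvm_mmu_op()`, so they keep the bit. Because the defaults are global, the adjustment has to happen before the shared init path reads them.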
  
  Changes v2:
   - Coding style fix
   - Removed redundant comments above machine init functions
  ---
   hw/pc_piix.c  | 9 -
   target-i386/cpu.c | 9 +
   target-i386/cpu.h | 1 +
   3 files changed, 18 insertions(+), 1 deletion(-)
  
  diff --git a/hw/pc_piix.c b/hw/pc_piix.c
  index 99747a7..a32af6a 100644
  --- a/hw/pc_piix.c
  +++ b/hw/pc_piix.c
  @@ -217,6 +217,7 @@ static void pc_init1(MemoryRegion *system_memory,
   }
   }
   
  +/* machine init function for pc-0.14 - pc-1.2 */
   static void pc_init_pci(QEMUMachineInitArgs *args)
   {
   ram_addr_t ram_size = args->ram_size;
  @@ -238,6 +239,12 @@ static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
   pc_init_pci(args);
   }
   
  +static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
  +{
  +disable_kvm_mmu_op();
  +pc_init_pci_1_3(args);
  +}
  +
   static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
   {
   ram_addr_t ram_size = args->ram_size;
  @@ -285,7 +292,7 @@ static QEMUMachine pc_machine_v1_4 = {
   .name = "pc-1.4",
   .alias = "pc",
   .desc = "Standard PC",
  -.init = pc_init_pci_1_3,
  +.init = pc_init_pci_1_4,
   .max_cpus = 255,
   .is_default = 1,
   };
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index e6435da..c83a566 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -158,6 +158,15 @@ void enable_kvm_pv_eoi(void)
   #endif
   }
   
  +void disable_kvm_mmu_op(void)
  +{
  +#ifdef CONFIG_KVM
 No need for ifdef here too.

Same case of the previous patch: KVM_FEATURE_MMU_OP is available only if
CONFIG_KVM is set.

-- 
Eduardo


Re: [PATCH qom-cpu 05/11] target-i386: check/enforce: Fix CPUID leaf numbers on error messages

2013-01-07 Thread Eduardo Habkost
On Sun, Jan 06, 2013 at 04:12:54PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:06PM -0200, Eduardo Habkost wrote:
  The -cpu check/enforce warnings are printing incorrect information about the
  missing flags. There are no feature flags on CPUID leaves 0 and 0x80000000,
  but there were references to 0 and 0x80000000 in the table at
  kvm_check_features_against_host().
  
  This changes the model_features_t struct to contain the register number as
  well, so the error messages print the correct CPUID leaf+register information,
  instead of wrong CPUID leaf numbers.
  
  This also changes the format of the error messages, so they follow the
  CPUID.leaf.register.name [bit offset] convention used on Intel
  documentation. Example output:
  
  $ qemu-system-x86_64 -machine pc-1.0,accel=kvm -cpu Opteron_G4,+ia64,enforce
  warning: host doesn't support requested feature: CPUID.01H:EDX.ia64 [bit 30]
  warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
  warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
  Unable to find x86 CPU definition
  $
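A small sketch of how that message format behaves, using a hypothetical helper `format_missing_feature()` that takes the register name directly instead of looking it up: `%02X` zero-pads leaf 1 to `01`, while extended leaves print in full, so leaf 0x80000001 appears as `80000001H`. A missing flag name (reserved bit) drops the `.name` part.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the new message format,
 * "CPUID.<leaf>H:<reg>[.<name>] [bit <n>]". A NULL name (reserved bit)
 * drops the ".<name>" part. Returns snprintf()'s result. */
int format_missing_feature(char *buf, size_t size, uint32_t leaf,
                           const char *reg, const char *name, int bit)
{
    return snprintf(buf, size, "CPUID.%02XH:%s%s%s [bit %d]",
                    (unsigned int)leaf, reg,
                    name ? "." : "", name ? name : "", bit);
}
```

For example, leaf 0x80000001, register `ECX`, name `fma4`, bit 16 produces `CPUID.80000001H:ECX.fma4 [bit 16]`, matching the last warning above.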
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 Reviewed-by: Gleb Natapov g...@redhat.com
 But see the question below.
 
  ---
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  Cc: kvm@vger.kernel.org
  
  Changes v2:
   - Coding style fixes
   - Add assert() for invalid register numbers on
 unavailable_host_feature()
  ---
   target-i386/cpu.c | 42 +-
   target-i386/cpu.h |  3 +++
   2 files changed, 36 insertions(+), 9 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index e916ae0..c3e5db8 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -124,6 +124,25 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
   };
   
  +const char *get_register_name_32(unsigned int reg)
  +{
  +static const char *reg_names[CPU_NB_REGS32] = {
  +[R_EAX] = "EAX",
  +[R_ECX] = "ECX",
  +[R_EDX] = "EDX",
  +[R_EBX] = "EBX",
  +[R_ESP] = "ESP",
  +[R_EBP] = "EBP",
  +[R_ESI] = "ESI",
  +[R_EDI] = "EDI",
  +};
  +
  +if (reg >= CPU_NB_REGS32) {
  +return NULL;
  +}
  +return reg_names[reg];
  +}
  +
   /* collects per-function cpuid data
*/
   typedef struct model_features_t {
  @@ -132,7 +151,8 @@ typedef struct model_features_t {
   uint32_t check_feat;
   const char **flag_names;
   uint32_t cpuid;
  -} model_features_t;
  +int reg;
  +} model_features_t;
   
   int check_cpuid = 0;
   int enforce_cpuid = 0;
  @@ -923,10 +943,13 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
   
   for (i = 0; i < 32; ++i)
   if (1 << i & mask) {
  -fprintf(stderr, "warning: host cpuid %04x_%04x lacks requested"
  -" flag '%s' [0x%08x]\n",
  -f->cpuid >> 16, f->cpuid & 0xffff,
  -f->flag_names[i] ? f->flag_names[i] : "[reserved]", mask);
  +const char *reg = get_register_name_32(f->reg);
  +assert(reg);
  +fprintf(stderr, "warning: host doesn't support requested feature: "
  +"CPUID.%02XH:%s%s%s [bit %d]\n",
  +f->cpuid, reg,
  +f->flag_names[i] ? "." : "",
  +f->flag_names[i] ? f->flag_names[i] : "", i);
   break;
   }
   return 0;
  @@ -945,13 +968,14 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
   int rv, i;
   struct model_features_t ft[] = {
   {&guest_def->features, &host_def.features,
  -~0, feature_name, 0x00000000},
  +~0, feature_name, 0x00000001, R_EDX},
   {&guest_def->ext_features, &host_def.ext_features,
  -~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x00000001},
  +~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x00000001, R_ECX},
   {&guest_def->ext2_features, &host_def.ext2_features,
  -~PPRO_FEATURES, ext2_feature_name, 0x80000000},
  +~PPRO_FEATURES, ext2_feature_name, 0x80000001, R_EDX},
   {&guest_def->ext3_features, &host_def.ext3_features,
  -~CPUID_EXT3_SVM, ext3_feature_name, 0x80000001}};
  +~CPUID_EXT3_SVM, 

Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set

2013-01-07 Thread Eduardo Habkost
On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
  This will be necessary once kvm_check_features_against_host() starts
  using KVM-specific definitions (so it won't compile anymore if
  CONFIG_KVM is not set).
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
   target-i386/cpu.c | 4 
   1 file changed, 4 insertions(+)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 1c3c7e1..876b0f6 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
   #endif /* CONFIG_KVM */
   }
   
  +#ifdef CONFIG_KVM
   static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
   {
   int i;
  @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
   }
   return rv;
   }
  +#endif
   
   static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
const char *name, Error **errp)
  @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
   x86_cpu_def->kvm_features &= ~minus_kvm_features;
   x86_cpu_def->svm_features &= ~minus_svm_features;
   x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
  +#ifdef CONFIG_KVM
   if (check_cpuid && kvm_enabled()) {
   if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
   goto error;
   }
  +#endif
 Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
 ifdef here.

Will do. Igor will probably have to change his "target-i386: move
kvm_check_features_against_host() check to realize time" patch to use
the same approach, too.
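The stub approach Gleb suggests, sketched with trimmed-down hypothetical types: `cpu_def`, `finish_featurestr()` and the `kvm_on` flag are invented for illustration, and the comparison in the `CONFIG_KVM` branch is a placeholder, not QEMU's real check. The point is that the caller compiles unchanged with or without `CONFIG_KVM`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's x86_def_t and kvm_enabled(). */
typedef struct {
    unsigned int features;
} cpu_def;

static bool kvm_on;
static bool kvm_enabled(void)
{
    return kvm_on;
}

#ifdef CONFIG_KVM
/* Placeholder check: pretend the host supports only these bits. */
static unsigned int host_features = 0xffu;

static int kvm_check_features_against_host(cpu_def *guest_def)
{
    return (guest_def->features & ~host_features) != 0;
}
#else
/* Stub for builds without KVM. Returning 0 ("nothing missing") is an
 * assumption; kvm_enabled() is expected to be false there anyway. */
static int kvm_check_features_against_host(cpu_def *guest_def)
{
    (void)guest_def;
    return 0;
}
#endif

/* Caller with no #ifdef; returns -1 where the real code does "goto error". */
int finish_featurestr(cpu_def *def, bool check_cpuid, bool enforce_cpuid)
{
    if (check_cpuid && kvm_enabled()) {
        if (kvm_check_features_against_host(def) && enforce_cpuid)
            return -1;
    }
    return 0;
}
```

The `#ifdef` moves from every call site to a single place next to the definition.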

-- 
Eduardo


Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes

2013-01-07 Thread Gleb Natapov
On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote:
 Make pit_get_out() consistent with the spec.  Currently pit_get_out()
 doesn't affect IRQ0, but it can be read by the guest in other ways.
 This makes it consistent with proposed changes in qemu's i8254 model
 as well.
 
 See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
 or search the net for 23124406.pdf.
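The patched mode table in `pit_get_out()` below can be read as a pure function of the elapsed clock count. A standalone transcription as a sketch: `pit_out_level()` is a hypothetical name, `d` is the number of PIT clocks since the count was loaded, `count` is the loaded (non-zero) count, and the comments paraphrase the 8254 mode descriptions.

```c
#include <stdint.h>

/* Hypothetical standalone transcription of the patched pit_get_out()
 * mode table. d = PIT clocks since the count was loaded, count = loaded
 * count (non-zero), gate = GATE input level. Returns OUT (0 or 1). */
int pit_out_level(int mode, uint64_t d, uint64_t count, int gate)
{
    switch (mode) {
    case 2:  /* rate generator: low for one clock when the counter is 1 */
        return d % count != count - 1 || gate == 0;
    case 3:  /* square wave: high for the first (count + 1) / 2 clocks */
        return d % count < (count + 1) / 2 || gate == 0;
    case 4:
    case 5:  /* strobes: low for one clock when the count expires */
        return d != count;
    case 0:
    case 1:
    default: /* one-shots: high once the count has expired */
        return d >= count;
    }
}
```

In modes 2 and 3 a low GATE forces OUT high, which is what the added `|| c->gate == 0` terms express.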
 
 Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
 ---
  arch/x86/kvm/i8254.c | 44 ++--
  1 file changed, 34 insertions(+), 10 deletions(-)
 
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index cd4ec60..fd38938 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
  
   WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
  
 + /* FIXME: Add some way to represent a paused timer and return
 +  *   the paused-at counter value, to better model gate pausing,
 +  *   wait until next CLK pulse to load counter logic, etc.
 +  */
   t = kpit_elapsed(kvm, c, channel);
   d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
  
 @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
   counter = (c->count - d) & 0xffff;
   break;
   case 3:
 - /* XXX: may be incorrect for odd counts */
 - counter = c->count - (mod_64((2 * d), c->count));
 + counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
   break;
   default:
   counter = c-count - mod_64(d, c-count);
 @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
   switch (c->mode) {
   default:
   case 0:
 - out = (d >= c->count);
 - break;
   case 1:
 - out = (d < c->count);
 + out = (d >= c->count);
   break;
   case 2:
 - out = ((mod_64(d, c->count) == 0) && (d != 0));
 + out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
   break;
   case 3:
 - out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
 + out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0);
   break;
   case 4:
   case 5:
 - out = (d == c->count);
 + out = (d != c->count);
   break;
   }
  
 @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
  
   /*
* The largest possible initial count is 0; this is equivalent
 -  * to 216 for binary counting and 104 for BCD counting.
 +  * to pow(2,16) for binary counting and pow(10,4) for BCD counting.
*/
   if (val == 0)
   val = 0x10000;
 @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
  
   if (channel != 0) {
   ps->channels[channel].count_load_time = ktime_get();
 +
 + /* In gate-triggered one-shot modes,
 +  * indirectly model some pit_get_out()
 +  * cases by setting the load time way
 +  * back until gate-triggered.
 +  * (Generally only affects reading status
 +  * from channel 2 speaker,
 +  * due to hard-wired gates on other
 +  * channels.)
 +  *
 +  * FIXME: This might be redesigned if a paused
 +  * timer state is added for pit_get_count().
 +  */
 + if (ps->channels[channel].mode == 1 ||
 + ps->channels[channel].mode == 5) {
 + u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ);
 + ps->channels[channel].count_load_time =
 +ktime_sub(ps->channels[channel].count_load_time,
 +  ns_to_ktime(delta));
I do not understand what you are trying to do here. Do you assume that
the trigger will happen 2 clocks after the counter is loaded?
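Gleb's question is about how far back the patch moves count_load_time for modes 1 and 5: muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ) converts val + 2 PIT input clocks into nanoseconds. The standalone sketch below reproduces only that arithmetic. The clock-rate constant is an assumption (the nominal 1193182 Hz PIT input; check KVM_PIT_FREQ in arch/x86/kvm/i8254.h), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PIT input clock in Hz; the kernel's KVM_PIT_FREQ may differ. */
#define PIT_FREQ_HZ_SKETCH  1193182ULL
#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Hypothetical helper: PIT clocks -> nanoseconds, the same
 * muldiv64(clocks, NSEC_PER_SEC, freq) shape the patch uses.
 * For at most 0x10000 + 2 clocks the 64-bit product cannot overflow. */
static uint64_t pit_clocks_to_ns(uint64_t clocks)
{
    return clocks * NSEC_PER_SEC_SKETCH / PIT_FREQ_HZ_SKETCH;
}

/* Hypothetical helper: how far count_load_time is moved back for
 * modes 1 and 5. val has already been normalised (0 -> 0x10000). */
static uint64_t oneshot_backdate_ns(uint32_t val)
{
    assert(val >= 1 && val <= 0x10000);
    return pit_clocks_to_ns((uint64_t)val + 2);
}
```

Read against the patch's new pit_get_out(), backdating by more than val clocks makes the elapsed clock count d exceed the count, so both d >= count (mode 1) and d != count (mode 5) report OUT high, the level those modes hold before a gate trigger.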

 + }
   return;
   }
  
 @@ -383,7 +404,6 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
* mode 1 is one shot, mode 2 is period, otherwise del timer */
   switch (ps->channels[0].mode) {
   case 0:
 - case 1:
  /* FIXME: enhance mode 4 precision */
   case 4:
   create_pit_timer(kvm, val, 0);
 @@ -393,6 +413,10 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
   create_pit_timer(kvm, val, 1);
   break;
   default:
 + /* Modes 1 and 5 are triggered by gate leading edge,
 +  * but channel 0's gate is hard-wired high and has
 +  * no edges (on normal real hardware).
 +  */
   destroy_pit_timer(kvm->arch.vpit);
   }
  }
 -- 
 1.7.10.2.484.gcd07cc5
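The OUT rules the patch encodes in pit_get_out(), and the new mode 3 read-back in pit_get_count(), can be exercised outside the kernel. A minimal sketch, assuming a trimmed-down channel struct (hypothetical; the kernel's channel state has more fields) and plain % in place of mod_64(). Here d is the number of PIT clocks elapsed since the count was loaded. The code mirrors the patch's expressions; it does not independently check them against the 8254 datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down channel state. count is 1..0x10000. */
struct pit_chan_sketch {
    uint32_t count;
    int mode; /* 0..5 */
    int gate; /* 0 or 1 */
};

/* OUT level after d PIT clocks, following the patched pit_get_out(). */
static int pit_out_sketch(const struct pit_chan_sketch *c, uint64_t d)
{
    int out;

    assert(c->count >= 1 && c->count <= 0x10000);
    switch (c->mode) {
    default:
    case 0:
    case 1:
        out = (d >= c->count);
        break;
    case 2: /* low for one clock per period, unless the gate is low */
        out = (d % c->count != c->count - 1) || c->gate == 0;
        break;
    case 3: /* high for the first half of each period */
        out = (d % c->count < ((c->count + 1) >> 1)) || c->gate == 0;
        break;
    case 4:
    case 5:
        out = (d != c->count);
        break;
    }
    return out;
}

/* Mode 3 read-back after the patch: the counter steps by 2 per clock,
 * so the low bit is masked off and the value read is always even. */
static uint32_t pit_count_mode3_sketch(uint32_t count, uint64_t d)
{
    assert(count >= 1 && count <= 0x10000);
    return (uint32_t)((count - (2 * d) % count) & 0xfffe);
}
```

With count 4 in mode 2, for example, OUT reads low only when d is 3, 7, 11, ..., and holding the gate low keeps it high. In mode 3 a count of 5 reads back as 2 one clock after loading.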

--
Gleb.

Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words

2013-01-07 Thread Eduardo Habkost
On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:12PM -0200, Eduardo Habkost wrote:
  This adds the following feature words to the list of flags to be checked
  by kvm_check_features_against_host():
  
   - cpuid_7_0_ebx_features
   - ext4_features
   - kvm_features
   - svm_features
  
  This will ensure the enforce flag works as it should: it won't allow
  QEMU to be started unless every flag that was requested by the user or
  defined in the CPU model is supported by the host.
  
  This patch may make QEMU abort on existing configurations where enforce
  previously didn't prevent it from starting. But that's exactly the
  point of this patch: if a flag was not supported by the host and QEMU
  wasn't aborting, it was a bug in the enforce code.
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  Cc: kvm@vger.kernel.org
  Cc: libvir-l...@redhat.com
  Cc: Jiri Denemark jdene...@redhat.com
  
  CCing libvirt people, as this is directly related to the planned usage
  of the enforce flag by libvirt.
  
  The libvirt team probably has a problem on their hands: libvirt should
  use enforce to make sure all requested flags are making their way into
  the guest (so the resulting CPU is always the same, on any host), but
  users may have existing working configurations where a flag is not
  supported by the guest and the user really doesn't care about it. Those
  configurations will necessarily break when libvirt starts using
  enforce.
  
  One example where it may cause trouble for common setups: pc-1.3 wants
  the kvm_pv_eoi flag enabled by default (so enforce will make sure it
  is enabled), but the user may have an existing VM running on a host
  without pv_eoi support. That setup is unsafe today because
  live-migration between different host kernel versions may enable/disable
  pv_eoi silently (that's why we need the enforce flag to be used by
  libvirt), but the user probably would like to be able to live-migrate
  that VM anyway (and have libvirt just do the right thing).
  
  One possible solution for libvirt is to use enforce only on newer
  machine-types, so existing machines with older machine-types will keep
  the unsafe host-dependent-ABI behavior, but at least would keep
  live-migration working in case the user is careful.
  
  I really don't know what the libvirt team prefers, but that's the
  situation today. The longer we take to make enforce as strict as it
  should be, and to make libvirt finally use it, the more users will have VMs with
  migration-unsafe unpredictable guest ABIs.
  
  Changes v2:
   - Coding style fix
  ---
   target-i386/cpu.c | 15 ++++++++++++---
   1 file changed, 12 insertions(+), 3 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 876b0f6..52727ad 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -955,8 +955,9 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
   return 0;
   }
   
  -/* best effort attempt to inform user requested cpu flags aren't making
  - * their way to the guest.
  +/* Check if all requested cpu flags are making their way to the guest
  + *
  + * Returns 0 if all flags are supported by the host, non-zero otherwise.
*
* This function may be called only if KVM is enabled.
*/
  @@ -973,7 +974,15 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
   {&guest_def->ext2_features, &host_def.ext2_features,
   ext2_feature_name, 0x80000001, R_EDX},
   {&guest_def->ext3_features, &host_def.ext3_features,
  -ext3_feature_name, 0x80000001, R_ECX}
  +ext3_feature_name, 0x80000001, R_ECX},
  +{&guest_def->ext4_features, &host_def.ext4_features,
  +NULL, 0xC0000001, R_EDX},
 Since there is no name array for ext4_features they cannot be added or
 removed on the command line hence no need to check them, no?

In theory, yes. But it won't hurt to check it, and it will be useful to
unify the list of feature words in a single place, so we can be sure the
checking/filtering/setting code at
kvm_check_features_against_host()/kvm_filter_features_for_host()/kvm_cpu_fill_host(),
will all check/filter/set exactly the same feature words.

 
  +{&guest_def->cpuid_7_0_ebx_features, &host_def.cpuid_7_0_ebx_features,
  +cpuid_7_0_ebx_feature_name, 7, R_EBX},
  +{&guest_def->svm_features, &host_def.svm_features,
  +svm_feature_name, 0x8000000A, R_EDX},
  +{&guest_def->kvm_features, &host_def.kvm_features,
  +kvm_feature_name, KVM_CPUID_FEATURES, R_EAX},
   };
   
   assert(kvm_enabled());
  -- 
  1.7.11.7
 
 --
   Gleb.

-- 
Eduardo
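Eduardo's argument is that a single list of feature words should drive the check, filter and fill code alike. A minimal sketch of that idea follows; the descriptor struct and function names are hypothetical, not QEMU's actual code. Because the check and the filter walk the same table, a word added once (for example the ext4 word at CPUID leaf 0xC0000001, EDX) is covered by both, and a word without a name array can still be checked bit by bit, with the message printing a bit number instead of a flag name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor: one entry per CPUID feature word. */
struct feature_word_sketch {
    uint32_t *guest;      /* bits requested for the guest */
    const uint32_t *host; /* bits the host/KVM reports */
    uint32_t leaf;        /* CPUID leaf, e.g. 0x80000001 or 0xC0000001 */
    const char *reg;      /* register holding the word, e.g. "EDX" */
};

/* Check pass: report every requested bit the host lacks and return
 * how many there are (0 means every requested flag is available). */
static int check_words_sketch(const struct feature_word_sketch *w, size_t n)
{
    int missing = 0;

    for (size_t i = 0; i < n; i++) {
        assert(w[i].guest != NULL && w[i].host != NULL);
        uint32_t absent = *w[i].guest & ~*w[i].host;

        for (int bit = 0; bit < 32; bit++) {
            if (absent & (UINT32_C(1) << bit)) {
                fprintf(stderr, "host lacks CPUID 0x%08lx %s bit %d\n",
                        (unsigned long)w[i].leaf, w[i].reg, bit);
                missing++;
            }
        }
    }
    return missing;
}

/* Filter pass over the very same table: drop what the host lacks. */
static void filter_words_sketch(const struct feature_word_sketch *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *w[i].guest &= *w[i].host;
    }
}
```

In an enforce-style caller a non-zero return from the check would abort startup, while a non-enforcing caller could run the filter and continue with the reduced feature set.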

Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 10:06:21AM -0200, Eduardo Habkost wrote:
 On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
  On Fri, Jan 04, 2013 at 08:01:12PM -0200, Eduardo Habkost wrote:
   This adds the following feature words to the list of flags to be checked
   by kvm_check_features_against_host():
   
- cpuid_7_0_ebx_features
- ext4_features
- kvm_features
- svm_features
   
   This will ensure the enforce flag works as it should: it won't allow
   QEMU to be started unless every flag that was requested by the user or
   defined in the CPU model is supported by the host.
   
   This patch may make QEMU abort on existing configurations where enforce
   previously didn't prevent it from starting. But that's exactly the
   point of this patch: if a flag was not supported by the host and QEMU
   wasn't aborting, it was a bug in the enforce code.
   
   Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   ---
   Cc: Gleb Natapov g...@redhat.com
   Cc: Marcelo Tosatti mtosa...@redhat.com
   Cc: kvm@vger.kernel.org
   Cc: libvir-l...@redhat.com
   Cc: Jiri Denemark jdene...@redhat.com
   
   CCing libvirt people, as this is directly related to the planned usage
   of the enforce flag by libvirt.
   
   The libvirt team probably has a problem on their hands: libvirt should
   use enforce to make sure all requested flags are making their way into
   the guest (so the resulting CPU is always the same, on any host), but
   users may have existing working configurations where a flag is not
   supported by the guest and the user really doesn't care about it. Those
   configurations will necessarily break when libvirt starts using
   enforce.
   
   One example where it may cause trouble for common setups: pc-1.3 wants
   the kvm_pv_eoi flag enabled by default (so enforce will make sure it
   is enabled), but the user may have an existing VM running on a host
   without pv_eoi support. That setup is unsafe today because
   live-migration between different host kernel versions may enable/disable
   pv_eoi silently (that's why we need the enforce flag to be used by
   libvirt), but the user probably would like to be able to live-migrate
   that VM anyway (and have libvirt just do the right thing).
   
   One possible solution for libvirt is to use enforce only on newer
   machine-types, so existing machines with older machine-types will keep
   the unsafe host-dependent-ABI behavior, but at least would keep
   live-migration working in case the user is careful.
   
   I really don't know what the libvirt team prefers, but that's the
   situation today. The longer we take to make enforce as strict as it
   should be, and to make libvirt finally use it, the more users will have VMs with
   migration-unsafe unpredictable guest ABIs.
   
   Changes v2:
- Coding style fix
   ---
target-i386/cpu.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
   
   diff --git a/target-i386/cpu.c b/target-i386/cpu.c
   index 876b0f6..52727ad 100644
   --- a/target-i386/cpu.c
   +++ b/target-i386/cpu.c
   @@ -955,8 +955,9 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
return 0;
}

   -/* best effort attempt to inform user requested cpu flags aren't making
   - * their way to the guest.
   +/* Check if all requested cpu flags are making their way to the guest
   + *
   + * Returns 0 if all flags are supported by the host, non-zero otherwise.
 *
 * This function may be called only if KVM is enabled.
 */
   @@ -973,7 +974,15 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
{&guest_def->ext2_features, &host_def.ext2_features,
ext2_feature_name, 0x80000001, R_EDX},
{&guest_def->ext3_features, &host_def.ext3_features,
   -ext3_feature_name, 0x80000001, R_ECX}
   +ext3_feature_name, 0x80000001, R_ECX},
   +{&guest_def->ext4_features, &host_def.ext4_features,
   +NULL, 0xC0000001, R_EDX},
  Since there is no name array for ext4_features they cannot be added or
  removed on the command line hence no need to check them, no?
 
 In theory, yes. But it won't hurt to check it, and it will be useful to
 unify the list of feature words in a single place, so we can be sure the
 checking/filtering/setting code at
 kvm_check_features_against_host()/kvm_filter_features_for_host()/kvm_cpu_fill_host(),
 will all check/filter/set exactly the same feature words.
 
Maybe add a name array for the leaf? :)

--
Gleb.


Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Eduardo Habkost
On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
 On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
  On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
   On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
This is a cleanup that tries to solve two small issues:

 - We don't need a separate kvm_pv_eoi_features variable just to keep a
   constant calculated at compile-time, and this style would require
   adding a separate variable (that's declared twice because of the
   CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
   by machine-type compat code.
 - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
   even when KVM is disabled at runtime. This small inconsistency in
   the cpuid_kvm_features field isn't a problem today because
   cpuid_kvm_features is ignored by the TCG code, but it may cause
   unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
understand.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Coding style fix
---
  target-i386/cpu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 82685dc..e6435da 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
-static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
 #else
 static uint32_t kvm_default_features = 0;
-static const uint32_t kvm_pv_eoi_features = 0;
 #endif
 
 void enable_kvm_pv_eoi(void)
 {
-kvm_default_features |= kvm_pv_eoi_features;
+#ifdef CONFIG_KVM
   You do not need ifdef here.
  
  We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
  set.
  
  I could also write it as:
  
  if (kvm_enabled()) {
  #ifdef CONFIG_KVM
  kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
  #endif
  }
  
  But I find it less readable.
  
  
 Why not define KVM_FEATURE_PV_EOI unconditionally?

It comes from the KVM kernel headers, which are included only if
CONFIG_KVM is set, and probably won't even compile in non-Linux systems.

I have a feeling of déjà vu. I believe we had this exact problem before,
maybe about some other #defines that come from the Linux KVM headers and
won't be available in non-Linux systems.

 
   
+if (kvm_enabled()) {
+kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
+}
+#endif
 }
 
 void host_cpuid(uint32_t function, uint32_t count,
-- 
1.7.11.7
   
   --
 Gleb.
  
  -- 
  Eduardo
 
 --
   Gleb.

-- 
Eduardo


Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
 On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
  On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
   On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
 This is a cleanup that tries to solve two small issues:
 
  - We don't need a separate kvm_pv_eoi_features variable just to keep a
 constant calculated at compile-time, and this style would require
 adding a separate variable (that's declared twice because of the
 CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
 by machine-type compat code.
  - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
 even when KVM is disabled at runtime. This small inconsistency in
the cpuid_kvm_features field isn't a problem today because
cpuid_kvm_features is ignored by the TCG code, but it may cause
unexpected problems later when refactoring the CPUID handling code.
 
 This patch eliminates the kvm_pv_eoi_features variable and simply uses
 CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
 function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
 this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
 understand.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
 Cc: kvm@vger.kernel.org
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Marcelo Tosatti mtosa...@redhat.com
 
 Changes v2:
  - Coding style fix
 ---
   target-i386/cpu.c | 8 +++++---
  1 file changed, 5 insertions(+), 3 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 82685dc..e6435da 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
  (1 << KVM_FEATURE_ASYNC_PF) |
  (1 << KVM_FEATURE_STEAL_TIME) |
  (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
  #else
  static uint32_t kvm_default_features = 0;
 -static const uint32_t kvm_pv_eoi_features = 0;
  #endif
  
  void enable_kvm_pv_eoi(void)
  {
 -kvm_default_features |= kvm_pv_eoi_features;
 +#ifdef CONFIG_KVM
You do not need ifdef here.
   
   We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
   set.
   
   I could also write it as:
   
   if (kvm_enabled()) {
   #ifdef CONFIG_KVM
   kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
   #endif
   }
   
   But I find it less readable.
   
   
  Why not define KVM_FEATURE_PV_EOI unconditionally?
 
 It comes from the KVM kernel headers, which are included only if
 CONFIG_KVM is set, and probably won't even compile in non-Linux systems.
 
 I have a feeling of déjà vu. I believe we had this exact problem before,
 maybe about some other #defines that come from the Linux KVM headers and
 won't be available in non-Linux systems.

It is better to hide all KVM-related differences somewhere in the
headers where no one sees them instead of sprinkling them all over the
code. We can put those defines in include/sysemu/kvm.h in !CONFIG_KVM
part. Or have one ifdef CONFIG_KVM at the beginning of the file and
define enable_kvm_pv_eoi() there and provide empty stub otherwise.
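Gleb's first option, keeping fallback definitions in the non-KVM branch of a header so that generic code needs no #ifdef, might look like the sketch below. It assumes CONFIG_KVM is defined only when the Linux KVM headers are usable; kvm_allowed is a placeholder name, and the fallback value 6 for KVM_FEATURE_PV_EOI is assumed to mirror the Linux header. Built without CONFIG_KVM, kvm_enabled() is the constant 0, so the body of enable_kvm_pv_eoi() is dead code and kvm_default_features stays 0.

```c
#include <assert.h>
#include <stdint.h>

/* --- Part that would live in a sysemu/kvm.h-style header (sketch) --- */
#ifdef CONFIG_KVM
#include <linux/kvm_para.h>  /* provides KVM_FEATURE_PV_EOI on x86 Linux */
extern int kvm_allowed;      /* hypothetical runtime switch */
#define kvm_enabled() (kvm_allowed)
#else
/* Non-KVM builds: kvm_enabled() is a compile-time 0, so this value is
 * never used at run time; it is assumed to mirror the Linux header. */
#define KVM_FEATURE_PV_EOI 6
#define kvm_enabled() (0)
#endif

static_assert(KVM_FEATURE_PV_EOI < 32, "feature bit must fit in a 32-bit word");

/* --- Generic code: no #ifdef CONFIG_KVM needed any more --- */
static uint32_t kvm_default_features;

void enable_kvm_pv_eoi(void)
{
    if (kvm_enabled()) {
        kvm_default_features |= (UINT32_C(1) << KVM_FEATURE_PV_EOI);
    }
}
```

The cost is that any duplicated KVM_FEATURE_* value has to be kept in sync with the kernel header by hand.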

--
Gleb.


Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words

2013-01-07 Thread Eduardo Habkost
On Mon, Jan 07, 2013 at 02:06:38PM +0200, Gleb Natapov wrote:
 On Mon, Jan 07, 2013 at 10:06:21AM -0200, Eduardo Habkost wrote:
  On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
   On Fri, Jan 04, 2013 at 08:01:12PM -0200, Eduardo Habkost wrote:
This adds the following feature words to the list of flags to be checked
by kvm_check_features_against_host():

 - cpuid_7_0_ebx_features
 - ext4_features
 - kvm_features
 - svm_features

This will ensure the enforce flag works as it should: it won't allow
QEMU to be started unless every flag that was requested by the user or
defined in the CPU model is supported by the host.

This patch may make QEMU abort on existing configurations where enforce
previously didn't prevent it from starting. But that's exactly the
point of this patch: if a flag was not supported by the host and QEMU
wasn't aborting, it was a bug in the enforce code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: libvir-l...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com

CCing libvirt people, as this is directly related to the planned usage
of the enforce flag by libvirt.

The libvirt team probably has a problem on their hands: libvirt should
use enforce to make sure all requested flags are making their way into
the guest (so the resulting CPU is always the same, on any host), but
users may have existing working configurations where a flag is not
supported by the guest and the user really doesn't care about it. Those
configurations will necessarily break when libvirt starts using
enforce.

One example where it may cause trouble for common setups: pc-1.3 wants
the kvm_pv_eoi flag enabled by default (so enforce will make sure it
is enabled), but the user may have an existing VM running on a host
without pv_eoi support. That setup is unsafe today because
live-migration between different host kernel versions may enable/disable
pv_eoi silently (that's why we need the enforce flag to be used by
libvirt), but the user probably would like to be able to live-migrate
that VM anyway (and have libvirt just do the right thing).

One possible solution for libvirt is to use enforce only on newer
machine-types, so existing machines with older machine-types will keep
the unsafe host-dependent-ABI behavior, but at least would keep
live-migration working in case the user is careful.

I really don't know what the libvirt team prefers, but that's the
situation today. The longer we take to make enforce as strict as it
should be, and to make libvirt finally use it, the more users will have VMs with
migration-unsafe unpredictable guest ABIs.

Changes v2:
 - Coding style fix
---
 target-i386/cpu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 876b0f6..52727ad 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -955,8 +955,9 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 return 0;
 }
 
-/* best effort attempt to inform user requested cpu flags aren't making
- * their way to the guest.
+/* Check if all requested cpu flags are making their way to the guest
+ *
+ * Returns 0 if all flags are supported by the host, non-zero otherwise.
  *
  * This function may be called only if KVM is enabled.
  */
@@ -973,7 +974,15 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
 {&guest_def->ext2_features, &host_def.ext2_features,
 ext2_feature_name, 0x80000001, R_EDX},
 {&guest_def->ext3_features, &host_def.ext3_features,
-ext3_feature_name, 0x80000001, R_ECX}
+ext3_feature_name, 0x80000001, R_ECX},
+{&guest_def->ext4_features, &host_def.ext4_features,
+NULL, 0xC0000001, R_EDX},
   Since there is no name array for ext4_features they cannot be added or
   removed on the command line hence no need to check them, no?
  
  In theory, yes. But it won't hurt to check it, and it will be useful to
  unify the list of feature words in a single place, so we can be sure the
  checking/filtering/setting code at
  kvm_check_features_against_host()/kvm_filter_features_for_host()/kvm_cpu_fill_host(),
  will all check/filter/set exactly the same feature words.
  
 Maybe add a name array for the leaf? :)

If anybody finds reliable documentation about the 0xC0000001 CPUID bits,
I would happily do it.  :-)

While we don't have the docs and feature names, I still believe that
having the complete list of feature words in the
kvm_check_features_against_host() code will save us trouble later, 

Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 10:19:15AM -0200, Eduardo Habkost wrote:
 On Mon, Jan 07, 2013 at 02:06:38PM +0200, Gleb Natapov wrote:
  On Mon, Jan 07, 2013 at 10:06:21AM -0200, Eduardo Habkost wrote:
   On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
On Fri, Jan 04, 2013 at 08:01:12PM -0200, Eduardo Habkost wrote:
 This adds the following feature words to the list of flags to be checked
 by kvm_check_features_against_host():
 
  - cpuid_7_0_ebx_features
  - ext4_features
  - kvm_features
  - svm_features
 
 This will ensure the enforce flag works as it should: it won't allow
 QEMU to be started unless every flag that was requested by the user or
 defined in the CPU model is supported by the host.
 
 This patch may make QEMU abort on existing configurations where enforce
 previously didn't prevent it from starting. But that's exactly the
 point of this patch: if a flag was not supported by the host and QEMU
 wasn't aborting, it was a bug in the enforce code.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
 Cc: Gleb Natapov g...@redhat.com
 Cc: Marcelo Tosatti mtosa...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: libvir-l...@redhat.com
 Cc: Jiri Denemark jdene...@redhat.com
 
 CCing libvirt people, as this is directly related to the planned usage
 of the enforce flag by libvirt.
 
 The libvirt team probably has a problem on their hands: libvirt should
 use enforce to make sure all requested flags are making their way into
 the guest (so the resulting CPU is always the same, on any host), but
 users may have existing working configurations where a flag is not
 supported by the guest and the user really doesn't care about it. Those
 configurations will necessarily break when libvirt starts using
 enforce.
 
 One example where it may cause trouble for common setups: pc-1.3 wants
 the kvm_pv_eoi flag enabled by default (so enforce will make sure it
 is enabled), but the user may have an existing VM running on a host
 without pv_eoi support. That setup is unsafe today because
 live-migration between different host kernel versions may enable/disable
 pv_eoi silently (that's why we need the enforce flag to be used by
 libvirt), but the user probably would like to be able to live-migrate
 that VM anyway (and have libvirt just do the right thing).
 
 One possible solution for libvirt is to use enforce only on newer
 machine-types, so existing machines with older machine-types will keep
 the unsafe host-dependent-ABI behavior, but at least would keep
 live-migration working in case the user is careful.
 
 I really don't know what the libvirt team prefers, but that's the
 situation today. The longer we take to make enforce as strict as it
 should be, and to make libvirt finally use it, the more users will have VMs with
 migration-unsafe unpredictable guest ABIs.
 
 Changes v2:
  - Coding style fix
 ---
  target-i386/cpu.c | 15 ++++++++++++---
  1 file changed, 12 insertions(+), 3 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 876b0f6..52727ad 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -955,8 +955,9 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
  return 0;
  }
  
 -/* best effort attempt to inform user requested cpu flags aren't making
 - * their way to the guest.
 +/* Check if all requested cpu flags are making their way to the guest
 + *
 + * Returns 0 if all flags are supported by the host, non-zero otherwise.
   *
   * This function may be called only if KVM is enabled.
   */
 @@ -973,7 +974,15 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
  {&guest_def->ext2_features, &host_def.ext2_features,
  ext2_feature_name, 0x80000001, R_EDX},
  {&guest_def->ext3_features, &host_def.ext3_features,
 -ext3_feature_name, 0x80000001, R_ECX}
 +ext3_feature_name, 0x80000001, R_ECX},
 +{&guest_def->ext4_features, &host_def.ext4_features,
 +NULL, 0xC0000001, R_EDX},
Since there is no name array for ext4_features they cannot be added or
removed on the command line hence no need to check them, no?
   
   In theory, yes. But it won't hurt to check it, and it will be useful to
   unify the list of feature words in a single place, so we can be sure the
   checking/filtering/setting code at
   kvm_check_features_against_host()/kvm_filter_features_for_host()/kvm_cpu_fill_host(),
   will all check/filter/set exactly the same feature words.
   
  Maybe add a name array for the leaf? :)
 
 If anybody finds reliable documentation about the 0xC0000001 CPUID bits,
 I would 

Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Eduardo Habkost
On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote:
 On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
  On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
   On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
 On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
  This is a cleanup that tries to solve two small issues:
  
   - We don't need a separate kvm_pv_eoi_features variable just to keep a
 constant calculated at compile-time, and this style would require
 adding a separate variable (that's declared twice because of the
 CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
 by machine-type compat code.
   - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
 even when KVM is disabled at runtime. This small inconsistency in
 the cpuid_kvm_features field isn't a problem today because
 cpuid_kvm_features is ignored by the TCG code, but it may cause
 unexpected problems later when refactoring the CPUID handling code.
  
  This patch eliminates the kvm_pv_eoi_features variable and simply uses
  CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
  function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
  this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
  understand.
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Cc: kvm@vger.kernel.org
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Gleb Natapov g...@redhat.com
  Cc: Marcelo Tosatti mtosa...@redhat.com
  
  Changes v2:
   - Coding style fix
  ---
   target-i386/cpu.c | 8 +++++---
   1 file changed, 5 insertions(+), 3 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 82685dc..e6435da 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
   (1 << KVM_FEATURE_ASYNC_PF) |
   (1 << KVM_FEATURE_STEAL_TIME) |
   (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
  -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
   #else
   static uint32_t kvm_default_features = 0;
  -static const uint32_t kvm_pv_eoi_features = 0;
   #endif
   
   void enable_kvm_pv_eoi(void)
   {
  -kvm_default_features |= kvm_pv_eoi_features;
  +#ifdef CONFIG_KVM
 You do not need ifdef here.

We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
set.

I could also write it as:

if (kvm_enabled()) {
#ifdef CONFIG_KVM
kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
#endif
}

But I find it less readable.


   Why not define KVM_FEATURE_PV_EOI unconditionally?
  
  It comes from the KVM kernel headers, which are included only if
  CONFIG_KVM is set, and probably won't even compile in non-Linux systems.
  
  I have a feeling of déjà vu. I believe we had this exact problem before,
  maybe about some other #defines that come from the Linux KVM headers and
  won't be available in non-Linux systems.
 
 It is better to hide all KVM-related differences somewhere in the
 headers where no one sees them instead of sprinkling them all over the
 code. We can put those defines in include/sysemu/kvm.h in !CONFIG_KVM
 part. Or have one ifdef CONFIG_KVM at the beginning of the file and
 define enable_kvm_pv_eoi() there and provide empty stub otherwise.

If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef
around the real implementation. I mean, I don't think this:

  #ifdef CONFIG_KVM
  int enable_kvm_pv_eoi() {
[...]
  }
  #endif

is any better than this:

  int enable_kvm_pv_eoi() {
  #ifdef CONFIG_KVM
[...]
  #endif
  }

So this is probably a good reason to duplicate the KVM_FEATURE_*
#defines in the QEMU code, instead?

-- 
Eduardo


Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 10:30:40AM -0200, Eduardo Habkost wrote:
 On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote:
  On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
   On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
 On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
  On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
   This is a cleanup that tries to solve two small issues:
   
    - We don't need a separate kvm_pv_eoi_features variable just to keep a
      constant calculated at compile-time, and this style would require
      adding a separate variable (that's declared twice because of the
      CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
      by machine-type compat code.
    - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
      even when KVM is disabled at runtime. This small inconsistency in
      the cpuid_kvm_features field isn't a problem today because
      cpuid_kvm_features is ignored by the TCG code, but it may cause
      unexpected problems later when refactoring the CPUID handling code.
   
   This patch eliminates the kvm_pv_eoi_features variable and simply uses
   CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
   function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
   this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
   understand.
   
   Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   ---
   Cc: kvm@vger.kernel.org
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Gleb Natapov g...@redhat.com
   Cc: Marcelo Tosatti mtosa...@redhat.com
   
   Changes v2:
- Coding style fix
   ---
target-i386/cpu.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
   
   diff --git a/target-i386/cpu.c b/target-i386/cpu.c
   index 82685dc..e6435da 100644
   --- a/target-i386/cpu.c
   +++ b/target-i386/cpu.c
   @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
(1 << KVM_FEATURE_ASYNC_PF) |
(1 << KVM_FEATURE_STEAL_TIME) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
   -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
#else
static uint32_t kvm_default_features = 0;
   -static const uint32_t kvm_pv_eoi_features = 0;
#endif

void enable_kvm_pv_eoi(void)
{
   -kvm_default_features |= kvm_pv_eoi_features;
   +#ifdef CONFIG_KVM
  You do not need ifdef here.
 
 We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
 set.
 
 I could also write it as:
 
 if (kvm_enabled()) {
 #ifdef CONFIG_KVM
 kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
 #endif
 }
 
 But I find it less readable.
 
 
Why not define KVM_FEATURE_PV_EOI unconditionally?
   
   It comes from the KVM kernel headers, which are included only if
   CONFIG_KVM is set, and probably won't even compile on non-Linux systems.
   
   I have a feeling of déjà vu. I believe we had this exact problem before,
   maybe about some other #defines that come from the Linux KVM headers and
   won't be available on non-Linux systems.
  
  It is better to hide all KVM-related differences somewhere in the
  headers where no one sees them instead of sprinkling them all over the
  code. We can put those defines in include/sysemu/kvm.h in the !CONFIG_KVM
  part. Or have one #ifdef CONFIG_KVM at the beginning of the file, define
  enable_kvm_pv_eoi() there, and provide an empty stub otherwise.
 
 If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef
 around the real implementation. I mean, I don't think this:
 
   #ifdef CONFIG_KVM
   int enable_kvm_pv_eoi() {
 [...]
   }
   #endif
 
You already have #ifdef CONFIG_KVM just above enable_kvm_pv_eoi(). Put
everything KVM related there instead of adding #ifdef CONFIG_KVM all
over the file.

 is any better than this:
 
   int enable_kvm_pv_eoi() {
   #ifdef CONFIG_KVM
 [...]
   #endif
   }
 
 So this is probably a good reason to duplicate the KVM_FEATURE_*
 #defines in the QEMU code, instead?
 
Not even duplicate them; they can be fake, just to keep the compiler happy.

--
Gleb.
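The header-level approach suggested above keeps the !CONFIG_KVM differences in include/sysemu/kvm.h and gives the KVM_FEATURE_* names fake values, so enable_kvm_pv_eoi() needs no #ifdef. It can be sketched roughly as follows. This is a self-contained illustration, not the code that was merged. The header part is inlined, `kvm_allowed` is a hypothetical stand-in for the runtime KVM switch, and CONFIG_KVM is left undefined to model a build without KVM. The value 6 in the KVM branch is assumed to mirror KVM_FEATURE_PV_EOI in the Linux kvm_para.h header.

```c
#include <stdbool.h>
#include <stdint.h>

/* --- Hypothetical excerpt of a sysemu/kvm.h-style header --- */
#ifdef CONFIG_KVM
/* Real builds take KVM_FEATURE_* from the Linux KVM headers; 6 is
 * assumed here to match KVM_FEATURE_PV_EOI in <asm/kvm_para.h>. */
#define KVM_FEATURE_PV_EOI 6
static bool kvm_allowed = true;          /* hypothetical runtime switch */
#define kvm_enabled() (kvm_allowed)
#else
/* No KVM in this build: kvm_enabled() is a compile-time 0, and the
 * feature number is a fake value that only keeps the compiler happy,
 * since code guarded by kvm_enabled() can never run. */
#define kvm_enabled() (0)
#define KVM_FEATURE_PV_EOI 0
#endif

/* --- cpu.c-style code, free of #ifdef CONFIG_KVM --- */
static uint32_t kvm_default_features = 0;

void enable_kvm_pv_eoi(void)
{
    if (kvm_enabled()) {
        kvm_default_features |= (1U << KVM_FEATURE_PV_EOI);
    }
}
```

In a build without CONFIG_KVM, calling enable_kvm_pv_eoi() leaves kvm_default_features at 0. The same function body sets the PV EOI bit when CONFIG_KVM is defined and KVM is enabled at runtime.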


Re: [Qemu-devel] [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 11:39:18AM +0200, Gleb Natapov wrote:
 On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
  Reading the spec, it is clear that most modes normally leave the IRQ
  output line high, and only pulse it low to generate a leading edge.
  Especially the most commonly used mode 2.
  
  The KVM i8254 model does not try to emulate the duration of the pulse at
  all, so just swap the high/low settings to leave it high most of
  the time.
  
  This fix is a prerequisite to improving the i8259 model to handle
  the trailing edge of an interrupt request as indicated in its spec:
  If it gets a trailing edge of an IRQ line before it starts to service
  the interrupt, the request should be canceled.
  
  See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
  or search the net for 23124406.pdf.
  
  Risks:
  
  There is a risk that migrating a running guest between versions
  with and without this patch will lose or gain a single timer
  interrupt during the migration process.  The only case where
 Can you elaborate on how exactly this can happen? I do not see it.
 
  this is likely to be serious is probably losing a single-shot (mode 4)
  interrupt, but if my understanding of how things work is good, then
  that should only be possible if a whole slew of conditions are
  all met:
  
   1. The guest is configured to run in a tickless mode (like
  modern Linux).
   2. The guest is for some reason still using the i8254 rather
  than something more modern like an HPET.  (The combination
  of 1 and 2 should be rare.)
 This is not so rare. For performance reasons it is better not to have
 HPET at all.  In fact, -no-hpet is how I would advise anyone to run qemu.
 
It looks like Linux prefers to use the APIC timer anyway.

   3. The migration is going from a fixed version back to the
  old version.  (Not sure how common this is, but it should
  be rarer than migrating from old to new.)
   4. There are not going to be any timely events/interrupts
  (keyboard, network, process sleeps, etc) that cause the guest
  to reset the PIT mode 4 one-shot counter soon enough.
  
  This combination should be rare enough that more complicated
  solutions are not worth the effort.
  
  Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
  ---
   arch/x86/kvm/i8254.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
  
  diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
  index c1d30b2..cd4ec60 100644
  --- a/arch/x86/kvm/i8254.c
  +++ b/arch/x86/kvm/i8254.c
  @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
  }
  spin_unlock(&ps->inject_lock);
  if (inject) {
  -   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
  +   /* Clear previous interrupt, then create a rising
  +    * edge to request another interrupt, and leave it at
  +    * level=1 until time to inject another one.
  +    */
  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
  +   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
   
  /*
   * Provides NMI watchdog support via Virtual Wire mode.
  -- 
  1.7.10.2.484.gcd07cc5
 
 --
   Gleb.

--
Gleb.
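The reasoning in the patch description can be checked with a toy model. An edge-triggered controller latches a request on the rising edge. Per the spec text quoted above, it should also cancel that request if the trailing edge arrives before service starts. This is a hypothetical sketch, not KVM's i8254 or i8259 code: `struct irq_line`, `set_irq()`, `inject_old()` and `inject_new()` are illustrative names, and servicing by the CPU is not modeled.

```c
/* Toy model of one edge-triggered interrupt-controller input. */
struct irq_line {
    int level;    /* level currently driven by the timer model */
    int pending;  /* request latched but not yet serviced */
};

/* Drive the line to 'level' and update the latched request. */
static void set_irq(struct irq_line *l, int level)
{
    if (!l->level && level) {
        l->pending = 1;   /* rising edge: latch a request */
    } else if (l->level && !level) {
        l->pending = 0;   /* trailing edge before service: cancel it */
    }
    l->level = level;
}

/* Order used before the patch: raise, then immediately lower. */
static void inject_old(struct irq_line *l)
{
    set_irq(l, 1);
    set_irq(l, 0);
}

/* Order used after the patch: lower, then raise and stay high. */
static void inject_new(struct irq_line *l)
{
    set_irq(l, 0);
    set_irq(l, 1);
}
```

With a controller that honours trailing edges, inject_old() leaves no pending request, because the request is cancelled before it can be serviced. inject_new() leaves the line high with the request still pending, which is why the reordering is a prerequisite for that i8259 change.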


RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-07 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-01-07:
 On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Virtual interrupt delivery spares KVM from injecting vAPIC interrupts
 manually; this is fully taken care of by the hardware. This needs
 some special awareness in the existing interrupt injection path:
 
 - For a pending interrupt, instead of direct injection, we may need to
   update architecture-specific indicators before resuming to the guest.
 - A pending interrupt which is masked by the ISR should also be
   considered in the above update action, since hardware will decide
   when to inject it at the right time. The current has_interrupt and
   get_interrupt only return a valid vector from the injection p.o.v.
 
 Signed-off-by: Kevin Tian kevin.t...@intel.com
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/ia64/kvm/lapic.h           |    6 ++
  arch/x86/include/asm/kvm_host.h |    8 ++
  arch/x86/include/asm/vmx.h      |   11 +++
  arch/x86/kvm/irq.c              |   56 +++-
  arch/x86/kvm/lapic.c            |   87 +++---
  arch/x86/kvm/lapic.h            |   29 +-
  arch/x86/kvm/svm.c              |   36
  arch/x86/kvm/vmx.c              |  190 ++-
  arch/x86/kvm/x86.c              |   11 ++-
  include/linux/kvm_host.h        |    2 +
  virt/kvm/ioapic.c               |   41 +
  virt/kvm/ioapic.h               |    1 +
  virt/kvm/irq_comm.c             |   20
  13 files changed, 451 insertions(+), 47 deletions(-)
 diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
 index c5f92a9..cb59eb4 100644
 --- a/arch/ia64/kvm/lapic.h
 +++ b/arch/ia64/kvm/lapic.h
 @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct
 kvm_lapic_irq *irq);
  #define kvm_apic_present(x) (true)
  #define kvm_lapic_enabled(x) (true)
 +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
 +struct kvm_lapic_irq *irq)
 +{
 +/* IA64 has no apicv support, do nothing here */
 +}
 +
  #endif
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c431b33..135603f 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -697,6 +697,13 @@ struct kvm_x86_ops {
  void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
  void (*enable_irq_window)(struct kvm_vcpu *vcpu);
  void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 +int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
 +void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
 +void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
 +void (*update_exitmap_start)(struct kvm_vcpu *vcpu);
 +void (*update_exitmap_end)(struct kvm_vcpu *vcpu);
 +void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
 The number of callbacks to update the exit bitmap is starting to become insane.
As you suggest below, if we use a global lock, then three callbacks are enough.

 +void (*restore_rvi)(struct kvm_vcpu *vcpu);
 rvi? Call it set_svi() and make it do just that - set svi.
Typo.
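For readers unfamiliar with the EOI-exit bitmap that these callbacks maintain: with virtual interrupt delivery, the processor completes most EOIs without a VM exit. VMX provides a 256-bit EOI-exit bitmap, held in four 64-bit VMCS fields, whose set bits mark vectors whose EOI must still cause an EOI-induced exit (EXIT_REASON_EOI_INDUCED in the patch). Level-triggered IOAPIC interrupts are one example. The sketch below shows only the bitmap bookkeeping. `struct eoi_exit_bitmap`, `struct redir_entry` and the helper names are hypothetical, and "exit on EOI for level-triggered vectors" is a simplification of what KVM actually checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one bit per vector 0..255, as four 64-bit words. */
struct eoi_exit_bitmap {
    uint64_t word[4];
};

/* Hypothetical, heavily simplified IOAPIC redirection entry. */
struct redir_entry {
    uint8_t vector;
    bool level_triggered;
};

static void eoi_exitmap_set(struct eoi_exit_bitmap *m, uint8_t vector)
{
    m->word[vector / 64] |= UINT64_C(1) << (vector % 64);
}

static bool eoi_exitmap_test(const struct eoi_exit_bitmap *m, uint8_t vector)
{
    return (m->word[vector / 64] >> (vector % 64)) & 1;
}

/* Recompute the bitmap from scratch: request an EOI exit only for
 * level-triggered vectors, whose EOI the hypervisor must forward to its
 * IOAPIC model. In this simplified model, edge-triggered vectors never
 * cause an EOI exit. */
static void eoi_exitmap_rebuild(struct eoi_exit_bitmap *m,
                                const struct redir_entry *entries, size_t n)
{
    memset(m->word, 0, sizeof(m->word));
    for (size_t i = 0; i < n; i++) {
        if (entries[i].level_triggered) {
            eoi_exitmap_set(m, entries[i].vector);
        }
    }
}
```

Vector 0x31 (49), for instance, lands in bit 49 of word 0, and vector 255 in bit 63 of word 3.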

 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 0664c13..e1baf37 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic)
  return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic);
  }
 +bool kvm_apic_present(struct kvm_vcpu *vcpu)
 +{
 +   return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic);
 +}
 +EXPORT_SYMBOL_GPL(kvm_apic_present);
 +
 Why is this change? Drop it.
I cannot remember why. But it seems this change is needless now.
 
  #define LVT_MASK\
  (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK)
 @@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
  return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
  }
 -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr)
 -{
 -u16 cid;
 -ldr >>= 32 - map->ldr_bits;
 -cid = (ldr >> map->cid_shift) & map->cid_mask;
 -
 -BUG_ON(cid >= ARRAY_SIZE(map->logical_map));
 -
 -return cid;
 -}
 -
 -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr)
 -{
 -ldr >>= (32 - map->ldr_bits);
 -return ldr & map->lid_mask;
 -}
 -
  static void recalculate_apic_map(struct kvm *kvm)
  {
  struct kvm_apic_map *new, *old = NULL;
 @@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
  {
  apic_set_reg(apic, APIC_ID, id << 24);
  recalculate_apic_map(apic->vcpu->kvm);
 +   ioapic_update_eoi_exitmap(apic->vcpu->kvm);
  }
  
  static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
  {
  apic_set_reg(apic, APIC_LDR, id);
  recalculate_apic_map(apic->vcpu->kvm);
 +   ioapic_update_eoi_exitmap(apic->vcpu->kvm);
  }
  
  static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type)
 @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct 

Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set

2013-01-07 Thread Igor Mammedov
On Mon, 7 Jan 2013 10:00:09 -0200
Eduardo Habkost ehabk...@redhat.com wrote:

 On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
  On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
   This will be necessary once kvm_check_features_against_host() starts
   using KVM-specific definitions (so it won't compile anymore if
   CONFIG_KVM is not set).
   
   Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   ---
target-i386/cpu.c | 4 
1 file changed, 4 insertions(+)
   
   diff --git a/target-i386/cpu.c b/target-i386/cpu.c
   index 1c3c7e1..876b0f6 100644
   --- a/target-i386/cpu.c
   +++ b/target-i386/cpu.c
   @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
#endif /* CONFIG_KVM */
}

   +#ifdef CONFIG_KVM
static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
{
int i;
   @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
}
return rv;
}
   +#endif

static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
 const char *name, Error **errp)
   @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
x86_cpu_def->kvm_features &= ~minus_kvm_features;
x86_cpu_def->svm_features &= ~minus_svm_features;
x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
   +#ifdef CONFIG_KVM
if (check_cpuid && kvm_enabled()) {
if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
goto error;
}
   +#endif
  Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
  ifdef here.
 
 I will do. Igor will probably have to change his "target-i386: move
 kvm_check_features_against_host() check to realize time" patch to use
 the same approach, too.


Gleb,

Why use a stub here? As a result we will just be adding more ifdef-s in other
places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and
kvm_check_features_against_host() are bundled together in cpu.c, so we could
instead ifdef the whole block, like here:
http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html

To me the code looks more readable with the ifdef here; if we have a stub, a
reader would have to look at the kvm_check_features_against_host() body to see
if it does anything.

 
 -- 
 Eduardo
 


-- 
Regards,
  Igor


Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set

2013-01-07 Thread Eduardo Habkost
On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote:
 On Mon, 7 Jan 2013 10:00:09 -0200
 Eduardo Habkost ehabk...@redhat.com wrote:
 
  On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
   On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
This will be necessary once kvm_check_features_against_host() starts
using KVM-specific definitions (so it won't compile anymore if
CONFIG_KVM is not set).

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 1c3c7e1..876b0f6 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 #endif /* CONFIG_KVM */
 }
 
+#ifdef CONFIG_KVM
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 {
 int i;
@@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
 }
 return rv;
 }
+#endif
 
 static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
  const char *name, Error **errp)
@@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
 x86_cpu_def->kvm_features &= ~minus_kvm_features;
 x86_cpu_def->svm_features &= ~minus_svm_features;
 x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
+#ifdef CONFIG_KVM
 if (check_cpuid && kvm_enabled()) {
 if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
 goto error;
 }
+#endif
   Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
   ifdef here.
  
  I will do. Igor will probably have to change his "target-i386: move
  kvm_check_features_against_host() check to realize time" patch to use
  the same approach, too.
 
 
 Gleb,
 
 Why use a stub here? As a result we will just be adding more ifdef-s in other
 places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and
 kvm_check_features_against_host() are bundled together in cpu.c, so we could
 instead ifdef the whole block, like here:
 http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html
 
 To me the code looks more readable with the ifdef here; if we have a stub, a
 reader would have to look at the kvm_check_features_against_host() body to see
 if it does anything.

If CONFIG_KVM is not set, kvm_enabled() is always zero and the function
would never be called, so I find the ifdef-less code more readable and
obvious.

What I don't know is if we should do this:

  #ifdef CONFIG_KVM
  
  static int kvm_check_features_against_host(...)
  {
  /* real implementation here */
  }
  
  static int kvm_do_something_else(...)
  {
  /* real implementation here */
  }
  
  /* Other kvm_* functions here */
  
  #else
  
  static int kvm_check_features_against_host(...)
  {
  }
  
  static int kvm_do_something_else(...)
  {
  }
  
  /* Other kvm_* stubs here */
  
  #endif /* CONFIG_KVM */


Or this:

  static int kvm_check_features_against_host(...)
  {
  #ifdef CONFIG_KVM
  /* real implementation here */
  #endif /* CONFIG_KVM */
  }
  
  static int kvm_do_something_else(...)
  {
  #ifdef CONFIG_KVM
  /* real implementation here */
  #endif /* CONFIG_KVM */
  }


I believe the latter is better, but based on Gleb's comments about
enable_kvm_pv_eoi(), he seems to prefer the former.

-- 
Eduardo


Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote:
 On Mon, 7 Jan 2013 10:00:09 -0200
 Eduardo Habkost ehabk...@redhat.com wrote:
 
  On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
   On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
This will be necessary once kvm_check_features_against_host() starts
using KVM-specific definitions (so it won't compile anymore if
CONFIG_KVM is not set).

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 1c3c7e1..876b0f6 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 #endif /* CONFIG_KVM */
 }
 
+#ifdef CONFIG_KVM
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 {
 int i;
@@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
 }
 return rv;
 }
+#endif
 
 static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
  const char *name, Error **errp)
@@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
 x86_cpu_def->kvm_features &= ~minus_kvm_features;
 x86_cpu_def->svm_features &= ~minus_svm_features;
 x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
+#ifdef CONFIG_KVM
 if (check_cpuid && kvm_enabled()) {
 if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
 goto error;
 }
+#endif
   Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
   ifdef here.
  
  I will do. Igor will probably have to change his "target-i386: move
  kvm_check_features_against_host() check to realize time" patch to use
  the same approach, too.
 
 
 Gleb,
 
 Why use a stub here? As a result we will just be adding more ifdef-s in other
 places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and
Why will we be adding more ifdef-s in other places?

 kvm_check_features_against_host() are bundled together in cpu.c, so we could
 instead ifdef the whole block, like here:
 http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html
 
That's fine, but you can avoid things like:

 if (kvm_enabled() && name && strcmp(name, "host") == 0) {
+#ifdef CONFIG_KVM
 kvm_cpu_fill_host(x86_cpu_def);
+#endif

in your patch by providing stub for kvm_cpu_fill_host() for !CONFIG_KVM
case. This is common practice really. Avoid ifdefs in the code.

 To me the code looks more readable with the ifdef here; if we have a stub, a
 reader would have to look at the kvm_check_features_against_host() body to see
 if it does anything.
 
If the reader cares about KVM, he has to look anyway. If he does not, there is
the friendly kvm_enabled() (which is a stub in the !CONFIG_KVM case, BTW) to
tell him that he does not care. No need for an additional ifdef there.

--
Gleb.
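The stub approach described above can be sketched as follows. A single #ifdef CONFIG_KVM block holds the real helper, an empty stub replaces it otherwise, and the caller is guarded only by kvm_enabled(). This is a reduced, hypothetical sketch. `x86_def_t` keeps a single field, the host capability mask and `kvm_allowed` are invented for illustration, and `check_cpu_def()` is a simplified stand-in for the check in cpu_x86_parse_featurestr(). CONFIG_KVM is left undefined, so the stub branch is the one compiled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU model definition; the real
 * x86_def_t in target-i386/cpu.c has many more fields. */
typedef struct {
    uint32_t kvm_features;
} x86_def_t;

#ifdef CONFIG_KVM
static bool kvm_allowed = true;          /* hypothetical runtime switch */
#define kvm_enabled() (kvm_allowed)

/* Placeholder for the real check: returns nonzero if the guest asks for
 * KVM features the host does not offer. */
static int kvm_check_features_against_host(x86_def_t *guest_def)
{
    uint32_t host_kvm_features = 0x7f;   /* hypothetical host mask */
    return (guest_def->kvm_features & ~host_kvm_features) != 0;
}
#else
#define kvm_enabled() (0)

/* Stub for builds without KVM. The caller below never reaches it,
 * because kvm_enabled() is a constant 0 here. */
static int kvm_check_features_against_host(x86_def_t *guest_def)
{
    (void)guest_def;
    return 0;
}
#endif /* CONFIG_KVM */

/* Caller with no #ifdef: kvm_enabled() alone tells the reader that the
 * check only matters when KVM is in use. Returns -1 on an enforced
 * mismatch, 0 otherwise. */
static int check_cpu_def(x86_def_t *def, bool check_cpuid, bool enforce_cpuid)
{
    if (check_cpuid && kvm_enabled()) {
        if (kvm_check_features_against_host(def) && enforce_cpuid) {
            return -1;
        }
    }
    return 0;
}
```

Compiling the same file with -DCONFIG_KVM selects the real helper instead. The caller is identical in both configurations, which is the point of keeping #ifdefs out of the calling code.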


Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-07 Thread Marcelo Tosatti
On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote:
 From: Yang Zhang yang.z.zh...@intel.com
 
 Virtual interrupt delivery spares KVM from injecting vAPIC interrupts
 manually; this is fully taken care of by the hardware. This needs
 some special awareness in the existing interrupt injection path:
 
 - For a pending interrupt, instead of direct injection, we may need to
   update architecture-specific indicators before resuming to the guest.
 
 - A pending interrupt which is masked by the ISR should also be
   considered in the above update action, since hardware will decide
   when to inject it at the right time. The current has_interrupt and
   get_interrupt only return a valid vector from the injection p.o.v.
 
 Signed-off-by: Kevin Tian kevin.t...@intel.com
 Signed-off-by: Yang Zhang yang.z.zh...@intel.com
 ---
  arch/ia64/kvm/lapic.h           |    6 ++
  arch/x86/include/asm/kvm_host.h |    8 ++
  arch/x86/include/asm/vmx.h      |   11 +++
  arch/x86/kvm/irq.c              |   56 +++-
  arch/x86/kvm/lapic.c            |   87 +++---
  arch/x86/kvm/lapic.h            |   29 +-
  arch/x86/kvm/svm.c              |   36
  arch/x86/kvm/vmx.c              |  190 ++-
  arch/x86/kvm/x86.c              |   11 ++-
  include/linux/kvm_host.h        |    2 +
  virt/kvm/ioapic.c               |   41 +
  virt/kvm/ioapic.h               |    1 +
  virt/kvm/irq_comm.c             |   20
  13 files changed, 451 insertions(+), 47 deletions(-)
 
 diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
 index c5f92a9..cb59eb4 100644
 --- a/arch/ia64/kvm/lapic.h
 +++ b/arch/ia64/kvm/lapic.h
 @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct 
 kvm_lapic_irq *irq);
  #define kvm_apic_present(x) (true)
  #define kvm_lapic_enabled(x) (true)
  
 +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
 + struct kvm_lapic_irq *irq)
 +{
 + /* IA64 has no apicv support, do nothing here */
 +}
 +
  #endif
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c431b33..135603f 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -697,6 +697,13 @@ struct kvm_x86_ops {
   void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
   void (*enable_irq_window)(struct kvm_vcpu *vcpu);
   void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
 + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
 + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
 + void (*update_exitmap_start)(struct kvm_vcpu *vcpu);
 + void (*update_exitmap_end)(struct kvm_vcpu *vcpu);
 + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
 + void (*restore_rvi)(struct kvm_vcpu *vcpu);
   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
   int (*get_tdp_level)(void);
   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
  int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
  int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
 index 44c3f7e..d1ab331 100644
 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -62,6 +62,7 @@
  #define EXIT_REASON_MCE_DURING_VMENTRY  41
  #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
  #define EXIT_REASON_APIC_ACCESS 44
 +#define EXIT_REASON_EOI_INDUCED 45
  #define EXIT_REASON_EPT_VIOLATION   48
  #define EXIT_REASON_EPT_MISCONFIG   49
  #define EXIT_REASON_WBINVD  54
 @@ -143,6 +144,7 @@
  #define SECONDARY_EXEC_WBINVD_EXITING0x0040
  #define SECONDARY_EXEC_UNRESTRICTED_GUEST0x0080
  #define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200
  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING0x0400
  #define SECONDARY_EXEC_ENABLE_INVPCID0x1000
  
 @@ -180,6 +182,7 @@ enum vmcs_field {
   GUEST_GS_SELECTOR   = 0x080a,
   GUEST_LDTR_SELECTOR = 0x080c,
   GUEST_TR_SELECTOR   = 0x080e,
 + GUEST_INTR_STATUS   = 0x0810,
   HOST_ES_SELECTOR= 0x0c00,
   HOST_CS_SELECTOR= 0x0c02,
   HOST_SS_SELECTOR= 0x0c04,
 @@ -207,6 +210,14 @@ enum vmcs_field {
   APIC_ACCESS_ADDR_HIGH   = 0x2015,
   EPT_POINTER = 

Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Eduardo Habkost
On Mon, Jan 07, 2013 at 02:33:25PM +0200, Gleb Natapov wrote:
 On Mon, Jan 07, 2013 at 10:30:40AM -0200, Eduardo Habkost wrote:
  On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote:
   On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
 On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
  On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
   On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
This is a cleanup that tries to solve two small issues:

 - We don't need a separate kvm_pv_eoi_features variable just to keep a
   constant calculated at compile-time, and this style would require
   adding a separate variable (that's declared twice because of the
   CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
   by machine-type compat code.
 - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
   even when KVM is disabled at runtime. This small inconsistency in
   the cpuid_kvm_features field isn't a problem today because
   cpuid_kvm_features is ignored by the TCG code, but it may cause
   unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
understand.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Coding style fix
---
 target-i386/cpu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 82685dc..e6435da 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
-static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
 #else
 static uint32_t kvm_default_features = 0;
-static const uint32_t kvm_pv_eoi_features = 0;
 #endif
 
 void enable_kvm_pv_eoi(void)
 {
-kvm_default_features |= kvm_pv_eoi_features;
+#ifdef CONFIG_KVM
   You do not need ifdef here.
  
  We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
  set.
  
  I could also write it as:
  
  if (kvm_enabled()) {
  #ifdef CONFIG_KVM
  kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
  #endif
  }
  
  But I find it less readable.
  
  
 Why not define KVM_FEATURE_PV_EOI unconditionally?

It comes from the KVM kernel headers, which are included only if
CONFIG_KVM is set, and probably won't even compile on non-Linux systems.

I have a feeling of déjà vu. I believe we had this exact problem before,
maybe about some other #defines that come from the Linux KVM headers and
won't be available on non-Linux systems.
   
   It is better to hide all KVM-related differences somewhere in the
   headers where no one sees them instead of sprinkling them all over the
   code. We can put those defines in include/sysemu/kvm.h in the !CONFIG_KVM
   part. Or have one #ifdef CONFIG_KVM at the beginning of the file, define
   enable_kvm_pv_eoi() there, and provide an empty stub otherwise.
  
  If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef
  around the real implementation. I mean, I don't think this:
  
#ifdef CONFIG_KVM
int enable_kvm_pv_eoi() {
  [...]
}
#endif
  
 You already have #ifdef CONFIG_KVM just above enable_kvm_pv_eoi(). Put
 everything KVM related there instead of adding #ifdef CONFIG_KVM all
 over the file.

But it also creates the need to write a separate stub function somewhere
else, while we could have a ready-to-use stub function automatically by
simply #ifdefing the whole function body. But anyway: this won't matter
if we choose the duplicate/fake #defines approach mentioned below.


 
  is any better than this:
  
int enable_kvm_pv_eoi() {
#ifdef CONFIG_KVM
  [...]
#endif
}
  
  So this is probably a good reason to duplicate the 

Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set

2013-01-07 Thread Igor Mammedov
On Mon, 7 Jan 2013 15:30:26 +0200
Gleb Natapov g...@redhat.com wrote:

 On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote:
  On Mon, 7 Jan 2013 10:00:09 -0200
  Eduardo Habkost ehabk...@redhat.com wrote:
  
   On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
 This will be necessary once kvm_check_features_against_host() starts
 using KVM-specific definitions (so it won't compile anymore if
 CONFIG_KVM is not set).
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  target-i386/cpu.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 1c3c7e1..876b0f6 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
  #endif /* CONFIG_KVM */
  }
  
 +#ifdef CONFIG_KVM
  static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
  {
  int i;
 @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
  }
  return rv;
  }
 +#endif
  
  static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
   const char *name, Error **errp)
 @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
  x86_cpu_def->kvm_features &= ~minus_kvm_features;
  x86_cpu_def->svm_features &= ~minus_svm_features;
  x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
 +#ifdef CONFIG_KVM
  if (check_cpuid && kvm_enabled()) {
  if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
  goto error;
  }
 +#endif
Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
ifdef here.
   
   I will. Igor will probably have to change his "target-i386: move
   kvm_check_features_against_host() check to realize time" patch to use
   the same approach, too.
  
  
  Gleb,
  
  Why use a stub here? As a result we will just be adding more ifdef-s in other
  places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and
 Why will we be adding more ifdef-s in other places?
unavailable_host_feature() is being ifdef-ed above

 
  kvm_check_features_against_host() are bundled together in cpu.c so we could
  instead ifdef whole block. Like here:
  http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html
  
 That's fine, but you can avoid things like:
 
  if (kvm_enabled() && name && strcmp(name, "host") == 0) {
 +#ifdef CONFIG_KVM
  kvm_cpu_fill_host(x86_cpu_def);
 +#endif
 
 in your patch by providing a stub for kvm_cpu_fill_host() for the !CONFIG_KVM
 case. This is common practice really. Avoid ifdefs in the code.
This ifdef could be eliminated later, when CPUs are converted into subclasses.
Then we would put the host subclass close to kvm_cpu_fill_host(), inside the same
ifdef. That would leave only the ifdef around kvm_check_features_against_host() in
cpu_x86_parse_featurestr().

 
  For me the code looks more readable with the ifdef here: with a stub, a reader
  would have to look at the kvm_check_features_against_host() body to see if it
  does anything.
  
 If the reader cares about KVM, they have to anyway. If not, there is the
 friendly kvm_enabled() (which is a stub in the !CONFIG_KVM case, BTW) to
 tell them they do not need to care. No need for an additional ifdef there.

Both ways would work, but if stubs are the preferred style, there is no
point arguing.
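For comparison, here is a sketch of the check from cpu_x86_parse_featurestr() written in the stub style, with no #ifdef in the function body. It is modelled loosely on how QEMU's kvm.h defined kvm_enabled() at the time (a constant 0 without CONFIG_KVM); `x86_def_sketch` and `check_host_features()` are hypothetical simplifications, not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for QEMU's x86_def_t. */
typedef struct {
    uint32_t features;
} x86_def_sketch;

#ifdef CONFIG_KVM
extern int kvm_allowed;
#define kvm_enabled() (kvm_allowed)
int kvm_check_features_against_host(x86_def_sketch *guest_def);
#else
/* Without KVM, kvm_enabled() is a constant, so the compiler can drop the
 * branch below; the stub only has to exist to keep the code compiling. */
#define kvm_enabled() (0)
static inline int kvm_check_features_against_host(x86_def_sketch *guest_def)
{
    (void)guest_def;
    return 0;   /* nothing to check against when there is no KVM host */
}
#endif

/* The check itself, with no #ifdef CONFIG_KVM around it. */
static int check_host_features(x86_def_sketch *def, bool check_cpuid,
                               bool enforce_cpuid)
{
    if (check_cpuid && kvm_enabled()) {
        if (kvm_check_features_against_host(def) && enforce_cpuid) {
            return -1;  /* the original code jumps to its error label */
        }
    }
    return 0;
}
```

This is the readability point in dispute: a reader of check_host_features() sees kvm_enabled() and can stop there, but has to open the header to learn that the stub does nothing.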

 
 --
   Gleb.


-- 
Regards,
  Igor


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-07 Thread Paolo Bonzini
On 07/01/2013 01:02, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 On 02/01/2013 06:03, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 The virtqueue_add_buf function has two limitations:

 1) it requires the caller to provide all the buffers in a single call;

 2) it does not support chained scatterlists: the buffers must be
 provided as an array of struct scatterlist;

 Chained scatterlists are a horrible interface, but that doesn't mean we
 shouldn't support them if there's a need.

 I think I once even had a patch which passed two chained sgs, rather
 than a combo sg and two length numbers.  It's very old, but I've pasted
 it below.

 Duplicating the implementation by having another interface is pretty
 nasty; I think I'd prefer the chained scatterlists, if that's optimal
 for you.

 Unfortunately, that cannot work because not all architectures support
 chained scatterlists.
 
 WHAT?  I can't figure out what an arch needs to do to support this?

It needs to use the iterator functions in its DMA driver.

 All archs we care about support them, though, so I think we can ignore
 this issue for now.

Kind of... In principle all QEMU-supported arches can use virtio, and
the speedup can be quite useful.  And there is no Kconfig symbol for SG
chains that I can use to disable virtio-scsi on unsupported arches. :/

Paolo

 (Also, as you mention chained scatterlists are horrible.  They'd happen
 to work for virtio-scsi, but not for virtio-blk where the response
 status is part of the footer, not the header).
 
 We lost that debate 5 years ago, so we hack around it as needed.  We can
 add helpers to append if we need.
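Paolo's point that an architecture "needs to use the iterator functions in its DMA driver" can be illustrated with a simplified userspace model. The `sg_entry_sketch` layout is hypothetical and much simpler than the kernel's struct scatterlist; the iterator only mimics the idea behind the kernel's sg_next(). A driver that walks the entries as one flat array would step onto the chain slot, while one that follows the links sees every data segment.

```c
#include <stddef.h>

/* Hypothetical, simplified scatterlist entry: either a data segment or a
 * chain slot whose only job is to point at the next array of entries. */
struct sg_entry_sketch {
    size_t length;                  /* bytes in this data segment */
    struct sg_entry_sketch *chain;  /* non-NULL: this slot links onward */
    int is_last;                    /* marks the final data segment */
};

/* Step to the next data segment, following chain links rather than
 * assuming all entries sit in one flat array. */
static struct sg_entry_sketch *sg_next_sketch(struct sg_entry_sketch *sg)
{
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->chain)
        sg = sg->chain;
    return sg;
}

/* Total payload length, as a DMA driver would compute it by iterating. */
static size_t sg_total_length(struct sg_entry_sketch *sg)
{
    size_t total = 0;

    for (; sg; sg = sg_next_sketch(sg))
        total += sg->length;
    return total;
}
```

With a two-segment array whose third slot chains to a one-segment array, the iterator visits all three data segments and never counts the chain slot itself.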



[PATCH 2/2] KVM: s390: Gracefully handle busy conditions on ccw_device_start

2013-01-07 Thread Cornelia Huck
From: Christian Borntraeger borntrae...@de.ibm.com

In rare cases a virtio command might try to issue a ccw before a former
ccw was answered with a tsch. This will cause CC=2 (busy). Let's just
retry in that case.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/s390/kvm/virtio_ccw.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 70419a7..2edd94a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -132,11 +132,14 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
unsigned long flags;
int flag = intparm & VIRTIO_CCW_INTPARM_MASK;
 
-   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
-   ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
-   if (!ret)
-   vcdev->curr_io |= flag;
-   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
+   do {
+   spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
+   ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
+   if (!ret)
+   vcdev->curr_io |= flag;
+   spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
+   cpu_relax();
+   } while (ret == -EBUSY);
wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0);
return ret ? ret : vcdev->err;
 }
-- 
1.7.12.4
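The retry added by this patch can be modelled in plain C. In the sketch below `fake_start_io()` is a hypothetical stand-in for ccw_device_start() that reports -EBUSY a configurable number of times; the spinlock and cpu_relax() calls are reduced to comments. Like the patch, the loop has no upper bound and relies on the busy condition clearing once the earlier request has been answered.

```c
#include <errno.h>

/* Hypothetical subchannel state used only by this sketch. */
struct fake_subchannel {
    int busy_left;   /* how many more start attempts will report busy */
    int started;     /* number of successful starts */
};

/* Hypothetical stand-in for ccw_device_start(): returns -EBUSY (CC=2)
 * while a previous channel program has not yet been answered. */
static int fake_start_io(struct fake_subchannel *sch)
{
    if (sch->busy_left > 0) {
        sch->busy_left--;
        return -EBUSY;
    }
    sch->started++;
    return 0;
}

/* Retry loop shaped like the patched ccw_io_helper(): attempt the start
 * and go round again only when the result is -EBUSY. */
static int start_io_retry(struct fake_subchannel *sch, int *attempts)
{
    int ret;

    *attempts = 0;
    do {
        /* spin_lock_irqsave(get_ccwdev_lock(...), flags); */
        ret = fake_start_io(sch);
        /* spin_unlock_irqrestore(...); cpu_relax(); */
        (*attempts)++;
    } while (ret == -EBUSY);
    return ret;
}
```

Any error other than -EBUSY (for example -ENODEV) leaves the loop immediately and is returned to the caller unchanged.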



[PATCH 1/2] KVM: s390: Dynamic allocation of virtio-ccw I/O data.

2013-01-07 Thread Cornelia Huck
Dynamically allocate any data structures like ccw used when
doing channel I/O. Otherwise, we'd need to add extra serialization
for the different callbacks using the same data structures.

Reported-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/s390/kvm/virtio_ccw.c | 280 ++
 1 file changed, 174 insertions(+), 106 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 1a5aff3..70419a7 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -46,11 +46,9 @@ struct vq_config_block {
 
 struct virtio_ccw_device {
struct virtio_device vdev;
-   __u8 status;
+   __u8 *status;
__u8 config[VIRTIO_CCW_CONFIG_SIZE];
struct ccw_device *cdev;
-   struct ccw1 *ccw;
-   __u32 area;
__u32 curr_io;
int err;
wait_queue_head_t wait_q;
@@ -127,14 +125,15 @@ static int doing_io(struct virtio_ccw_device *vcdev, __u32 flag)
return ret;
 }
 
-static int ccw_io_helper(struct virtio_ccw_device *vcdev, __u32 intparm)
+static int ccw_io_helper(struct virtio_ccw_device *vcdev,
+struct ccw1 *ccw, __u32 intparm)
 {
int ret;
unsigned long flags;
int flag = intparm & VIRTIO_CCW_INTPARM_MASK;
 
spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
-   ret = ccw_device_start(vcdev->cdev, vcdev->ccw, intparm, 0, 0);
+   ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
if (!ret)
vcdev->curr_io |= flag;
spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
@@ -167,18 +166,19 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq)
do_kvm_notify(schid, virtqueue_get_queue_index(vq));
 }
 
-static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index)
+static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev,
+  struct ccw1 *ccw, int index)
 {
vcdev->config_block->index = index;
-   vcdev->ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
-   vcdev->ccw->flags = 0;
-   vcdev->ccw->count = sizeof(struct vq_config_block);
-   vcdev->ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
-   ccw_io_helper(vcdev, VIRTIO_CCW_DOING_READ_VQ_CONF);
+   ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
+   ccw->flags = 0;
+   ccw->count = sizeof(struct vq_config_block);
+   ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
+   ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_READ_VQ_CONF);
return vcdev->config_block->num;
 }
 
-static void virtio_ccw_del_vq(struct virtqueue *vq)
+static void virtio_ccw_del_vq(struct virtqueue *vq, struct ccw1 *ccw)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vq->vdev);
struct virtio_ccw_vq_info *info = vq->priv;
@@ -197,11 +197,12 @@ static void virtio_ccw_del_vq(struct virtqueue *vq)
info->info_block->align = 0;
info->info_block->index = index;
info->info_block->num = 0;
-   vcdev->ccw->cmd_code = CCW_CMD_SET_VQ;
-   vcdev->ccw->flags = 0;
-   vcdev->ccw->count = sizeof(*info->info_block);
-   vcdev->ccw->cda = (__u32)(unsigned long)(info->info_block);
-   ret = ccw_io_helper(vcdev, VIRTIO_CCW_DOING_SET_VQ | index);
+   ccw->cmd_code = CCW_CMD_SET_VQ;
+   ccw->flags = 0;
+   ccw->count = sizeof(*info->info_block);
+   ccw->cda = (__u32)(unsigned long)(info->info_block);
+   ret = ccw_io_helper(vcdev, ccw,
+   VIRTIO_CCW_DOING_SET_VQ | index);
/*
 * -ENODEV isn't considered an error: The device is gone anyway.
 * This may happen on device detach.
@@ -220,14 +221,23 @@ static void virtio_ccw_del_vq(struct virtqueue *vq)
 static void virtio_ccw_del_vqs(struct virtio_device *vdev)
 {
struct virtqueue *vq, *n;
+   struct ccw1 *ccw;
+
+   ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
+   if (!ccw)
+   return;
+
 
list_for_each_entry_safe(vq, n, &vdev->vqs, list)
-   virtio_ccw_del_vq(vq);
+   virtio_ccw_del_vq(vq, ccw);
+
+   kfree(ccw);
 }
 
 static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev,
 int i, vq_callback_t *callback,
-const char *name)
+const char *name,
+struct ccw1 *ccw)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
int err;
@@ -250,7 +260,7 @@ static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev,
err = -ENOMEM;
goto out_err;
}
-   info->num = virtio_ccw_read_vq_conf(vcdev, i);
+   info->num = virtio_ccw_read_vq_conf(vcdev, ccw, i);
size = PAGE_ALIGN(vring_size(info->num, KVM_VIRTIO_CCW_RING_ALIGN));

[PATCH 0/2] KVM: s390: Bugfixes for virtio-ccw.

2013-01-07 Thread Cornelia Huck
Hi,

Christian discovered some problems with regard to serialization
in the virtio-ccw guest driver. Per-device data structures might
contain data obtained by channel programs issued later on, leading
to confusing behaviour. We cannot rely on the common I/O layer
serialization here.

Rather than adding extra serialization, we decided to keep it simple
with per-request allocated data structures and retries on busy.
These patches have been run in our internal testing without problems
for a bit now.
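A minimal sketch of the per-request allocation idea in userspace C, with hypothetical names (`ccw_sketch`, `do_channel_io()`, `issue_command()`): each request builds its own descriptor instead of reusing one shared per device, so concurrent callers cannot overwrite each other's channel program. The actual patch allocates its struct ccw1 with kzalloc(..., GFP_DMA | GFP_KERNEL); calloc() stands in for that here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified channel command word (not the kernel's ccw1). */
struct ccw_sketch {
    uint8_t cmd_code;
    uint8_t flags;
    uint16_t count;
    uint32_t cda;
};

/* Hypothetical I/O routine: it only reads the descriptor it is handed and
 * fails on an empty transfer. */
static int do_channel_io(const struct ccw_sketch *ccw)
{
    return ccw->count > 0 ? 0 : -1;
}

/* Per-request allocation: every call builds a private descriptor, so two
 * callers running at the same time never share (and clobber) one ccw. */
static int issue_command(uint8_t cmd, uint16_t count)
{
    struct ccw_sketch *ccw = calloc(1, sizeof(*ccw));
    int ret;

    if (!ccw)
        return -1;      /* kernel code would report -ENOMEM */
    ccw->cmd_code = cmd;
    ccw->count = count;
    ret = do_channel_io(ccw);
    free(ccw);
    return ret;
}
```

The cost is one allocation per request; the cover letter presents that as the simpler alternative to adding extra serialization.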

Please apply to kvm-next.

Christian Borntraeger (1):
  KVM: s390: Gracefully handle busy conditions on ccw_device_start

Cornelia Huck (1):
  KVM: s390: Dynamic allocation of virtio-ccw I/O data.

 drivers/s390/kvm/virtio_ccw.c | 291 ++
 1 file changed, 181 insertions(+), 110 deletions(-)

-- 
1.7.12.4



Re: [PATCH V3 2/2] vhost: handle polling errors

2013-01-07 Thread Michael S. Tsirkin
On Mon, Jan 07, 2013 at 12:38:17PM +0800, Jason Wang wrote:
 On 01/06/2013 09:22 PM, Michael S. Tsirkin wrote:
  On Sun, Jan 06, 2013 at 03:18:38PM +0800, Jason Wang wrote:
  Polling errors were ignored by vhost/vhost_net; this may lead to a crash when
  trying to remove vhost from the waitqueue after polling has failed. Solve
  this problem by:
 
  - checking the poll->wqh before trying to remove from the waitqueue
  - report an error when poll() returns a POLLERR in vhost_start_poll()
  - report an error when vhost_start_poll() fails in
    vhost_vring_ioctl()/vhost_net_set_backend() which is used to notify the
    failure to userspace.
  - report an error in the data path in vhost_net when polling errors occur.
 
  After those changes, we can safely drop the tx polling state in vhost_net since
  it was replaced by the checking of poll->wqh.
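A simplified userspace model of the two central changes described above: refusing to start polling when the poll mask contains POLLERR, and using a non-NULL wqh as the "are we on a wait queue?" test before removal. The struct and function names are hypothetical, not the vhost API, and the kernel's wait-queue handling is reduced to a pointer assignment.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of struct vhost_poll: wqh records
 * the wait queue we joined, or NULL if polling never started successfully. */
struct vhost_poll_sketch {
    void *wqh;
};

/* Start polling. If the file's poll mask reports POLLERR, return an error
 * and stay off the wait queue instead of silently continuing. */
static int poll_start_sketch(struct vhost_poll_sketch *poll, void *waitqueue,
                             short mask)
{
    if (poll->wqh)
        return 0;               /* already polling: idempotent in this sketch */
    if (mask & POLLERR)
        return -1;              /* kernel code would return a negative errno */
    poll->wqh = waitqueue;
    return 0;
}

/* Stop polling. Only leave the wait queue if we actually joined one; this
 * check is what replaces the separate tx_poll_state bookkeeping. */
static void poll_stop_sketch(struct vhost_poll_sketch *poll)
{
    if (poll->wqh) {
        /* remove_wait_queue(poll->wqh, ...) would happen here */
        poll->wqh = NULL;
    }
}
```

Calling the stop function after a failed start is harmless in this model, which is the crash scenario the patch description targets.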
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
   drivers/vhost/net.c   |   74 
  
   drivers/vhost/vhost.c |   31 +++-
   drivers/vhost/vhost.h |2 +-
   3 files changed, 49 insertions(+), 58 deletions(-)
 
  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index d10ad6f..125c1e5 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -64,20 +64,10 @@ enum {
 VHOST_NET_VQ_MAX = 2,
   };
   
  -enum vhost_net_poll_state {
  -  VHOST_NET_POLL_DISABLED = 0,
  -  VHOST_NET_POLL_STARTED = 1,
  -  VHOST_NET_POLL_STOPPED = 2,
  -};
  -
   struct vhost_net {
 struct vhost_dev dev;
 struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
 struct vhost_poll poll[VHOST_NET_VQ_MAX];
  -  /* Tells us whether we are polling a socket for TX.
  -   * We only do this when socket buffer fills up.
  -   * Protected by tx vq lock. */
  -  enum vhost_net_poll_state tx_poll_state;
 /* Number of TX recently submitted.
  * Protected by tx vq lock. */
 unsigned tx_packets;
  @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
 }
   }
   
  -/* Caller must have TX VQ lock */
  -static void tx_poll_stop(struct vhost_net *net)
  -{
  -  if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
  -  return;
  -  vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
  -  net->tx_poll_state = VHOST_NET_POLL_STOPPED;
  -}
  -
  -/* Caller must have TX VQ lock */
  -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  -{
  -  if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
  -  return;
  -  vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
  -  net->tx_poll_state = VHOST_NET_POLL_STARTED;
  -}
  -
   /* In case of DMA done not in order in lower device driver for some reason.
 * upend_idx is used to track end of used idx, done_idx is used to track head
 * of used idx. Once lower device DMA done contiguously, we will signal KVM
  @@ -227,6 +199,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
   static void handle_tx(struct vhost_net *net)
   {
 struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
  +  struct vhost_poll *poll = net->poll + VHOST_NET_VQ_TX;
 unsigned out, in, s;
 int head;
 struct msghdr msg = {
  @@ -252,7 +225,8 @@ static void handle_tx(struct vhost_net *net)
 wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 if (wmem >= sock->sk->sk_sndbuf) {
 mutex_lock(&vq->mutex);
  -  tx_poll_start(net, sock);
  +  if (vhost_poll_start(poll, sock->file))
  +  vq_err(vq, "Fail to start TX polling\n");
  s/Fail/Failed/
 
  A question though: how can this happen? Could you clarify please?
  Maybe we can find a way to prevent this error?
 
 I think this can happen under two conditions:
 
 1) a buggy userspace disables a queue through TUNSETQUEUE
 2) the net device is gone
 
 For 1, it looks like we can delay the disabling until the refcnt goes to
 zero. 2 may need more changes.

I'd expect keeping a socket reference would prevent both issues.
Doesn't it?

 Not sure it's worth doing this work;
 maybe a warning is enough, just like for other failures.

With other failures, you can normally correct the error and then
kick to have it restart. This is something that would not
work here.

 
 mutex_unlock(&vq->mutex);
 return;
 }
  @@ -261,7 +235,7 @@ static void handle_tx(struct vhost_net *net)
 vhost_disable_notify(&net->dev, vq);
   
 if (wmem < sock->sk->sk_sndbuf / 2)
  -  tx_poll_stop(net);
  +  vhost_poll_stop(poll);
 hdr_size = vq->vhost_hlen;
 zcopy = vq->ubufs;
   
  @@ -283,8 +257,10 @@ static void handle_tx(struct vhost_net *net)
   
 wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
  -  tx_poll_start(net, sock);
  -  set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
  +  if (vhost_poll_start(poll, sock->file))
  + 

Re: [PATCH V3 2/2] vhost: handle polling errors

2013-01-07 Thread Jason Wang
On 01/07/2013 10:55 PM, Michael S. Tsirkin wrote:
 On Mon, Jan 07, 2013 at 12:38:17PM +0800, Jason Wang wrote:
 On 01/06/2013 09:22 PM, Michael S. Tsirkin wrote:
 On Sun, Jan 06, 2013 at 03:18:38PM +0800, Jason Wang wrote:
 Polling errors were ignored by vhost/vhost_net; this may lead to a crash when
 trying to remove vhost from the waitqueue after polling has failed. Solve
 this problem by:

 - checking the poll->wqh before trying to remove from the waitqueue
 - report an error when poll() returns a POLLERR in vhost_start_poll()
 - report an error when vhost_start_poll() fails in
   vhost_vring_ioctl()/vhost_net_set_backend() which is used to notify the
   failure to userspace.
 - report an error in the data path in vhost_net when polling errors occur.

 After those changes, we can safely drop the tx polling state in vhost_net since
 it was replaced by the checking of poll->wqh.

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/vhost/net.c   |   74 
 
  drivers/vhost/vhost.c |   31 +++-
  drivers/vhost/vhost.h |2 +-
  3 files changed, 49 insertions(+), 58 deletions(-)

 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index d10ad6f..125c1e5 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -64,20 +64,10 @@ enum {
VHOST_NET_VQ_MAX = 2,
  };
  
 -enum vhost_net_poll_state {
 -  VHOST_NET_POLL_DISABLED = 0,
 -  VHOST_NET_POLL_STARTED = 1,
 -  VHOST_NET_POLL_STOPPED = 2,
 -};
 -
  struct vhost_net {
struct vhost_dev dev;
struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
struct vhost_poll poll[VHOST_NET_VQ_MAX];
 -  /* Tells us whether we are polling a socket for TX.
 -   * We only do this when socket buffer fills up.
 -   * Protected by tx vq lock. */
 -  enum vhost_net_poll_state tx_poll_state;
/* Number of TX recently submitted.
 * Protected by tx vq lock. */
unsigned tx_packets;
 @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
}
  }
  
 -/* Caller must have TX VQ lock */
 -static void tx_poll_stop(struct vhost_net *net)
 -{
 -  if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
 -  return;
 -  vhost_poll_stop(net->poll + VHOST_NET_VQ_TX);
 -  net->tx_poll_state = VHOST_NET_POLL_STOPPED;
 -}
 -
 -/* Caller must have TX VQ lock */
 -static void tx_poll_start(struct vhost_net *net, struct socket *sock)
 -{
 -  if (unlikely(net->tx_poll_state != VHOST_NET_POLL_STOPPED))
 -  return;
 -  vhost_poll_start(net->poll + VHOST_NET_VQ_TX, sock->file);
 -  net->tx_poll_state = VHOST_NET_POLL_STARTED;
 -}
 -
  /* In case of DMA done not in order in lower device driver for some reason.
   * upend_idx is used to track end of used idx, done_idx is used to track head
   * of used idx. Once lower device DMA done contiguously, we will signal KVM
 @@ -227,6 +199,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
  static void handle_tx(struct vhost_net *net)
  {
struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
 +  struct vhost_poll *poll = net->poll + VHOST_NET_VQ_TX;
unsigned out, in, s;
int head;
struct msghdr msg = {
 @@ -252,7 +225,8 @@ static void handle_tx(struct vhost_net *net)
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= sock->sk->sk_sndbuf) {
mutex_lock(&vq->mutex);
 -  tx_poll_start(net, sock);
 +  if (vhost_poll_start(poll, sock->file))
 +  vq_err(vq, "Fail to start TX polling\n");
 s/Fail/Failed/

 A question though: how can this happen? Could you clarify please?
 Maybe we can find a way to prevent this error?
 I think this can happen under two conditions:

 1) a buggy userspace disables a queue through TUNSETQUEUE
 2) the net device is gone

 For 1, it looks like we can delay the disabling until the refcnt goes to
 zero. 2 may need more changes.
 I'd expect keeping a socket reference would prevent both issues.
 Doesn't it?

I think it doesn't work for 2: the socket doesn't hold a refcnt on the
device, so the device can go away at any time. We could change
this, but this was already the behaviour before multiqueue support.

 Not sure it's worth doing this work;
 maybe a warning is enough, just like for other failures.
 With other failures, you can normally correct the error and then
 kick to have it restart. This is something that would not
 work here.

If userspace is written correctly (e.g. passing an fd with the correct state),
it can also be corrected.

mutex_unlock(&vq->mutex);
return;
}
 @@ -261,7 +235,7 @@ static void handle_tx(struct vhost_net *net)
vhost_disable_notify(&net->dev, vq);
  
if (wmem < sock->sk->sk_sndbuf / 2)
 -  tx_poll_stop(net);
 +  vhost_poll_stop(poll);
hdr_size = vq->vhost_hlen;
zcopy = vq->ubufs;
  
 @@ -283,8 +257,10 @@ static void handle_tx(struct vhost_net *net)
  
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
if (wmem >= 

Re: [RESEND PATCH] pci-assign: Enable MSIX on device to match guest

2013-01-07 Thread Michael S. Tsirkin
On Sun, Jan 06, 2013 at 09:30:31PM -0700, Alex Williamson wrote:
 When a guest enables MSIX on a device we evaluate the MSIX vector
 table, typically find no unmasked vectors and don't switch the device
 to MSIX mode.  This generally works fine and the device will be
 switched once the guest enables and therefore unmasks a vector.
 Unfortunately some drivers enable MSIX, then use interfaces to send
 commands between VF/PF or PF/firmware that act based on the host
 state of the device.  These therefore may break when MSIX is managed
 lazily.  This change re-enables the previous test used to enable MSIX
 (see qemu-kvm a6b402c9), which basically guesses whether a vector
 will be used based on the data field of the vector table.
 
 Cc: qemu-sta...@nongnu.org
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Acked-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Michael has now ack'd this patch as the correct initial first step,
 so I'm resending with that included.  I'm actually not sure what the
 expected upstream path is for this file now that it's part of qemu.
 There's no entry for hw/kvm/* in MAINTAINERS nor anything specifically
 for this file.  Is kvm still upstream for this, through the uq branch
 or is it qemu for anything not specifically part of a kvm interface?
 Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree,
 any path is fine by me.  Thanks,
 
 Alex

I can merge this if there are no other takers.
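To see why the heuristic matters, consider a freshly enabled MSI-X table in which the guest has programmed data values but every vector is still masked. The sketch uses a simplified, host-endian entry layout (address low/high, data, vector control with bit 0 as the per-vector mask bit) and hypothetical helper names; the real pci-assign code compares little-endian fields using cpu_to_le32().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified MSI-X table entry, host-endian for brevity (hypothetical). */
typedef struct {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t ctrl;               /* bit 0: per-vector mask */
} MSIXTableEntrySketch;

/* Old test: skip vectors whose mask bit is set. */
static bool msix_masked(const MSIXTableEntrySketch *e)
{
    return (e->ctrl & 0x1) != 0;
}

/* Restored heuristic: a vector with data == 0 is assumed unused. */
static bool msix_skipped(const MSIXTableEntrySketch *e)
{
    return e->data == 0;
}

/* Count the vectors to allocate, skipping those the predicate rejects. */
static int count_msix_entries(const MSIXTableEntrySketch *table, int n,
                              bool (*skip)(const MSIXTableEntrySketch *))
{
    int i, nr = 0;

    for (i = 0; i < n; i++) {
        if (skip(&table[i]))
            continue;
        nr++;
    }
    return nr;
}
```

With every vector masked, the mask-based count is zero and the physical device would stay out of MSI-X mode, while the data-based count still finds the vectors the guest has programmed.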

  hw/kvm/pci-assign.c |   17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)
 
 diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
 index 8ee9428..896cfe8 100644
 --- a/hw/kvm/pci-assign.c
 +++ b/hw/kvm/pci-assign.c
 @@ -1031,6 +1031,19 @@ static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
  return (entry->ctrl & cpu_to_le32(0x1)) != 0;
  }
  
 +/*
 + * When MSI-X is first enabled the vector table typically has all the
 + * vectors masked, so we can't use that as the obvious test to figure out
 + * how many vectors to initially enable.  Instead we look at the data field
 + * because this is what worked for pci-assign for a long time.  This makes
 + * sure the physical MSI-X state tracks the guest's view, which is important
 + * for some VF/PF and PF/fw communication channels.
 + */
 +static bool assigned_dev_msix_skipped(MSIXTableEntry *entry)
 +{
 +return !entry->data;
 +}
 +
  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
  {
  AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 @@ -1041,7 +1054,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
  
  /* Get the usable entry number for allocating */
  for (i = 0; i < adev->msix_max; i++, entry++) {
 -if (assigned_dev_msix_masked(entry)) {
 +if (assigned_dev_msix_skipped(entry)) {
  continue;
  }
  entries_nr++;
 @@ -1070,7 +1083,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
  for (i = 0; i < adev->msix_max; i++, entry++) {
  adev->msi_virq[i] = -1;
  
 -if (assigned_dev_msix_masked(entry)) {
 +if (assigned_dev_msix_skipped(entry)) {
  continue;
  }
  


Rotate graphical output of vm

2013-01-07 Thread Dennis Böck
Dear KVM-list,
 
I would like to rotate the graphical output of a VM. I tried it with Fedora 17
and openSUSE 12.2 guests, using the vga options -vga -qxl and -vmvga, and ran
xrandr -o left, but I got the error message:
 
X Error of failed request: BadMatch (invalid parameter attributes)
Major opcode of failed request: 129 (RANDR)
Minor opcode of failed request: 2 (RRsetScreenConfig)
Serial number of failed request: 12
Current serial number in output stream: 12
 
My attempt to change xorg.conf in SLES11SP1 also did not change the rotation;
the X server log says: Option Rotate is not used
The only vga option that partly worked is cirrus, but with cirrus the
graphical output is not displayed correctly in my VNC viewer.
I tried it with a SLES11SP2 host (kernel 3.0.13-0.27) and a Fedora 17 host
(kernel 3.3.4).
Any ideas?
 
Best regards and thanks in advance
Dennis


KVM call agenda for 2013-01-08

2013-01-07 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.


FreeBSD-amd64 fails to start with SMP on qemu-kvm

2013-01-07 Thread Artur Samborski

Hello,

When I try to run FreeBSD-amd64 on more than 1 vCPU in qemu-kvm (Fedora
Core 17), e.g. FreeBSD-9.0-RELEASE-amd64 with:


qemu-kvm -m 1024m -cpu host -smp 2 -cdrom /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso


it freezes KVM with:

KVM internal error. Suberror: 1
emulation failure
RAX=80b0d4c0 RBX=0009f000 RCX=c080 RDX=
RSI=d238 RDI= RBP= RSP=
R8 = R9 = R10= R11=
R12= R13= R14= R15=

RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   f300 DPL=3 DS16 [-WA]
CS =0008   00209900 DPL=0 CS64 [--A]
SS =9f00 0009f000  f300 DPL=3 DS16 [-WA]
DS =0018   00c09300 DPL=0 DS   [-WA]
FS =   f300 DPL=3 DS16 [-WA]
GS =   f300 DPL=3 DS16 [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS64-busy
GDT= 0009f080 0020
IDT=  
CR0=8011 CR2= CR3=0009c000 CR4=0030
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=0501
Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 99 20 00 ff ff 00 00


The freeze occurs immediately after these FreeBSD kernel messages:

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012
r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: Intel(R) Xeon(R) CPU   X5570  @ 2.93GHz (2925.91-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5


Features=0xf83fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS
  Features2=0x80982201SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV
  AMD Features=0x28100800SYSCALL,NX,RDTSCP,LM
  AMD Features2=0x1LAHF
real memory  = 1073741824 (1024 MB)
avail memory = 1011343360 (964 MB)
Event timer LAPIC quality 400
ACPI APIC Table: BOCHS  BXPCAPIC

i.e. just before SMP is probed.

This also applies to FreeBSD-7.3-RELEASE-amd64 and FreeBSD-9.1-RC3-amd64 
(other releases not tested).


When qemu-kvm is started without SMP (1 vCPU), the amd64 FreeBSD kernel
boots correctly. I did not notice this SMP problem with the i386
versions (FreeBSD-7.3-RELEASE-i386, FreeBSD-9.0-RELEASE-i386,
FreeBSD-9.1-RC3-i386).


Additional info:

- KVM Host OS:
Fedora Core 17

- CPUs on my KVM host -- Xeons X5570

# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 26
model name  : Intel(R) Xeon(R) CPU   X5570  @ 2.93GHz
stepping: 5
microcode   : 0x11
cpu MHz : 2926.183
cache size  : 8192 KB
physical id : 1
siblings: 8
core id : 0
cpu cores   : 4
apicid  : 16
initial apicid  : 16
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 
ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm 
tpr_shadow vnmi flexpriority ept vpid

bogomips: 5852.36
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

- kernel (from FC17 repo):
3.6.9 (kernel-3.6.9-2.fc17.x86_64)
- qemu version:
qemu-kvm 1.0.1 (qemu-kvm-1.0.1-2.fc17.x86_64)
- neither the -no-kvm-irqchip nor -no-kvm-pit switch helps
- with the -no-kvm switch, FreeBSD boots correctly
- linux guest (x86_64 with SMP) works perfectly ok

I suspect that this bug is related in some way to the hardware. I
tested the same KVM host system (exact clone) with the same guest
(FreeBSD-amd64) on another machine (i3-2120 workstation) and did not
notice similar problems with SMP.


I will be grateful for any hints.

Regards,
Artur Samborski


Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit

2013-01-07 Thread Scott Wood

On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
@@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,

set_guest_esr(vcpu, vcpu->arch.queued_esr);
if (update_dear == true)
set_guest_dear(vcpu, vcpu->arch.queued_dear);
+   if (update_epr == true) {
+   kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
+   /* Indicate that we want to recheck requests */
+   allowed = 2;
+   }


We shouldn't need allowed = 2 anymore.
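For readers unfamiliar with the request mechanism used in this hunk, here is a userspace sketch of the set-bit / test-and-clear pattern behind kvm_make_request() and kvm_check_request(). Request numbers and struct names are hypothetical, and C11 atomics stand in for the kernel's own bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request number; the real KVM_REQ_* values differ. */
enum { REQ_EPR_EXIT_SKETCH = 3 };

struct vcpu_sketch {
    atomic_ulong requests;       /* one bit per pending request */
};

/* Post a request, in the spirit of kvm_make_request(). */
static void make_request(int req, struct vcpu_sketch *vcpu)
{
    atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/* Test and clear, in the spirit of kvm_check_request(): reports each
 * posted request exactly once. */
static bool check_request(int req, struct vcpu_sketch *vcpu)
{
    unsigned long bit = 1UL << req;

    return (atomic_fetch_and(&vcpu->requests, ~bit) & bit) != 0;
}
```

Because the bit stays set until someone checks it, a request posted during delivery is not lost as long as the entry path checks pending requests before resuming the guest.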

-Scott


Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 11:52:21AM -0200, Marcelo Tosatti wrote:
 On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Virtual interrupt delivery frees KVM from injecting vAPIC interrupts
  manually; delivery is fully taken care of by the hardware. This needs
  some special awareness in the existing interrupt injection path:
  
  - for a pending interrupt, instead of direct injection, we may need to
update architecture-specific indicators before resuming the guest.
  
  - A pending interrupt which is masked by ISR should also be
considered in the above update action, since hardware will decide
when to inject it at the right time. The current has_interrupt and
get_interrupt only return a valid vector from the injection p.o.v.
  
  Signed-off-by: Kevin Tian kevin.t...@intel.com
  Signed-off-by: Yang Zhang yang.z.zh...@intel.com
  ---
   arch/ia64/kvm/lapic.h   |6 ++
   arch/x86/include/asm/kvm_host.h |8 ++
   arch/x86/include/asm/vmx.h  |   11 +++
   arch/x86/kvm/irq.c  |   56 +++-
   arch/x86/kvm/lapic.c|   87 +++---
   arch/x86/kvm/lapic.h|   29 +-
   arch/x86/kvm/svm.c  |   36 
   arch/x86/kvm/vmx.c  |  190 ++-
   arch/x86/kvm/x86.c  |   11 ++-
   include/linux/kvm_host.h|2 +
   virt/kvm/ioapic.c   |   41 +
   virt/kvm/ioapic.h   |1 +
   virt/kvm/irq_comm.c |   20 
   13 files changed, 451 insertions(+), 47 deletions(-)
  
  diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
  index c5f92a9..cb59eb4 100644
  --- a/arch/ia64/kvm/lapic.h
  +++ b/arch/ia64/kvm/lapic.h
  @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
   #define kvm_apic_present(x) (true)
   #define kvm_lapic_enabled(x) (true)
   
  +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
  +   struct kvm_lapic_irq *irq)
  +{
  +   /* IA64 has no apicv support, so do nothing here */
  +}
  +
   #endif
  diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
  index c431b33..135603f 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -697,6 +697,13 @@ struct kvm_x86_ops {
  void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
  void (*enable_irq_window)(struct kvm_vcpu *vcpu);
  void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
  +   int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
  +   void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
  +   void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq);
  +   void (*update_exitmap_start)(struct kvm_vcpu *vcpu);
  +   void (*update_exitmap_end)(struct kvm_vcpu *vcpu);
  +   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
  +   void (*restore_rvi)(struct kvm_vcpu *vcpu);
  int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
  int (*get_tdp_level)(void);
  u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
  @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva);
   int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
   void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
   int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
  +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
   int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
   int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
   int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
  diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
  index 44c3f7e..d1ab331 100644
  --- a/arch/x86/include/asm/vmx.h
  +++ b/arch/x86/include/asm/vmx.h
  @@ -62,6 +62,7 @@
   #define EXIT_REASON_MCE_DURING_VMENTRY  41
   #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
   #define EXIT_REASON_APIC_ACCESS 44
  +#define EXIT_REASON_EOI_INDUCED 45
   #define EXIT_REASON_EPT_VIOLATION   48
   #define EXIT_REASON_EPT_MISCONFIG   49
   #define EXIT_REASON_WBINVD  54
  @@ -143,6 +144,7 @@
   #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
   #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
   #define SECONDARY_EXEC_APIC_REGISTER_VIRT   0x0100
  +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x0200
   #define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
   #define SECONDARY_EXEC_ENABLE_INVPCID  0x1000
   
  @@ -180,6 +182,7 @@ enum vmcs_field {
  GUEST_GS_SELECTOR   = 0x080a,
  GUEST_LDTR_SELECTOR = 0x080c,
  GUEST_TR_SELECTOR   = 0x080e,
  +   GUEST_INTR_STATUS   = 0x0810,
  HOST_ES_SELECTOR= 0x0c00,
  HOST_CS_SELECTOR= 0x0c02,
  HOST_SS_SELECTOR= 0x0c04,
  @@ -207,6 +210,14 @@ 
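
The patch description refers to two pieces of per-vCPU state, and the sketch below models only their data layout, not the VMCS accesses themselves.

- **EOI-exit bitmap.** This is 256 bits, one per vector, loaded into the VMCS as four 64-bit fields. It selects the vectors whose EOI must still trap to KVM, for example vectors whose EOI the virtual IOAPIC needs to observe.
- **Guest interrupt status field.** This is the 16-bit GUEST_INTR_STATUS above. Its low byte holds RVI, the requesting virtual interrupt. Its high byte holds SVI, the in-service vector.

All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 256 vectors as four 64-bit words (EOI-exit bitmap 0..3). */
struct eoi_exitmap {
    uint64_t word[4];
};

static void eoi_exitmap_set(struct eoi_exitmap *map, uint8_t vector)
{
    map->word[vector / 64] |= 1ULL << (vector % 64);
}

static bool eoi_exitmap_test(const struct eoi_exitmap *map, uint8_t vector)
{
    return (map->word[vector / 64] >> (vector % 64)) & 1;
}

/* Highest pending vector in a 256-bit IRR, or -1 if none is pending. */
static int highest_pending_vector(const uint64_t irr[4])
{
    for (int v = 255; v >= 0; v--) {
        if ((irr[v / 64] >> (v % 64)) & 1)
            return v;
    }
    return -1;
}

/* Pack RVI (low byte) and SVI (high byte) into the 16-bit
 * guest interrupt status value. */
static uint16_t guest_intr_status(uint8_t rvi, uint8_t svi)
{
    return (uint16_t)(((unsigned)svi << 8) | rvi);
}
```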

Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit

2013-01-07 Thread Alexander Graf

On 07.01.2013, at 18:47, Scott Wood wrote:

 On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
 @@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
  set_guest_esr(vcpu, vcpu->arch.queued_esr);
  if (update_dear == true)
  set_guest_dear(vcpu, vcpu->arch.queued_dear);
 +if (update_epr == true) {
 +kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
 +/* Indicate that we want to recheck requests */
 +allowed = 2;
 +}
 
 We shouldn't need allowed = 2 anymore.

Thanks for noticing :). I already removed this one locally today but wanted to 
wait for further comments on the series before sending out a v3.


Alex



Re: [Qemu-devel] [PATCH qom-cpu 00/11] disable-kvm_mmu + -cpu check/enforce fixes (v2)

2013-01-07 Thread Andreas Färber
On 04.01.2013 23:01, Eduardo Habkost wrote:
 Eduardo Habkost (11):
[...]
   target-i386: kvm: -cpu host: Use GET_SUPPORTED_CPUID for SVM features
   target-i386: kvm: Enable all supported KVM features for -cpu host
   target-i386: check/enforce: Fix CPUID leaf numbers on error messages
   target-i386: check/enforce: Do not ignore hypervisor flag
   target-i386: check/enforce: Check all CPUID.80000001H.EDX bits
   target-i386: check/enforce: Check SVM flag support as well
   target-i386: check/enforce: Eliminate check_feat field
[snip]

Thanks, applied patches 3-9 to qom-cpu queue (fixing some typos in
commit messages):
https://github.com/afaerber/qemu-cpu/commits/qom-cpu

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


[PATCH 3/4] kvm tools: arm: make .dtb dumping a command-line option

2013-01-07 Thread Will Deacon
It can sometimes be useful to dump the .dtb file generated by kvmtool
when debugging a guest. Currently, this requires changing some #defines
and rebuilding the tool, which is fairly clumsy.

This patch adds a new command-line option for ARM, allowing the dtb
to be dumped to a named file at runtime.

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/arm/fdt.c | 18 ++
 tools/kvm/arm/include/kvm/kvm-config-arch.h |  8 
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index c7f4b52..e52c10c 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -13,9 +13,6 @@
 #include <linux/kernel.h>
 #include <linux/sizes.h>
 
-#define DEBUG  0
-#define DEBUG_FDT_DUMP_FILE    "/tmp/kvmtool.dtb"
-
 static char kern_cmdline[COMMAND_LINE_SIZE];
 
 bool kvm__load_firmware(struct kvm *kvm, const char *firmware_filename)
@@ -28,25 +25,21 @@ int kvm__arch_setup_firmware(struct kvm *kvm)
return 0;
 }
 
-#if DEBUG
-static void dump_fdt(void *fdt)
+static void dump_fdt(const char *dtb_file, void *fdt)
 {
int count, fd;
 
-   fd = open(DEBUG_FDT_DUMP_FILE, O_CREAT | O_TRUNC | O_RDWR, 0666);
+   fd = open(dtb_file, O_CREAT | O_TRUNC | O_RDWR, 0666);
if (fd < 0)
-   die("Failed to write dtb to %s", DEBUG_FDT_DUMP_FILE);
+   die("Failed to write dtb to %s", dtb_file);

count = write(fd, fdt, FDT_MAX_SIZE);
if (count < 0)
die_perror("Failed to dump dtb");

-   pr_info("Wrote %d bytes to dtb %s\n", count, DEBUG_FDT_DUMP_FILE);
+   pr_info("Wrote %d bytes to dtb %s\n", count, dtb_file);
close(fd);
 }
-#else
-static void dump_fdt(void *fdt) { }
-#endif
 
 #define DEVICE_NAME_MAX_LEN 32
 static void generate_virtio_mmio_node(void *fdt, struct virtio_mmio *vmmio)
@@ -143,7 +136,8 @@ static int setup_fdt(struct kvm *kvm)
_FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE));
_FDT(fdt_pack(fdt_dest));
 
-   dump_fdt(fdt_dest);
+   if (kvm->cfg.arch.dump_dtb_filename)
+   dump_fdt(kvm->cfg.arch.dump_dtb_filename, fdt_dest);
return 0;
 }
 late_init(setup_fdt);
diff --git a/tools/kvm/arm/include/kvm/kvm-config-arch.h b/tools/kvm/arm/include/kvm/kvm-config-arch.h
index 60f61de..f63f302 100644
--- a/tools/kvm/arm/include/kvm/kvm-config-arch.h
+++ b/tools/kvm/arm/include/kvm/kvm-config-arch.h
@@ -1,7 +1,15 @@
 #ifndef KVM__KVM_CONFIG_ARCH_H
 #define KVM__KVM_CONFIG_ARCH_H
 
+#include "kvm/parse-options.h"
+
 struct kvm_config_arch {
+   const char *dump_dtb_filename;
 };
 
+#define OPT_ARCH_RUN(pfx, cfg) \
+   pfx,\
+   OPT_STRING('\0', "dump-dtb", &(cfg)->dump_dtb_filename, \
+  ".dtb file", "Dump generated .dtb to specified file"),
+
 #endif /* KVM__KVM_CONFIG_ARCH_H */
-- 
1.8.0
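
The new option follows a simple pattern. The parsed option fills in a filename pointer in the per-architecture config, NULL means the option was not given, and the dump only happens when a name is present. Below is a standalone sketch of that pattern using plain POSIX calls. The struct and function names are hypothetical. Errors are returned to the caller rather than passed to kvmtool's die() helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for struct kvm_config_arch. */
struct config_arch_sketch {
    const char *dump_dtb_filename; /* NULL: --dump-dtb not given */
};

/* Write len bytes of blob to path. Returns the number of bytes
 * written, or -1 on error. */
static ssize_t dump_blob(const char *path, const void *blob, size_t len)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0666);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    ssize_t count = write(fd, blob, len);
    if (count < 0)
        perror("write");
    close(fd);
    return count;
}

/* Dump only when a filename was configured; 0 means "nothing to do". */
static ssize_t maybe_dump_dtb(const struct config_arch_sketch *cfg,
                              const void *blob, size_t len)
{
    if (!cfg->dump_dtb_filename)
        return 0;
    return dump_blob(cfg->dump_dtb_filename, blob, len);
}
```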



[PATCH 0/4] ARM updates for kvmtool

2013-01-07 Thread Will Deacon
Hello kvm hackers,

This patch series introduces some updates to the ARM (AArch32) kvm tools
code:

- virtio mmio fixes to deal with guest page sizes != 4k (in
  preparation for AArch64, which I will post separately).
- .dtb dumping via the lkvm command line
- Support for PSCI firmware as a replacement for the spin-table
  based SMP boot code

The last item was implemented after discussion on the linux-arm-kernel
list when adding support for the mach-virt platform. I hope to upstream
the kernel-side part of the implementation for 3.9 and expect the kvm
bits to follow once that has been merged.

All feedback welcome.

Will


Will Deacon (4):
  kvm tools: virtio: remove hardcoded assumptions about guest page size
  kvm tools: pedantry: fix annoying typo
  kvm tools: arm: make .dtb dumping a command-line option
  kvm tools: arm: add support for PSCI firmware in place of spin-tables

 tools/kvm/Makefile |  5 +-
 tools/kvm/arm/aarch32/cortex-a15.c |  8 +--
 tools/kvm/arm/aarch32/include/kvm/kvm-arch.h   |  1 -
 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h   | 12 +
 tools/kvm/arm/aarch32/kvm-cpu.c| 59 ++
 tools/kvm/arm/aarch32/smp-pen.S| 39 --
 tools/kvm/arm/fdt.c| 54 +++-
 tools/kvm/arm/include/arm-common/gic.h |  2 -
 tools/kvm/arm/include/arm-common/kvm-arch.h|  5 --
 .../arm/include/{kvm => arm-common}/kvm-cpu-arch.h |  6 +--
 tools/kvm/arm/include/kvm/kvm-config-arch.h|  8 +++
 tools/kvm/arm/kvm-cpu.c|  4 +-
 tools/kvm/arm/kvm.c|  1 +
 tools/kvm/arm/smp.c| 21 
 tools/kvm/include/kvm/virtio.h | 14 +
 tools/kvm/kvm.c|  2 +-
 tools/kvm/virtio/9p.c  |  7 +--
 tools/kvm/virtio/balloon.c |  7 +--
 tools/kvm/virtio/blk.c |  7 +--
 tools/kvm/virtio/console.c |  7 +--
 tools/kvm/virtio/mmio.c|  8 +--
 tools/kvm/virtio/net.c |  8 +--
 tools/kvm/virtio/pci.c |  4 +-
 tools/kvm/virtio/rng.c |  7 +--
 tools/kvm/virtio/scsi.c|  7 +--
 25 files changed, 114 insertions(+), 189 deletions(-)
 create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h
 delete mode 100644 tools/kvm/arm/aarch32/smp-pen.S
 rename tools/kvm/arm/include/{kvm => arm-common}/kvm-cpu-arch.h (87%)
 delete mode 100644 tools/kvm/arm/smp.c

-- 
1.8.0



[PATCH 2/4] kvm tools: pedantry: fix annoying typo

2013-01-07 Thread Will Deacon
s/extention/extension/

I should get out more...

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 3ea6339..a6b3c23 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -291,7 +291,7 @@ int kvm__init(struct kvm *kvm)
}
 
if (kvm__check_extensions(kvm)) {
-   pr_err("A required KVM extention is not supported by OS");
+   pr_err("A required KVM extension is not supported by OS");
ret = -ENOSYS;
goto err_vm_fd;
}
-- 
1.8.0



[PATCH 1/4] kvm tools: virtio: remove hardcoded assumptions about guest page size

2013-01-07 Thread Will Deacon
virtio-based PCI devices deal only with 4k memory granules, making
direct use of the VIRTIO_PCI_VRING_ALIGN and VIRTIO_PCI_QUEUE_ADDR_SHIFT
constants when initialising the virtqueues for a device.

For MMIO-based devices, the guest page size is arbitrary and may differ
from that of the host (this is the case on AArch64, where both 4k and
64k pages are supported).

This patch fixes the virtio drivers to honour the guest page size passed
when configuring the virtio device and align the virtqueues accordingly.
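
The arithmetic behind the change can be sketched in three steps.

1. The guest reports a page frame number together with its page size.
2. The ring's guest-physical base is pfn * page_size.
3. The legacy split-ring layout places the used ring at the next multiple of the alignment after the descriptor table and available ring. This is the same formula as vring_size() in Linux's virtio_ring.h.

The helper names are hypothetical. One design point is worth noting: the sketch widens to 64 bits before multiplying, because two 32-bit operands would wrap for rings placed above 4 GiB. That is the kind of truncation the removed guest_pfn_to_host() comment warned about.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy split-ring element sizes, in bytes. */
#define VRING_DESC_BYTES      16u /* struct vring_desc */
#define VRING_USED_ELEM_BYTES  8u /* struct vring_used_elem */

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Bytes occupied by a ring of num entries: descriptor table plus
 * available ring (flags, idx, ring[num], used_event), padded to align,
 * then the used ring (flags, idx, ring[num], avail_event). */
static uint64_t vring_bytes(uint32_t num, uint32_t align)
{
    uint64_t desc_and_avail = (uint64_t)VRING_DESC_BYTES * num
                              + sizeof(uint16_t) * (3 + (uint64_t)num);
    uint64_t used = sizeof(uint16_t) * 3
                    + (uint64_t)VRING_USED_ELEM_BYTES * num;

    return align_up(desc_and_avail, align) + used;
}

/* Guest-physical base of the ring, computed in 64 bits so that
 * pfn * page_size cannot wrap for rings above 4 GiB. */
static uint64_t vring_guest_addr(uint32_t pfn, uint32_t page_size)
{
    return (uint64_t)pfn * page_size;
}
```

With 4k alignment, a 256-entry ring needs 10246 bytes. With 64k alignment, the same ring needs 67590 bytes, because the used ring starts on the next 64k boundary.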

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/include/kvm/virtio.h | 14 ++
 tools/kvm/virtio/9p.c  |  7 ---
 tools/kvm/virtio/balloon.c |  7 ---
 tools/kvm/virtio/blk.c |  7 ---
 tools/kvm/virtio/console.c |  7 ---
 tools/kvm/virtio/mmio.c|  8 
 tools/kvm/virtio/net.c |  8 
 tools/kvm/virtio/pci.c |  4 +++-
 tools/kvm/virtio/rng.c |  7 ---
 tools/kvm/virtio/scsi.c|  7 ---
 10 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 5dc2544..924279b 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -43,17 +43,6 @@ static inline bool virt_queue__available(struct virt_queue *vq)
return vq->vring.avail->idx != vq->last_avail_idx;
 }
 
-/*
- * Warning: on 32-bit hosts, shifting pfn left may cause a truncation of pfn values
- * higher than 4GB - thus, pointing to the wrong area in guest virtual memory space
- * and breaking the virt queue which owns this pfn.
- */
-static inline void *guest_pfn_to_host(struct kvm *kvm, u32 pfn)
-{
-   return guest_flat_to_host(kvm, (unsigned long)pfn << VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-}
-
-
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len);
 
 bool virtio_queue__should_signal(struct virt_queue *vq);
@@ -81,7 +70,8 @@ struct virtio_ops {
u8 *(*get_config)(struct kvm *kvm, void *dev);
u32 (*get_host_features)(struct kvm *kvm, void *dev);
void (*set_guest_features)(struct kvm *kvm, void *dev, u32 features);
-   int (*init_vq)(struct kvm *kvm, void *dev, u32 vq, u32 pfn);
+   int (*init_vq)(struct kvm *kvm, void *dev, u32 vq, u32 page_size,
+  u32 align, u32 pfn);
int (*notify_vq)(struct kvm *kvm, void *dev, u32 vq);
int (*get_pfn_vq)(struct kvm *kvm, void *dev, u32 vq);
int (*get_size_vq)(struct kvm *kvm, void *dev, u32 vq);
diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index 4665876..60865dd 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -1254,7 +1254,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
p9dev->features = features;
 }
 
-static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)
+static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
+  u32 pfn)
 {
struct p9_dev *p9dev = dev;
struct p9_dev_job *job;
@@ -1265,10 +1266,10 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)

queue   = &p9dev->vqs[vq];
queue->pfn  = pfn;
-   p   = guest_pfn_to_host(kvm, queue->pfn);
+   p   = guest_flat_to_host(kvm, queue->pfn * page_size);
job = &p9dev->jobs[vq];

-   vring_init(&queue->vring, VIRTQUEUE_NUM, p, VIRTIO_PCI_VRING_ALIGN);
+   vring_init(&queue->vring, VIRTQUEUE_NUM, p, align);
 
*job= (struct p9_dev_job) {
.vq = queue,
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index 9edce87..d1b64fa 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -193,7 +193,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
bdev->features = features;
 }
 
-static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)
+static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
+  u32 pfn)
 {
struct bln_dev *bdev = dev;
struct virt_queue *queue;
@@ -203,10 +204,10 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)

queue   = &bdev->vqs[vq];
queue->pfn  = pfn;
-   p   = guest_pfn_to_host(kvm, queue->pfn);
+   p   = guest_flat_to_host(kvm, queue->pfn * page_size);

thread_pool__init_job(&bdev->jobs[vq], kvm, virtio_bln_do_io, queue);
-   vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN);
+   vring_init(&queue->vring, VIRTIO_BLN_QUEUE_SIZE, p, align);
 
return 0;
 }
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index ec57e96..44ac44b 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -156,7 +156,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
   

[PATCH 4/4] kvm tools: arm: add support for PSCI firmware in place of spin-tables

2013-01-07 Thread Will Deacon
ARM has recently published a document describing a firmware interface
for CPU power management, which can be used for booting secondary cores
on an SMP platform, amongst other things. As part of the mach-virt
upstreaming for the kernel (that is, the virtual platform targeted by
kvmtool), it was suggested that we use this interface instead of the
current spin-table based approach.

This patch implements PSCI support in kvmtool for ARM, removing a fair
amount of code in the process.
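
With PSCI, the boot CPU starts normally. Every secondary vCPU is created powered off and waits for the guest to bring it up through a PSCI CPU_ON call. The device tree advertises this with an enable-method of "psci" in each cpu node when there is more than one CPU.

A minimal sketch of the two decisions the patch encodes: the ARM_VCPU_FEATURE_FLAGS macro and the cortex-a15.c change. The function names are hypothetical. The sketch assumes the power-off feature is bit 0 of features[0], which is how Linux's uapi headers define KVM_ARM_VCPU_POWER_OFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same bit index as KVM_ARM_VCPU_POWER_OFF in Linux's uapi. */
#define VCPU_FEATURE_POWER_OFF 0

/* features[0] for vCPU init: any vCPU other than cpuid 0 starts
 * powered off and waits for PSCI CPU_ON. */
static uint32_t vcpu_features0(unsigned long cpuid)
{
    return (uint32_t)(!!cpuid) << VCPU_FEATURE_POWER_OFF;
}

/* Value of the "enable-method" property for each cpu node, or NULL
 * when the property is omitted (single-CPU guest). */
static const char *cpu_enable_method(int nrcpus)
{
    return nrcpus > 1 ? "psci" : NULL;
}
```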

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/Makefile |  5 +-
 tools/kvm/arm/aarch32/cortex-a15.c |  8 +--
 tools/kvm/arm/aarch32/include/kvm/kvm-arch.h   |  1 -
 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h   | 12 +
 tools/kvm/arm/aarch32/kvm-cpu.c| 59 ++
 tools/kvm/arm/aarch32/smp-pen.S| 39 --
 tools/kvm/arm/fdt.c| 36 +
 tools/kvm/arm/include/arm-common/gic.h |  2 -
 tools/kvm/arm/include/arm-common/kvm-arch.h|  5 --
 .../arm/include/{kvm => arm-common}/kvm-cpu-arch.h |  6 +--
 tools/kvm/arm/kvm-cpu.c|  4 +-
 tools/kvm/arm/kvm.c|  1 +
 tools/kvm/arm/smp.c| 21 
 13 files changed, 62 insertions(+), 137 deletions(-)
 create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h
 delete mode 100644 tools/kvm/arm/aarch32/smp-pen.S
 rename tools/kvm/arm/include/{kvm => arm-common}/kvm-cpu-arch.h (87%)
 delete mode 100644 tools/kvm/arm/smp.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index a83dd10..33aa4d8 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -160,18 +160,15 @@ endif
 
 # ARM
 OBJS_ARM_COMMON:= arm/fdt.o arm/gic.o arm/ioport.o arm/irq.o \
-  arm/kvm.o arm/kvm-cpu.o arm/smp.o
+  arm/kvm.o arm/kvm-cpu.o
 HDRS_ARM_COMMON:= arm/include
 ifeq ($(ARCH), arm)
DEFINES += -DCONFIG_ARM
OBJS+= $(OBJS_ARM_COMMON)
OBJS+= arm/aarch32/cortex-a15.o
OBJS+= arm/aarch32/kvm-cpu.o
-   OBJS+= arm/aarch32/smp-pen.o
ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
ARCH_INCLUDE+= -Iarm/aarch32/include
-   ASFLAGS += -D__ASSEMBLY__
-   ASFLAGS += -I$(ARCH_INCLUDE)
CFLAGS  += -march=armv7-a
CFLAGS  += -I../../scripts/dtc/libfdt
OTHEROBJS   += $(LIBFDT_OBJS)
diff --git a/tools/kvm/arm/aarch32/cortex-a15.c b/tools/kvm/arm/aarch32/cortex-a15.c
index eac0bb9..8031747 100644
--- a/tools/kvm/arm/aarch32/cortex-a15.c
+++ b/tools/kvm/arm/aarch32/cortex-a15.c
@@ -31,12 +31,8 @@ static void generate_cpu_nodes(void *fdt, struct kvm *kvm)
_FDT(fdt_property_string(fdt, "device_type", "cpu"));
_FDT(fdt_property_string(fdt, "compatible", "arm,cortex-a15"));

-   if (kvm->nrcpus > 1) {
-   _FDT(fdt_property_string(fdt, "enable-method",
-        "spin-table"));
-   _FDT(fdt_property_cell(fdt, "cpu-release-addr",
-  kvm->arch.smp_jump_guest_start));
-   }
+   if (kvm->nrcpus > 1)
+   _FDT(fdt_property_string(fdt, "enable-method", "psci"));

_FDT(fdt_property_cell(fdt, "reg", cpu));
_FDT(fdt_end_node(fdt));
diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
index f236895..ca79b24 100644
--- a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
+++ b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
@@ -15,7 +15,6 @@
 
 #define ARM_KERN_OFFSET0x8000
 
-#define ARM_SMP_PEN_SIZE   PAGE_SIZE
 #define ARM_VIRTIO_MMIO_SIZE   (ARM_GIC_DIST_BASE - ARM_LOMAP_MMIO_AREA)
 #define ARM_PCI_MMIO_SIZE  (ARM_LOMAP_MEMORY_AREA - ARM_LOMAP_AXI_AREA)
 
diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h
new file mode 100644
index 0000000..b9fda07
--- /dev/null
+++ b/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h
@@ -0,0 +1,12 @@
+#ifndef KVM__KVM_CPU_ARCH_H
+#define KVM__KVM_CPU_ARCH_H
+
+#include "kvm/kvm.h"
+
+#include "arm-common/kvm-cpu-arch.h"
+
+#define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid) {   \
+   [0] = (!!(cpuid) << KVM_ARM_VCPU_POWER_OFF),\
+}
+
+#endif /* KVM__KVM_CPU_ARCH_H */
diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c
index f00a2f1..a528789 100644
--- a/tools/kvm/arm/aarch32/kvm-cpu.c
+++ b/tools/kvm/arm/aarch32/kvm-cpu.c
@@ -21,38 +21,33 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
if (ioctl(vcpu->vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)

[PATCH qom-cpu 1/7] kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code

2013-01-07 Thread Eduardo Habkost
Any KVM-specific code that uses these constants must check that
kvm_enabled() is true before using them.
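
The rule matters because the fake constants are real numbers. With KVM compiled out, KVM_FEATURE_PV_EOI is 0, so an unguarded (1UL << KVM_FEATURE_PV_EOI) would silently set bit 0 instead of doing nothing. Below is a minimal sketch of the guarded pattern. All names are hypothetical stand-ins for the QEMU ones, and accel_enabled plays the role of kvm_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* As in a build without KVM: the feature number is a placeholder 0. */
#define FAKE_FEATURE_PV_EOI 0

/* Hypothetical runtime switch standing in for kvm_enabled(). */
static bool accel_enabled;

static uint32_t default_features;

static void enable_pv_eoi(void)
{
    /* Guard first: the placeholder constant must never reach the mask
     * unless the accelerator is really in use. */
    if (accel_enabled) {
        default_features |= UINT32_C(1) << FAKE_FEATURE_PV_EOI;
    }
}
```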

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
---
 include/sysemu/kvm.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 3db19ff..15f9658 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -21,6 +21,20 @@
 #ifdef CONFIG_KVM
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
+#else
+/* These constants must never be used at runtime if kvm_enabled() is false.
+ * They exist so we don't need #ifdefs around KVM-specific code that already
+ * checks kvm_enabled() properly.
+ */
+#define KVM_CPUID_SIGNATURE  0
+#define KVM_CPUID_FEATURES   0
+#define KVM_FEATURE_CLOCKSOURCE  0
+#define KVM_FEATURE_NOP_IO_DELAY 0
+#define KVM_FEATURE_MMU_OP   0
+#define KVM_FEATURE_CLOCKSOURCE2 0
+#define KVM_FEATURE_ASYNC_PF 0
+#define KVM_FEATURE_STEAL_TIME   0
+#define KVM_FEATURE_PV_EOI   0
 #endif
 
 extern int kvm_allowed;
-- 
1.7.11.7



[PATCH qom-cpu 0/7] disable kvm_mmu + -cpu enforce fixes (v3)

2013-01-07 Thread Eduardo Habkost
Changes on v3:
 - Patches 3-9 from v2 are now already on qom-cpu tree
 - Remove CONFIG_KVM #ifdefs by declaring fake KVM_* #defines on sysemu/kvm.h
 - Refactor code that uses the feature word arrays
   (to make it easier to add a new feature name array)
 - Add feature name array for CPUID leaf 0xC0000001

Changes on v2:
 - Now both the kvm_mmu-disable and -cpu enforce changes are on the same
   series
 - Coding style fixes

Git tree for reference:
  git://github.com/ehabkost/qemu-hacks.git cpu-enforce-all.v3
  https://github.com/ehabkost/qemu-hacks/tree/cpu-enforce-all.v3

The changes are a bit intrusive, but:

 - The longer we take to make enforce as strict as it should be (and to make
   libvirt finally use it), the more users will have VMs with migration-unsafe,
   unpredictable guest ABIs. For this reason, I would like to get this into
   QEMU 1.4.
 - The changes in this series should affect only users that are already using
   the enforce flag, and I believe whoever is using the enforce flag really
   wants the strict behavior introduced by this series.



Eduardo Habkost (7):
  kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code
  target-i386: Don't set any KVM flag by default if KVM is disabled
  target-i386: Disable kvm_mmu by default
  target-i386/cpu: Introduce FeatureWord typedefs
  target-i386: kvm_check_features_against_host(): Use feature_word_info
  target-i386/cpu.c: Add feature name array for ext4_features
  target-i386: check/enforce: Check all feature words

 include/sysemu/kvm.h |  14 
 target-i386/cpu.c| 193 ---
 target-i386/cpu.h|  15 
 3 files changed, 150 insertions(+), 72 deletions(-)
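
The check/enforce behaviour discussed in this series boils down to a per-feature-word comparison. Any bit the CPU model requests but the host cannot provide is reported, and with enforce set the guest must not start. Below is a minimal sketch of that decision. The function names and the message text are hypothetical; QEMU's actual reporting code and wording differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Count (and report) requested bits the host does not support. */
static int count_missing(const char *word_name, uint32_t requested,
                         uint32_t host_supported)
{
    uint32_t missing = requested & ~host_supported;
    int n = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (missing & (UINT32_C(1) << bit)) {
            fprintf(stderr, "%s: feature bit %d not supported by host\n",
                    word_name, bit);
            n++;
        }
    }
    return n;
}

/* check: warn only; enforce: refuse to start (-1) if anything is missing. */
static int check_feature_word(bool enforce, const char *word_name,
                              uint32_t requested, uint32_t host_supported)
{
    int missing = count_missing(word_name, requested, host_supported);

    return (enforce && missing > 0) ? -1 : 0;
}
```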

-- 
1.7.11.7



[PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled

2013-01-07 Thread Eduardo Habkost
This is a cleanup that tries to solve two small issues:

 - We don't need a separate kvm_pv_eoi_features variable just to keep a
   constant calculated at compile time, and this style would require
   adding a separate variable (declared twice because of the
   CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled
   by machine-type compat code.
 - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
   even when KVM is disabled at runtime. This small inconsistency in
   the cpuid_kvm_features field isn't a problem today because
   cpuid_kvm_features is ignored by the TCG code, but it may cause
   unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it
enables kvm_pv_eoi only if KVM is enabled. I believe this makes the
behavior of enable_kvm_pv_eoi() clearer and easier to understand.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
 - Coding style fix

Changes v3:
 - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define
---
 target-i386/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 951e206..40400ac 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
-static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
 #else
 static uint32_t kvm_default_features = 0;
-static const uint32_t kvm_pv_eoi_features = 0;
 #endif
 
 void enable_kvm_pv_eoi(void)
 {
-kvm_default_features |= kvm_pv_eoi_features;
+if (kvm_enabled()) {
+kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
+}
 }
 
 void host_cpuid(uint32_t function, uint32_t count,
-- 
1.7.11.7



[PATCH qom-cpu 6/7] target-i386/cpu.c: Add feature name array for ext4_features

2013-01-07 Thread Eduardo Habkost
Feature names were taken from the X86_FEATURE_* constants in the Linux
kernel code.
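
Each feature name array maps the bit positions of one CPUID register to the flag names accepted on the -cpu command line (as in +xstore or -pmm), with NULL marking bits that have no name. Below is a minimal sketch of the reverse lookup such a table enables, reusing the names from this patch. The lookup function is hypothetical and simpler than QEMU's own parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit names for CPUID leaf 0xC0000001, EDX (VIA/Centaur PadLock),
 * as in the patch. */
static const char *const ext4_names[32] = {
    NULL, NULL, "xstore", "xstore-en",
    NULL, NULL, "xcrypt", "xcrypt-en",
    "ace2", "ace2-en", "phe", "phe-en",
    "pmm", "pmm-en", NULL, NULL,
    /* bits 16..31 are unnamed (implicitly NULL) */
};

/* Return the bit index for a flag name, or -1 if it is not in the table. */
static int feature_bit(const char *const names[32], const char *flag)
{
    for (int bit = 0; bit < 32; bit++) {
        if (names[bit] && strcmp(names[bit], flag) == 0)
            return bit;
    }
    return -1;
}
```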

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: Gleb Natapov g...@redhat.com
---
 target-i386/cpu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 4b3ee63..a54c464 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -95,6 +95,17 @@ static const char *ext3_feature_name[] = {
 NULL, NULL, NULL, NULL,
 };
 
+static const char *ext4_feature_name[] = {
+NULL, NULL, "xstore", "xstore-en",
+NULL, NULL, "xcrypt", "xcrypt-en",
+"ace2", "ace2-en", "phe", "phe-en",
+"pmm", "pmm-en", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+};
+
 static const char *kvm_feature_name[] = {
 "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock",
 "kvm_asyncpf", "kvm_steal_time", "kvm_pv_eoi", NULL,
@@ -147,6 +158,10 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .feat_names = ext3_feature_name,
 .cpuid_eax = 0x80000001, .cpuid_reg = R_ECX,
 },
+[FEAT_C000_0001_EDX] = {
+.feat_names = ext4_feature_name,
+.cpuid_eax = 0xC0000001, .cpuid_reg = R_EDX,
+},
 [FEAT_KVM] = {
 .feat_names = kvm_feature_name,
 .cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EAX,
@@ -1412,6 +1427,7 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
 x86_cpu_def->ext_features |= plus_features[FEAT_1_ECX];
 x86_cpu_def->ext2_features |= plus_features[FEAT_8000_0001_EDX];
 x86_cpu_def->ext3_features |= plus_features[FEAT_8000_0001_ECX];
+x86_cpu_def->ext4_features |= plus_features[FEAT_C000_0001_EDX];
 x86_cpu_def->kvm_features |= plus_features[FEAT_KVM];
 x86_cpu_def->svm_features |= plus_features[FEAT_SVM];
 x86_cpu_def->cpuid_7_0_ebx_features |= plus_features[FEAT_7_0_EBX];
@@ -1419,6 +1435,7 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
 x86_cpu_def->ext_features &= ~minus_features[FEAT_1_ECX];
 x86_cpu_def->ext2_features &= ~minus_features[FEAT_8000_0001_EDX];
 x86_cpu_def->ext3_features &= ~minus_features[FEAT_8000_0001_ECX];
+x86_cpu_def->ext4_features &= ~minus_features[FEAT_C000_0001_EDX];
 x86_cpu_def->kvm_features &= ~minus_features[FEAT_KVM];
 x86_cpu_def->svm_features &= ~minus_features[FEAT_SVM];
 x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_features[FEAT_7_0_EBX];
-- 
1.7.11.7



[PATCH qom-cpu 3/7] target-i386: Disable kvm_mmu by default

2013-01-07 Thread Eduardo Habkost
KVM_CAP_PV_MMU capability reporting was removed from the kernel in
v2.6.33 (see commit a68a6a7282373), and the code was removed from the
kernel completely in v3.3 (see commit fb92045843). It doesn't make sense
to keep kvm_mmu enabled by default, as it would cause unnecessary hassle
when using the enforce flag.

This disables kvm_mmu on all machine-types. With this fix, the possible
scenarios when migrating from QEMU <= 1.3 to QEMU 1.4 are:

+------------+------------+-----------------------------------------------------+
| src kernel | dst kernel | Result                                              |
+------------+------------+-----------------------------------------------------+
| >= 2.6.33  | any        | kvm_mmu was already disabled and will stay disabled |
| <= 2.6.32  | >= 3.3     | correct live migration is impossible                |
| <= 2.6.32  | <= 3.2     | kvm_mmu will be disabled on next guest reboot *     |
+------------+------------+-----------------------------------------------------+

 * If they are running kernel <= 2.6.32 and want kvm_mmu to be kept
   enabled on guest reboot, they can explicitly add +kvm_mmu to the QEMU
   command-line. Using 2.6.33 and higher, it is not possible to enable
   kvm_mmu explicitly anymore.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: libvir-l...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com

Changes v2:
 - Coding style fix
 - Removed redundant comments above machine init functions

Changes v3:
 - Eliminate per-machine-type compatibility code
---
 target-i386/cpu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 40400ac..b09b625 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -159,7 +159,6 @@ int enforce_cpuid = 0;
 #if defined(CONFIG_KVM)
 static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
 (1 << KVM_FEATURE_NOP_IO_DELAY) |
-(1 << KVM_FEATURE_MMU_OP) |
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_STEAL_TIME) |
-- 
1.7.11.7



[PATCH qom-cpu 5/7] target-i386: kvm_check_features_against_host(): Use feature_word_info

2013-01-07 Thread Eduardo Habkost
Instead of carrying the CPUID leaf/register and feature name array on
the model_features_t struct, move that information into
feature_word_info so it can be reused by other functions.

The goal is to eventually kill model_features_t entirely, but to do that
we have to either convert x86_def_t.features to an array or use
offsetof() inside FeatureWordInfo (to replace the pointers inside
model_features_t). So, for now, just move most of the model_features_t
fields to FeatureWordInfo, except for the two pointers to local
arguments.
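To illustrate the idea of keeping the CPUID leaf and output register next to the flag names, here is a small standalone sketch. It is not QEMU code: SketchWord, SketchWordInfo, word_info and describe_missing() are hypothetical names, and only a handful of bits are named. It formats its message in the same "CPUID.%02XH:%s%s%s [bit %d]" style that unavailable_host_feature() uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for QEMU's FeatureWord/FeatureWordInfo. */
typedef enum { FW_1_EDX, FW_1_ECX, FW_8000_0001_EDX, FW_COUNT } SketchWord;
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX };
static const char *const reg_name[] = { "EAX", "ECX", "EDX", "EBX" };

typedef struct {
    const char *names[32];  /* flag name per bit, NULL if unnamed */
    uint32_t cpuid_eax;     /* CPUID input leaf (EAX) */
    int cpuid_reg;          /* output register holding this word */
} SketchWordInfo;

/* Leaf and register live next to the names, so any caller can reuse them. */
static const SketchWordInfo word_info[FW_COUNT] = {
    [FW_1_EDX] = { .names = { [0] = "fpu", [11] = "sep" },
                   .cpuid_eax = 1, .cpuid_reg = REG_EDX },
    [FW_1_ECX] = { .names = { [0] = "pni" },
                   .cpuid_eax = 1, .cpuid_reg = REG_ECX },
    [FW_8000_0001_EDX] = { .names = { [20] = "nx" },
                           .cpuid_eax = 0x80000001, .cpuid_reg = REG_EDX },
};

/* Formats the lowest requested-but-unavailable bit of word w into buf,
 * e.g. "CPUID.01H:EDX.sep [bit 11]". Returns the bit, or -1 if none. */
static int describe_missing(SketchWord w, uint32_t guest, uint32_t host,
                            char *buf, size_t len)
{
    assert(w < FW_COUNT && len > 0);
    const SketchWordInfo *wi = &word_info[w];
    uint32_t missing = guest & ~host;

    buf[0] = '\0';
    for (int i = 0; i < 32; i++) {
        if (missing & (UINT32_C(1) << i)) {
            const char *name = wi->names[i];
            snprintf(buf, len, "CPUID.%02XH:%s%s%s [bit %d]",
                     (unsigned)wi->cpuid_eax, reg_name[wi->cpuid_reg],
                     name ? "." : "", name ? name : "", i);
            return i;
        }
    }
    return -1;
}
```

Because the leaf and register come from the table, the reporting function needs only a word index instead of separate name-array, leaf and register arguments.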

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 73 +--
 1 file changed, 49 insertions(+), 24 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 7d62d48..4b3ee63 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -126,16 +126,39 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
 
 typedef struct FeatureWordInfo {
     const char **feat_names;
+    uint32_t cpuid_eax; /* Input EAX for CPUID */
+    int cpuid_reg;  /* R_* register constant */
 } FeatureWordInfo;
 
 static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
-    [FEAT_1_EDX] = { .feat_names = feature_name },
-    [FEAT_1_ECX] = { .feat_names = ext_feature_name },
-    [FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name },
-    [FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name },
-    [FEAT_KVM]   = { .feat_names = kvm_feature_name },
-    [FEAT_SVM]   = { .feat_names = svm_feature_name },
-    [FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name },
+    [FEAT_1_EDX] = {
+        .feat_names = feature_name,
+        .cpuid_eax = 1, .cpuid_reg = R_EDX,
+    },
+    [FEAT_1_ECX] = {
+        .feat_names = ext_feature_name,
+        .cpuid_eax = 1, .cpuid_reg = R_ECX,
+    },
+    [FEAT_8000_0001_EDX] = {
+        .feat_names = ext2_feature_name,
+        .cpuid_eax = 0x80000001, .cpuid_reg = R_EDX,
+    },
+    [FEAT_8000_0001_ECX] = {
+        .feat_names = ext3_feature_name,
+        .cpuid_eax = 0x80000001, .cpuid_reg = R_ECX,
+    },
+    [FEAT_KVM] = {
+        .feat_names = kvm_feature_name,
+        .cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EAX,
+    },
+    [FEAT_SVM] = {
+        .feat_names = svm_feature_name,
+        .cpuid_eax = 0x8000000A, .cpuid_reg = R_EDX,
+    },
+    [FEAT_7_0_EBX] = {
+        .feat_names = cpuid_7_0_ebx_feature_name,
+        .cpuid_eax = 7, .cpuid_reg = R_EBX,
+    },
 };
 
 const char *get_register_name_32(unsigned int reg)
@@ -162,9 +185,7 @@ const char *get_register_name_32(unsigned int reg)
 typedef struct model_features_t {
     uint32_t *guest_feat;
     uint32_t *host_feat;
-    const char **flag_names;
-    uint32_t cpuid;
-    int reg;
+    FeatureWord feat_word;
 } model_features_t;
 
 int check_cpuid = 0;
@@ -935,19 +956,19 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 #endif /* CONFIG_KVM */
 }
 
-static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
+static int unavailable_host_feature(FeatureWordInfo *f, uint32_t mask)
 {
     int i;
 
     for (i = 0; i < 32; ++i)
         if (1 << i & mask) {
-            const char *reg = get_register_name_32(f->reg);
+            const char *reg = get_register_name_32(f->cpuid_reg);
             assert(reg);
             fprintf(stderr, "warning: host doesn't support requested feature: "
                 "CPUID.%02XH:%s%s%s [bit %d]\n",
-                f->cpuid, reg,
-                f->flag_names[i] ? "." : "",
-                f->flag_names[i] ? f->flag_names[i] : "", i);
+                f->cpuid_eax, reg,
+                f->feat_names[i] ? "." : "",
+                f->feat_names[i] ? f->feat_names[i] : "", i);
             break;
         }
     return 0;
@@ -965,25 +986,29 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
     int rv, i;
     struct model_features_t ft[] = {
         {&guest_def->features, &host_def.features,
-            feature_name, 0x00000001, R_EDX},
+            FEAT_1_EDX },
         {&guest_def->ext_features, &host_def.ext_features,
-            ext_feature_name, 0x00000001, R_ECX},
+            FEAT_1_ECX },
         {&guest_def->ext2_features, &host_def.ext2_features,
-            ext2_feature_name, 0x80000001, R_EDX},
+            FEAT_8000_0001_EDX },
         {&guest_def->ext3_features, &host_def.ext3_features,
-            ext3_feature_name, 0x80000001, R_ECX}
+            FEAT_8000_0001_ECX },
     };
 
     assert(kvm_enabled());
 
     kvm_cpu_fill_host(&host_def);
-    for (rv = 0, i = 0; i < ARRAY_SIZE(ft); ++i)
-        for (mask = 1; mask; mask <<= 1)
+    for (rv = 0, i = 0; i < ARRAY_SIZE(ft); ++i) {
+        FeatureWord w = ft[i].feat_word;
+        FeatureWordInfo *wi = &feature_word_info[w];
+        for (mask = 1; mask; mask <<= 1) {
             if (*ft[i].guest_feat & mask &&
                 !(*ft[i].host_feat & mask)) {
-                unavailable_host_feature(&ft[i], mask);
-                rv = 1;
-            }
+

[PATCH qom-cpu 7/7] target-i386: check/enforce: Check all feature words

2013-01-07 Thread Eduardo Habkost
This adds the following feature words to the list of flags to be checked
by kvm_check_features_against_host():

 - cpuid_7_0_ebx_features
 - ext4_features
 - kvm_features
 - svm_features

This will ensure the enforce flag works as it should: it won't allow
QEMU to be started unless every flag that was requested by the user or
defined in the CPU model is supported by the host.

This patch may cause QEMU to abort on existing configurations where
enforce previously did not prevent it from starting. But that's exactly
the point of this patch: if a flag was not supported by the host and
QEMU wasn't aborting, it was a bug in the enforce code.
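A minimal sketch of what "check every feature word" means, assuming the guest and host features have already been collected into two fixed-size arrays. NWORDS, check_all_words() and may_start() are hypothetical names, not QEMU's; the sketch only models the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical number of feature words; not QEMU's actual FEATURE_WORDS. */
#define NWORDS 8

/* Compares every word, not just a subset, and warns once per missing bit.
 * Returns 0 if the host provides every requested bit, 1 otherwise. */
static int check_all_words(const uint32_t guest[NWORDS],
                           const uint32_t host[NWORDS])
{
    int rv = 0;

    assert(guest != NULL && host != NULL);
    for (size_t w = 0; w < NWORDS; w++) {
        uint32_t missing = guest[w] & ~host[w];
        for (int bit = 0; bit < 32; bit++) {
            if (missing & (UINT32_C(1) << bit)) {
                fprintf(stderr, "warning: host lacks word %zu bit %d\n", w, bit);
                rv = 1;
            }
        }
    }
    return rv;
}

/* "check" only warns; "enforce" additionally refuses to start the VM.
 * The check runs first, so warnings are printed in both modes. */
static int may_start(const uint32_t guest[NWORDS],
                     const uint32_t host[NWORDS], int enforce)
{
    return !(check_all_words(guest, host) && enforce);
}
```

Under this model, a bit requested in any word, including words a partial check never looked at, is enough to stop an enforce start.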

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: libvir-l...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com

CCing libvirt people, as this is directly related to the planned usage
of the enforce flag by libvirt.

The libvirt team probably has a problem on their hands: libvirt should
use enforce to make sure all requested flags are making their way into
the guest (so the resulting CPU is always the same, on any host), but
users may have existing working configurations where a flag is not
supported by the host and the user really doesn't care about it. Those
configurations will necessarily break when libvirt starts using
enforce.

One example where it may cause trouble for common setups: pc-1.3 wants
the kvm_pv_eoi flag enabled by default (so enforce will make sure it
is enabled), but the user may have an existing VM running on a host
without pv_eoi support. That setup is unsafe today because
live-migration between different host kernel versions may enable/disable
pv_eoi silently (that's why we need the enforce flag to be used by
libvirt), but the user probably would like to be able to live-migrate
that VM anyway (and have libvirt just do the right thing).

One possible solution for libvirt is to use enforce only on newer
machine-types, so existing machines with older machine-types will keep
the unsafe host-dependent-ABI behavior, but at least would keep
live-migration working in case the user is careful.

I really don't know what the libvirt team prefers, but that's the
situation today. The longer we take to make enforce as strict as it
should be and to make libvirt finally use it, the more users will have
VMs with migration-unsafe, unpredictable guest ABIs.

Changes v2:
 - Coding style fix

Changes v3:
 - Added ext4_feature_name array
---
 target-i386/cpu.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index a54c464..68cabcf 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -989,8 +989,9 @@ static int unavailable_host_feature(FeatureWordInfo *f, uint32_t mask)
 return 0;
 }
 
-/* best effort attempt to inform user requested cpu flags aren't making
- * their way to the guest.
+/* Check if all requested cpu flags are making their way to the guest
+ *
+ * Returns 0 if all flags are supported by the host, non-zero otherwise.
  *
  * This function may be called only if KVM is enabled.
  */
@@ -1008,6 +1009,14 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
             FEAT_8000_0001_EDX },
         {&guest_def->ext3_features, &host_def.ext3_features,
             FEAT_8000_0001_ECX },
+        {&guest_def->ext4_features, &host_def.ext4_features,
+            FEAT_C000_0001_EDX },
+        {&guest_def->cpuid_7_0_ebx_features, &host_def.cpuid_7_0_ebx_features,
+            FEAT_7_0_EBX },
+        {&guest_def->svm_features, &host_def.svm_features,
+            FEAT_SVM },
+        {&guest_def->kvm_features, &host_def.kvm_features,
+            FEAT_KVM },
     };
 
 assert(kvm_enabled());
-- 
1.7.11.7



[PATCH qom-cpu 4/7] target-i386/cpu: Introduce FeatureWord typedefs

2013-01-07 Thread Eduardo Habkost
This introduces a FeatureWord enum, a FeatureWordInfo struct (with
general information about a feature word), and a FeatureWordArray
typedef, and changes add_flagname_to_bitmaps() code and
cpu_x86_parse_featurestr() to use the new typedefs instead of separate
variables for each feature word.

This will help us keep the code at kvm_check_features_against_host(),
cpu_x86_parse_featurestr() and add_flagname_to_bitmaps() sane while
adding new feature name arrays.
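The benefit of a FeatureWordArray plus a per-word name table is that flag lookup becomes one loop instead of one hand-written call per word. The sketch below assumes a reduced set of three words and a hypothetical add_flagname() helper; the names mirror, but are not, QEMU's add_flagname_to_bitmaps() and feature_word_info.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of feature words, for illustration only. */
typedef enum { W_1_EDX, W_1_ECX, W_KVM, NUM_WORDS } Word;
typedef uint32_t WordArray[NUM_WORDS];   /* one bitmap per feature word */

/* Per-word flag-name tables (most bits left unnamed here). */
static const char *const word_names[NUM_WORDS][32] = {
    [W_1_EDX] = { [0] = "fpu", [11] = "sep" },
    [W_1_ECX] = { [0] = "pni" },
    [W_KVM]   = { [0] = "kvmclock" },
};

/* Sets flagname's bit in the first word whose table names it.
 * Returns the word index, or -1 after a warning if no table has the flag. */
static int add_flagname(const char *flagname, WordArray words)
{
    assert(flagname != NULL);
    for (int w = 0; w < NUM_WORDS; w++) {
        for (int bit = 0; bit < 32; bit++) {
            const char *n = word_names[w][bit];
            if (n != NULL && strcmp(n, flagname) == 0) {
                words[w] |= UINT32_C(1) << bit;
                return w;
            }
        }
    }
    fprintf(stderr, "CPU feature %s not found\n", flagname);
    return -1;
}
```

Adding a new word then means adding one enum value and one table entry, with no new function parameters.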

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 target-i386/cpu.c | 97 +++
 target-i386/cpu.h | 15 +
 2 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index b09b625..7d62d48 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -124,6 +124,20 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 };
 
+typedef struct FeatureWordInfo {
+    const char **feat_names;
+} FeatureWordInfo;
+
+static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
+    [FEAT_1_EDX] = { .feat_names = feature_name },
+    [FEAT_1_ECX] = { .feat_names = ext_feature_name },
+    [FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name },
+    [FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name },
+    [FEAT_KVM]   = { .feat_names = kvm_feature_name },
+    [FEAT_SVM]   = { .feat_names = svm_feature_name },
+    [FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name },
+};
+
 const char *get_register_name_32(unsigned int reg)
 {
 static const char *reg_names[CPU_NB_REGS32] = {
@@ -271,23 +285,20 @@ static bool lookup_feature(uint32_t *pval, const char *s, const char *e,
     return found;
 }
 
-static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features,
-                                    uint32_t *ext_features,
-                                    uint32_t *ext2_features,
-                                    uint32_t *ext3_features,
-                                    uint32_t *kvm_features,
-                                    uint32_t *svm_features,
-                                    uint32_t *cpuid_7_0_ebx_features)
+static void add_flagname_to_bitmaps(const char *flagname,
+                                    FeatureWordArray words)
 {
-    if (!lookup_feature(features, flagname, NULL, feature_name) &&
-        !lookup_feature(ext_features, flagname, NULL, ext_feature_name) &&
-        !lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) &&
-        !lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) &&
-        !lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) &&
-        !lookup_feature(svm_features, flagname, NULL, svm_feature_name) &&
-        !lookup_feature(cpuid_7_0_ebx_features, flagname, NULL,
-                        cpuid_7_0_ebx_feature_name))
-            fprintf(stderr, "CPU feature %s not found\n", flagname);
+    FeatureWord w;
+    for (w = 0; w < FEATURE_WORDS; w++) {
+        FeatureWordInfo *wi = &feature_word_info[w];
+        if (wi->feat_names &&
+            lookup_feature(&words[w], flagname, NULL, wi->feat_names)) {
+            break;
+        }
+    }
+    if (w == FEATURE_WORDS) {
+        fprintf(stderr, "CPU feature %s not found\n", flagname);
+    }
 }
 
 typedef struct x86_def_t {
@@ -1256,35 +1267,23 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
     unsigned int i;
     char *featurestr; /* Single 'key=value string being parsed */
     /* Features to be added */
-    uint32_t plus_features = 0, plus_ext_features = 0;
-    uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
-    uint32_t plus_kvm_features = kvm_default_features, plus_svm_features = 0;
-    uint32_t plus_7_0_ebx_features = 0;
+    FeatureWordArray plus_features = {
+        [FEAT_KVM] = kvm_default_features,
+    };
     /* Features to be removed */
-    uint32_t minus_features = 0, minus_ext_features = 0;
-    uint32_t minus_ext2_features = 0, minus_ext3_features = 0;
-    uint32_t minus_kvm_features = 0, minus_svm_features = 0;
-    uint32_t minus_7_0_ebx_features = 0;
+    FeatureWordArray minus_features = { 0 };
     uint32_t numvalue;
 
-    add_flagname_to_bitmaps("hypervisor", &plus_features,
-            &plus_ext_features, &plus_ext2_features, &plus_ext3_features,
-            &plus_kvm_features, &plus_svm_features,  &plus_7_0_ebx_features);
+    add_flagname_to_bitmaps("hypervisor", plus_features);
 
     featurestr = features ? strtok(features, ",") : NULL;
 
     while (featurestr) {
         char *val;
         if (featurestr[0] == '+') {
-            add_flagname_to_bitmaps(featurestr + 1, &plus_features,
-                    &plus_ext_features, &plus_ext2_features,
-                    &plus_ext3_features, &plus_kvm_features,
-                    &plus_svm_features, &plus_7_0_ebx_features);
+            add_flagname_to_bitmaps(featurestr + 1, plus_features);
         } else if 

[PATCH 0/2] Add support for ARMv8 CPUs to kvmtool

2013-01-07 Thread Will Deacon
Hello again,

These two patches add support for ARMv8 processors running an AArch64 instance
of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the
case of the latter, the guest page size may be either 64k or 4k.

This depends on the ARM updates series I just posted:

  https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html

Feedback welcome,

Will


Will Deacon (2):
  kvm tools: add support for ARMv8 processors
  kvm tools: arm: align guest memory buffer to maximum page size

 tools/kvm/Makefile |  14 +-
 tools/kvm/arm/aarch32/include/kvm/kvm-arch.h   |  20 +--
 .../kvm/arm/aarch32/include/kvm/kvm-config-arch.h  |   8 ++
 tools/kvm/arm/aarch64/cortex-a57.c |  95 
 tools/kvm/arm/aarch64/include/kvm/barrier.h|   8 ++
 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h   |  17 +++
 .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |  10 ++
 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h   |  13 ++
 tools/kvm/arm/aarch64/kvm-cpu.c| 160 +
 tools/kvm/arm/fdt.c|   2 +-
 tools/kvm/arm/include/arm-common/kvm-arch.h|  32 -
 .../include/{kvm => arm-common}/kvm-config-arch.h  |   8 +-
 tools/kvm/arm/kvm.c|  26 +++-
 13 files changed, 381 insertions(+), 32 deletions(-)
 create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h
 create mode 100644 tools/kvm/arm/aarch64/cortex-a57.c
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/barrier.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/arm/aarch64/kvm-cpu.c
 rename tools/kvm/arm/include/{kvm => arm-common}/kvm-config-arch.h (61%)

-- 
1.8.0



[PATCH 2/2] kvm tools: arm: align guest memory buffer to maximum page size

2013-01-07 Thread Will Deacon
If we're running a guest with a larger page size than the host,
interesting things start to happen when communicating via a virtio-mmio
device because the idea of buffer alignment between the guest and the
host will be off by the misalignment of the guest memory buffer allocated
by the host. This causes things like the index field of vring.used to
be accessed at different addresses on the guest and the host, leading
to deadlock.

Fix this problem by allocating guest memory aligned to the maximum
possible page size for the architecture (64K).
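The allocation scheme can be sketched in plain user-space C: over-allocate by 64K, round the start up to a 64K boundary, and keep the original pointer and size for munmap(). This is an illustration only; struct guest_ram, guest_ram_map(), guest_ram_unmap() and ALIGN_UP are hypothetical, and the sketch uses an anonymous mmap() instead of kvmtool's mmap_anon_or_hugetlbfs().

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under a strict -std= */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SZ_64K 0x10000UL
/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

struct guest_ram {
    void *alloc_start;   /* pointer returned by mmap(), kept for munmap() */
    size_t alloc_size;   /* ram_size plus 64K of slack */
    void *ram_start;     /* 64K-aligned address handed to the guest */
    size_t ram_size;
};

static int guest_ram_map(struct guest_ram *r, size_t ram_size)
{
    r->ram_size = ram_size;
    r->alloc_size = ram_size + SZ_64K;
    r->alloc_start = mmap(NULL, r->alloc_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (r->alloc_start == MAP_FAILED)
        return -1;

    /* Aligning skips at most 64K - 1 bytes, which the slack covers. */
    r->ram_start = (void *)ALIGN_UP((uintptr_t)r->alloc_start, SZ_64K);
    assert((char *)r->ram_start + r->ram_size <=
           (char *)r->alloc_start + r->alloc_size);
    return 0;
}

static void guest_ram_unmap(struct guest_ram *r)
{
    /* Unmap the original range, not the aligned one. */
    munmap(r->alloc_start, r->alloc_size);
}
```

Keeping both pointers is the key design point: the guest sees only the aligned region, while teardown must release exactly what mmap() returned.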

Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/arm/include/arm-common/kvm-arch.h | 10 ++
 tools/kvm/arm/kvm.c | 24 ++--
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/arm/include/arm-common/kvm-arch.h b/tools/kvm/arm/include/arm-common/kvm-arch.h
index 46ee7e2..7860e17 100644
--- a/tools/kvm/arm/include/arm-common/kvm-arch.h
+++ b/tools/kvm/arm/include/arm-common/kvm-arch.h
@@ -37,6 +37,16 @@ static inline bool arm_addr_in_pci_mmio_region(u64 phys_addr)
 }
 
 struct kvm_arch {
+	/*
+	 * We may have to align the guest memory for virtio, so keep the
+	 * original pointers here for munmap.
+	 */
+	void		*ram_alloc_start;
+	u64		ram_alloc_size;
+
+	/*
+	 * Guest addresses for memory layout.
+	 */
u64 memory_guest_start;
u64 kern_guest_start;
u64 initrd_guest_start;
diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c
index 9eff927..1bcfce3 100644
--- a/tools/kvm/arm/kvm.c
+++ b/tools/kvm/arm/kvm.c
@@ -7,6 +7,7 @@
 
 #include <linux/kernel.h>
 #include <linux/kvm.h>
+#include <linux/sizes.h>
 
 struct kvm_ext kvm_req_ext[] = {
{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
@@ -41,7 +42,7 @@ void kvm__init_ram(struct kvm *kvm)
 
 void kvm__arch_delete_ram(struct kvm *kvm)
 {
-	munmap(kvm->ram_start, kvm->ram_size);
+	munmap(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size);
 }
 
 void kvm__arch_periodic_poll(struct kvm *kvm)
@@ -56,13 +57,24 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 
 void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 {
-   /* Allocate guest memory. */
+	/*
+	 * Allocate guest memory. We must align our buffer to 64K to
+	 * correlate with the maximum guest page size for virtio-mmio.
+	 */
 	kvm->ram_size = min(ram_size, (u64)ARM_MAX_MEMORY(kvm));
-	kvm->ram_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path, kvm->ram_size);
-	if (kvm->ram_start == MAP_FAILED)
+	kvm->arch.ram_alloc_size = kvm->ram_size + SZ_64K;
+	kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path,
+						kvm->arch.ram_alloc_size);
+
+	if (kvm->arch.ram_alloc_start == MAP_FAILED)
 		die("Failed to map %lld bytes for guest memory (%d)",
-		    kvm->ram_size, errno);
-	madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE);
+		    kvm->arch.ram_alloc_size, errno);
+
+	kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start,
+					SZ_64K);
+
+	madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
+		MADV_MERGEABLE);
 
/* Initialise the virtual GIC. */
if (gic__init_irqchip(kvm))
-- 
1.8.0



[PATCH 1/2] kvm tools: add support for ARMv8 processors

2013-01-07 Thread Will Deacon
This patch adds support for ARMv8 processors (more specifically,
Cortex-A57) to kvmtool. Both AArch64 and AArch32 guests are supported,
so the existing AArch32 code is slightly restructured to allow for
re-use of much of the current code.

The implementation closely follows the ARMv7 code and reuses much of the
work written there.

Tested-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/Makefile |  14 +-
 tools/kvm/arm/aarch32/include/kvm/kvm-arch.h   |  20 +--
 .../kvm/arm/aarch32/include/kvm/kvm-config-arch.h  |   8 ++
 tools/kvm/arm/aarch64/cortex-a57.c |  95 
 tools/kvm/arm/aarch64/include/kvm/barrier.h|   8 ++
 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h   |  17 +++
 .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |  10 ++
 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h   |  13 ++
 tools/kvm/arm/aarch64/kvm-cpu.c| 160 +
 tools/kvm/arm/fdt.c|   2 +-
 tools/kvm/arm/include/arm-common/kvm-arch.h|  22 ++-
 .../include/{kvm => arm-common}/kvm-config-arch.h  |   8 +-
 tools/kvm/arm/kvm.c|   2 +-
 13 files changed, 353 insertions(+), 26 deletions(-)
 create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h
 create mode 100644 tools/kvm/arm/aarch64/cortex-a57.c
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/barrier.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
 create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h
 create mode 100644 tools/kvm/arm/aarch64/kvm-cpu.c
 rename tools/kvm/arm/include/{kvm => arm-common}/kvm-config-arch.h (61%)

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 33aa4d8..0c59faa 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -103,7 +103,7 @@ OBJS+= virtio/mmio.o
 
 # Translate uname -m into ARCH string
 ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \
- -e s/armv7.*/arm/)
+ -e s/armv7.*/arm/ -e s/aarch64.*/arm64/)
 
 ifeq ($(ARCH),i386)
ARCH := x86
@@ -174,6 +174,18 @@ ifeq ($(ARCH), arm)
OTHEROBJS   += $(LIBFDT_OBJS)
 endif
 
+# ARM64
+ifeq ($(ARCH), arm64)
+   DEFINES += -DCONFIG_ARM64
+   OBJS+= $(OBJS_ARM_COMMON)
+   OBJS+= arm/aarch64/cortex-a57.o
+   OBJS+= arm/aarch64/kvm-cpu.o
+   ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
+   ARCH_INCLUDE+= -Iarm/aarch64/include
+   CFLAGS  += -I../../scripts/dtc/libfdt
+   OTHEROBJS   += $(LIBFDT_OBJS)
+endif
+
 ###
 
 ifeq (,$(ARCH_INCLUDE))
diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
index ca79b24..1632e3c 100644
--- a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
+++ b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h
@@ -1,28 +1,12 @@
 #ifndef KVM__KVM_ARCH_H
 #define KVM__KVM_ARCH_H
 
-#include <linux/const.h>
-
-#define ARM_LOMAP_MMIO_AREA	_AC(0x00000000, UL)
-#define ARM_LOMAP_AXI_AREA	_AC(0x40000000, UL)
-#define ARM_LOMAP_MEMORY_AREA	_AC(0x80000000, UL)
-#define ARM_LOMAP_MAX_MEMORY	_AC(0x7fffffff, UL)
-
 #define ARM_GIC_DIST_SIZE	0x1000
-#define ARM_GIC_DIST_BASE	(ARM_LOMAP_AXI_AREA - ARM_GIC_DIST_SIZE)
 #define ARM_GIC_CPUI_SIZE	0x2000
-#define ARM_GIC_CPUI_BASE	(ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE)
-
-#define ARM_KERN_OFFSET		0x8000
-
-#define ARM_VIRTIO_MMIO_SIZE	(ARM_GIC_DIST_BASE - ARM_LOMAP_MMIO_AREA)
-#define ARM_PCI_MMIO_SIZE	(ARM_LOMAP_MEMORY_AREA - ARM_LOMAP_AXI_AREA)
 
-#define ARM_MEMORY_AREA		ARM_LOMAP_MEMORY_AREA
-#define ARM_MAX_MEMORY		ARM_LOMAP_MAX_MEMORY
+#define ARM_KERN_OFFSET(...)	0x8000
 
-#define KVM_PCI_MMIO_AREA	ARM_LOMAP_AXI_AREA
-#define KVM_VIRTIO_MMIO_AREA	ARM_LOMAP_MMIO_AREA
+#define ARM_MAX_MEMORY(...)	ARM_LOMAP_MAX_MEMORY
 
 #include "arm-common/kvm-arch.h"
 
diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h
new file mode 100644
index 0000000..acf0d23
--- /dev/null
+++ b/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h
@@ -0,0 +1,8 @@
+#ifndef KVM__KVM_CONFIG_ARCH_H
+#define KVM__KVM_CONFIG_ARCH_H
+
+#define ARM_OPT_ARCH_RUN(...)
+
+#include "arm-common/kvm-config-arch.h"
+
+#endif /* KVM__KVM_CONFIG_ARCH_H */
diff --git a/tools/kvm/arm/aarch64/cortex-a57.c b/tools/kvm/arm/aarch64/cortex-a57.c
new file mode 100644
index 0000000..4fd11ba
--- /dev/null
+++ b/tools/kvm/arm/aarch64/cortex-a57.c
@@ -0,0 +1,95 @@
+#include "kvm/fdt.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/util.h"
+
+#include "arm-common/gic.h"
+
+#include <linux/byteorder.h>
+#include <linux/types.h>
+
+#define CPU_NAME_MAX_LEN 8
+static void generate_cpu_nodes(void *fdt, struct 

[PATCH 3/4] KVM: PPC: BookE: Implement EPR exit

2013-01-07 Thread Alexander Graf
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:

  - remove leftover 'allowed==2' logic
---
 Documentation/virtual/kvm/api.txt   |   40 +-
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |9 +++
 arch/powerpc/kvm/booke.c|   14 +++-
 arch/powerpc/kvm/powerpc.c  |   10 
 include/linux/kvm_host.h|1 +
 include/uapi/linux/kvm.h|6 +
 7 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9cf591d..66bf7cf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2238,8 +2238,8 @@ executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
-NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR
-  and KVM_EXIT_PAPR the corresponding
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
+  KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after userspace
 has re-entered the kernel with KVM_RUN.  The kernel side will first finish
 incomplete operations and then check for pending signals.  Userspace
@@ -2342,6 +2342,25 @@ The possible hypercalls are defined in the Power Architecture Platform
 Requirements (PAPR) document available from www.power.org (free
 developer registration required to access it).
 
+   /* KVM_EXIT_EPR */
+   struct {
+   __u32 epr;
+   } epr;
+
+On FSL BookE PowerPC chips, the interrupt controller has a fast
+interrupt acknowledge path to the core. When the core successfully
+delivers an interrupt, it automatically populates the EPR register with
+the interrupt vector number and acknowledges the interrupt inside
+the interrupt controller.
+
+If the interrupt controller lives in user space, we need to run the
+interrupt acknowledge cycle through it, using this exit, to fetch the
+next interrupt vector to be delivered.
+
+This exit is triggered whenever KVM_CAP_PPC_EPR is enabled and an
+external interrupt has just been delivered into the guest. User space
+should put the acknowledged interrupt vector into the 'epr' field.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -2463,3 +2482,20 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
where num_sets is the tlb_sizes[] value divided by the tlb_ways[] value.
  - The tsize field of mas1 shall be set to 4K on TLB0, even though the
hardware ignores this value for TLB0.
+
+6.4 KVM_CAP_PPC_EPR
+
+Architectures: ppc
+Parameters: args[0] defines whether the proxy facility is active
+Returns: 0 on success; -1 on error
+
+This capability enables or disables the delivery of interrupts through the
+external proxy facility.
+
+When enabled (args[0] != 0), every time the guest gets an external interrupt
+delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
+to receive the topmost interrupt vector.
+
+When disabled (args[0] == 0), behavior is as if this facility is unsupported.
+
+When this capability is enabled, KVM_EXIT_EPR can occur.
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ab49c6c..8a72d59 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -520,6 +520,8 @@ struct kvm_vcpu_arch {
u8 sane;
u8 cpu_type;
u8 hcall_needed;
+   u8 epr_enabled;
+   u8 epr_needed;
 
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5f5f69a..493630e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -264,6 +264,15 @@ static inline void kvm_linear_init(void)
 {}
 #endif
 
+static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
+{
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GEPR, epr);
+#elif defined(CONFIG_BOOKE)
+	vcpu->arch.epr = epr;
+#endif
+}
+
 int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
 

[PATCH 0/4] KVM: PPC: BookE: Add EPR user space support v3

2013-01-07 Thread Alexander Graf
The FSL MPIC implementation contains a feature called external proxy
facility which allows for interrupts to be acknowledged in the MPIC
as soon as a core accepts its pending external interrupt.

This patch set implements all the necessary pieces to support this
from the kernel space side.

v1 -> v2:

  - do an explicit requests check rather than play with return values
  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:

  - remove leftover 'allowed==2' logic

Alexander Graf (3):
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync

Mihai Caraman (1):
  KVM: PPC: BookE: Allow irq deliveries to inject requests

 Documentation/virtual/kvm/api.txt   |   41 +-
 arch/powerpc/include/asm/kvm_host.h |2 +
 arch/powerpc/include/asm/kvm_ppc.h  |9 +++
 arch/powerpc/include/uapi/asm/kvm.h |6 -
 arch/powerpc/kvm/booke.c|   40 +-
 arch/powerpc/kvm/booke_emulate.c|3 ++
 arch/powerpc/kvm/powerpc.c  |   10 
 include/linux/kvm_host.h|1 +
 include/uapi/linux/kvm.h|6 +
 9 files changed, 114 insertions(+), 4 deletions(-)



[PATCH 2/4] KVM: PPC: BookE: Emulate mfspr on EPR

2013-01-07 Thread Alexander Graf
The EPR register is potentially valid for PR KVM as well, so we need
to emulate accesses to it. It's only defined for reading, so only
handle the mfspr case.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke_emulate.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 4685b8c..27a4b28 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -269,6 +269,9 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
 	case SPRN_ESR:
 		*spr_val = vcpu->arch.shared->esr;
 		break;
+	case SPRN_EPR:
+		*spr_val = vcpu->arch.epr;
+		break;
 	case SPRN_CSRR0:
 		*spr_val = vcpu->arch.csrr0;
break;
-- 
1.6.0.2



[PATCH 1/4] KVM: PPC: BookE: Allow irq deliveries to inject requests

2013-01-07 Thread Alexander Graf
From: Mihai Caraman mihai.cara...@freescale.com

When injecting an interrupt into guest context, we usually don't need
to check for requests anymore. At least not until today.

With the introduction of EPR, we will have to create a request when the
guest has successfully accepted an external interrupt though.

So we need to prepare the interrupt delivery to abort guest entry
gracefully. Otherwise we'd delay the EPR request.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 -> v2:

  - do an explicit requests check rather than play with return values
---
 arch/powerpc/kvm/booke.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 69f1140..964f447 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -581,6 +581,11 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 
kvmppc_core_check_exceptions(vcpu);
 
+	if (vcpu->requests) {
+		/* Exception delivery raised request; start over */
+		return 1;
+	}
+
 	if (vcpu->arch.shared->msr & MSR_WE) {
local_irq_enable();
kvm_vcpu_block(vcpu);
-- 
1.6.0.2



[PATCH 4/4] KVM: PPC: BookE: Add EPR ONE_REG sync

2013-01-07 Thread Alexander Graf
We need to be able to read and write the contents of the EPR register
from user space.

This patch implements that logic through the ONE_REG API and declares
its (never implemented) SREGS counterpart as deprecated.
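For user space, the new register is addressed by its ONE_REG id. The sketch below rebuilds KVM_REG_PPC_EPR from the same parts the patch uses and decodes the access size from the id. The SK_-prefixed constants are local copies of the <linux/kvm.h> encoding values (KVM_REG_PPC, KVM_REG_SIZE_U32, KVM_REG_SIZE_SHIFT, KVM_REG_SIZE_MASK), redefined here so the sketch builds without kernel headers; one_reg_size() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the ONE_REG encoding from <linux/kvm.h>. */
#define SK_KVM_REG_PPC        0x1000000000000000ULL
#define SK_KVM_REG_SIZE_SHIFT 52
#define SK_KVM_REG_SIZE_MASK  0x00f0000000000000ULL
#define SK_KVM_REG_SIZE_U32   0x0020000000000000ULL

/* Same composition the patch uses for KVM_REG_PPC_EPR. */
#define SK_KVM_REG_PPC_EPR (SK_KVM_REG_PPC | SK_KVM_REG_SIZE_U32 | 0x86)

/* Number of bytes user space must provide at kvm_one_reg.addr. */
static unsigned one_reg_size(uint64_t id)
{
    return 1U << ((id & SK_KVM_REG_SIZE_MASK) >> SK_KVM_REG_SIZE_SHIFT);
}
```

In a real VMM, the id would go into the 'id' field of struct kvm_one_reg, with 'addr' pointing at a u32, and be passed to the KVM_GET_ONE_REG or KVM_SET_ONE_REG ioctl on the vcpu file descriptor.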

Signed-off-by: Alexander Graf ag...@suse.de
---
 Documentation/virtual/kvm/api.txt   |1 +
 arch/powerpc/include/uapi/asm/kvm.h |6 +-
 arch/powerpc/kvm/booke.c|   21 +
 3 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 66bf7cf..6601973 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1774,6 +1774,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_VPA_SLB   | 128
   PPC   | KVM_REG_PPC_VPA_DTL   | 128
   PPC   | KVM_REG_PPC_EPCR | 32
+  PPC   | KVM_REG_PPC_EPR  | 32
 
 4.69 KVM_GET_ONE_REG
 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 2fba8a6..16064d0 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -114,7 +114,10 @@ struct kvm_regs {
 /* Embedded Floating Point (SPE) -- IVOR32-34 if KVM_SREGS_E_IVOR */
 #define KVM_SREGS_E_SPE(1  9)
 
-/* External Proxy (EXP) -- EPR */
+/*
+ * DEPRECATED! USE ONE_REG FOR THIS ONE!
+ * External Proxy (EXP) -- EPR
+ */
 #define KVM_SREGS_EXP  (1  10)
 
 /* External PID (E.PD) -- EPSC/EPLC */
@@ -412,5 +415,6 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_VPA_DTL    (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x84)
 
 #define KVM_REG_PPC_EPCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x85)
+#define KVM_REG_PPC_EPR    (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x86)
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 940ec80..8779cd4 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -300,6 +300,15 @@ static void set_guest_esr(struct kvm_vcpu *vcpu, u32 esr)
 #endif
 }
 
+static unsigned long get_guest_epr(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_KVM_BOOKE_HV
+   return mfspr(SPRN_GEPR);
+#else
+   return vcpu->arch.epr;
+#endif
+}
+
 /* Deliver the interrupt of the corresponding priority, if possible. */
 static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 unsigned int priority)
@@ -1405,6 +1414,11 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 &vcpu->arch.dbg_reg.dac[dac], sizeof(u64));
break;
}
+   case KVM_REG_PPC_EPR: {
+   u32 epr = get_guest_epr(vcpu);
+   r = put_user(epr, (u32 __user *)(long)reg->addr);
+   break;
+   }
 #if defined(CONFIG_64BIT)
case KVM_REG_PPC_EPCR:
r = put_user(vcpu->arch.epcr, (u32 __user *)(long)reg->addr);
@@ -1437,6 +1451,13 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 (u64 __user *)(long)reg->addr, sizeof(u64));
break;
}
+   case KVM_REG_PPC_EPR: {
+   u32 new_epr;
+   r = get_user(new_epr, (u32 __user *)(long)reg->addr);
+   if (!r)
+   kvmppc_set_epr(vcpu, new_epr);
+   break;
+   }
 #if defined(CONFIG_64BIT)
case KVM_REG_PPC_EPCR: {
u32 new_epcr;
-- 
1.6.0.2
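For reference, a userspace caller reaches EPR through struct kvm_one_reg, with .id = KVM_REG_PPC_EPR and .addr pointing at a 32-bit buffer, passed to the KVM_GET_ONE_REG or KVM_SET_ONE_REG vcpu ioctl. The size field encoded in the id is what tells userspace that the buffer must be 4 bytes, matching the u32 put_user()/get_user() in the patch. The sketch below only decodes that id. The constants are copied from <linux/kvm.h> and from this patch and redefined locally (an assumption made so it builds on non-PowerPC hosts); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors of the ONE_REG encoding constants in <linux/kvm.h>,
 * redefined here so the sketch compiles on any host. */
#define KVM_REG_ARCH_MASK   0xff00000000000000ULL
#define KVM_REG_PPC         0x1000000000000000ULL
#define KVM_REG_SIZE_SHIFT  52
#define KVM_REG_SIZE_MASK   0x00f0000000000000ULL
#define KVM_REG_SIZE_U32    0x0020000000000000ULL

/* The id this patch adds to arch/powerpc/include/uapi/asm/kvm.h. */
#define KVM_REG_PPC_EPR     (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x86)

/* Bytes userspace must provide at kvm_one_reg.addr for this id. */
static size_t one_reg_size(uint64_t id)
{
    return (size_t)1 << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}

/* Architecture-specific register number (bits below the size field). */
static uint64_t one_reg_index(uint64_t id)
{
    return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK);
}

/* Non-zero if the id lives in the PowerPC ONE_REG namespace. */
static int one_reg_is_ppc(uint64_t id)
{
    return (id & KVM_REG_ARCH_MASK) == KVM_REG_PPC;
}
```

So one_reg_size(KVM_REG_PPC_EPR) is 4 and one_reg_index(KVM_REG_PPC_EPR) is 0x86.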



Re: [PATCH 1/7] KVM: Write protect the updated slot only when we start dirty logging

2013-01-07 Thread Marcelo Tosatti
On Tue, Dec 18, 2012 at 04:26:47PM +0900, Takuya Yoshikawa wrote:
 This is needed to make kvm_mmu_slot_remove_write_access() rmap based:
 otherwise we may end up using invalid rmap's.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp

Why? memslot->arch.rmap[] has been properly allocated at this point.

 ---
  arch/x86/kvm/x86.c  |9 -
  virt/kvm/kvm_main.c |1 -
  2 files changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 1c9c834..9451efa 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6897,7 +6897,14 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
   spin_lock(&kvm->mmu_lock);
   if (nr_mmu_pages)
   kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
 - kvm_mmu_slot_remove_write_access(kvm, mem->slot);
 + /*
 +  * Write protect all pages for dirty logging.
 +  * Existing largepage mappings are destroyed here and new ones will
 +  * not be created until the end of the logging.
 +  */
 + if ((mem->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
 + !(old.flags & KVM_MEM_LOG_DIRTY_PAGES))
 + kvm_mmu_slot_remove_write_access(kvm, mem->slot);
   spin_unlock(&kvm->mmu_lock);
   /*
* If memory slot is created, or moved, we need to clear all
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index bd31096..0ef5daa 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -805,7 +805,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
   if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
   if (kvm_create_dirty_bitmap(&new) < 0)
   goto out_free;
 - /* destroy any largepage mappings for dirty tracking */
   }
  
   if (!npages || base_gfn != old.base_gfn) {
 -- 
 1.7.5.4
 


Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging

2013-01-07 Thread Marcelo Tosatti
On Tue, Dec 18, 2012 at 04:25:58PM +0900, Takuya Yoshikawa wrote:
 This patch set makes kvm_mmu_slot_remove_write_access() rmap based and
 adds conditional rescheduling to it.
 
 The motivation for this change is of course to reduce the mmu_lock hold
 time when we start dirty logging for a large memory slot.  You may not
 see the problem if you just give 8GB or less of the memory to the guest
 with THP enabled on the host -- this is for the worst case.

Neat.

Looks good, except patch 1 - 

a) don't understand why it is necessary and 
b) not confident it's safe - isn't clearing necessary for KVM_SET_MEMORY
instances other than

!(old.flags & LOG_DIRTY) && (new.flags & LOG_DIRTY)
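To make question (b) concrete: patch 1/7 changes kvm_arch_commit_memory_region() from calling kvm_mmu_slot_remove_write_access() on every commit to calling it only when the update turns dirty logging on. The sketch below tabulates that predicate for the four flag transitions; starts_dirty_logging() is a hypothetical name, and KVM_MEM_LOG_DIRTY_PAGES uses the same bit value as <linux/kvm.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as in <linux/kvm.h>. */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)

/* True only for the off -> on transition of dirty logging; the
 * on -> on, on -> off and off -> off cases skip the write-protect pass. */
static bool starts_dirty_logging(uint32_t old_flags, uint32_t new_flags)
{
    return (new_flags & KVM_MEM_LOG_DIRTY_PAGES) &&
           !(old_flags & KVM_MEM_LOG_DIRTY_PAGES);
}
```

Everything that returns false here (slot moves, flag changes that keep logging enabled, and so on) is what the question above is asking about.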

 
 
 IMPORTANT NOTE (not about this patch set):
 
 I have hit the following bug many times with the current next branch,
 even WITHOUT my patches.  Although I do not know a way to reproduce this
 yet, it seems that something was broken around slot->dirty_bitmap.  I am
 now investigating the new code in __kvm_set_memory_region().
 
 The bug:
 [  575.238063] BUG: unable to handle kernel paging request at 0002efe83a77
 [  575.238185] IP: [a05f9619] mark_page_dirty_in_slot+0x19/0x20 [kvm]
 [  575.238308] PGD 0 
 [  575.238343] Oops: 0002 [#1] SMP 
 
 The call trace:
 [  575.241207] Call Trace:
 [  575.241257]  [a05f96b1] kvm_write_guest_cached+0x91/0xb0 [kvm]
 [  575.241370]  [a0610db9] kvm_arch_vcpu_ioctl_run+0x1109/0x12c0 [kvm]
 [  575.241488]  [a060fd55] ? kvm_arch_vcpu_ioctl_run+0xa5/0x12c0 [kvm]
 [  575.241595]  [81679194] ? mutex_lock_killable_nested+0x274/0x340
 [  575.241706]  [a05faf80] ? kvm_set_ioapic_irq+0x20/0x20 [kvm]
 [  575.241813]  [a05f71c9] kvm_vcpu_ioctl+0x559/0x670 [kvm]
 [  575.241913]  [a05f8a58] ? kvm_vm_ioctl+0x1b8/0x570 [kvm]
 [  575.242007]  [8101b9d3] ? native_sched_clock+0x13/0x80
 [  575.242125]  [8101ba49] ? sched_clock+0x9/0x10
 [  575.242208]  [8109015d] ? sched_clock_cpu+0xbd/0x110
 [  575.242298]  [811a914c] ? fget_light+0x3c/0x140
 [  575.242381]  [8119dfa8] do_vfs_ioctl+0x98/0x570
 [  575.242463]  [811a91b1] ? fget_light+0xa1/0x140
 [  575.246393]  [811a914c] ? fget_light+0x3c/0x140
 [  575.250363]  [8119e511] sys_ioctl+0x91/0xb0
 [  575.254327]  [81684c19] system_call_fastpath+0x16/0x1b
 
 
 Takuya Yoshikawa (7):
   KVM: Write protect the updated slot only when we start dirty logging
   KVM: MMU: Remove unused parameter level from __rmap_write_protect()
   KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based
   KVM: x86: Remove unused slot_bitmap from kvm_mmu_page
   KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself
   KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself
   KVM: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time
 
  Documentation/virtual/kvm/mmu.txt |7 
  arch/x86/include/asm/kvm_host.h   |5 ---
  arch/x86/kvm/mmu.c|   56 +++-
  arch/x86/kvm/x86.c|   13 +---
  virt/kvm/kvm_main.c   |1 -
  5 files changed, 38 insertions(+), 44 deletions(-)
 
 -- 
 1.7.5.4
 


Re: [PATCH v4 5/5] KVM: x86: improve reexecute_instruction

2013-01-07 Thread Marcelo Tosatti
On Sat, Jan 05, 2013 at 04:16:37PM +0800, Xiao Guangrong wrote:
 On 01/05/2013 06:44 AM, Marcelo Tosatti wrote:
 
  index b0a3678..44c6992 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -4756,15 +4756,8 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
   static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
   {
 gpa_t gpa = cr2;
  +  gfn_t gfn;
 pfn_t pfn;
  -  unsigned int indirect_shadow_pages;
  -
  -  spin_lock(&vcpu->kvm->mmu_lock);
  -  indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
  -  spin_unlock(&vcpu->kvm->mmu_lock);
  -
  -  if (!indirect_shadow_pages)
  -  return false;
  
  This renders the previous patch obsolete, pretty much (please fold).
 
 Will try.
 
  
 if (!vcpu->arch.mmu.direct_map) {
 /*
  @@ -4781,13 +4774,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 return true;
 }
 
  -  /*
  -   * if emulation was due to access to shadowed page table
  -   * and it failed try to unshadow page and re-enter the
  -   * guest to let CPU execute the instruction.
  -   */
  -  if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
  -  return true;
  +  gfn = gpa_to_gfn(gpa);
 
 /*
  * Do not retry the unhandleable instruction if it faults on the
  @@ -4795,13 +4782,38 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
  * retry instruction - write #PF - emulation fail - retry
  * instruction - ...
  */
  -  pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
  -  if (!is_error_noslot_pfn(pfn)) {
  -  kvm_release_pfn_clean(pfn);
  +  pfn = gfn_to_pfn(vcpu->kvm, gfn);
  +
  +  /*
  +   * If the instruction failed on the error pfn, it can not be fixed,
  +   * report the error to userspace.
  +   */
  +  if (is_error_noslot_pfn(pfn))
  +  return false;
  +
  +  kvm_release_pfn_clean(pfn);
  +
  +  /* The instructions are well-emulated on direct mmu. */
  +  if (vcpu->arch.mmu.direct_map) {
  
  !direct_map?
 
 No. The logic is: if it is a direct mmu, we just unprotect the page shadowed by
 the nested mmu, then let the guest retry the instruction; there is no need to
 detect an unhandleable instruction.
 
  
  +  unsigned int indirect_shadow_pages;
  +
  +  spin_lock(&vcpu->kvm->mmu_lock);
  +  indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
  +  spin_unlock(&vcpu->kvm->mmu_lock);
  +
  +  if (indirect_shadow_pages)
  +  kvm_mmu_unprotect_page(vcpu->kvm, gfn);
  +
 return true;
 }
 
  -  return false;
  +  kvm_mmu_unprotect_page(vcpu->kvm, gfn);
  +
  +  /* If the target gfn is used as page table, the fault can
  +   * not be avoided by unprotecting shadow page and it will
  +   * be reported to userspace.
  +   */
  +  return !vcpu->arch.target_gfn_is_pt;
   }
  
  The idea was
  
  How about recording the gfn number for shadow pages that have been
  shadowed in the current pagefault run? (which is cheap, compared to
  shadowing these pages).
  
  If failed instruction emulation is write to one of these gfns, then
  fail.
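Marcelo's alternative, sketched under assumptions: remember the guest page-table gfns walked by the last page fault, and refuse to retry a failed emulation whose write targets one of them. The layout follows the pt_gfns[4] array in Xiao's draft below, but struct pt_gfn_cache and both helpers are hypothetical, and levels are 1-based as in the guest walker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PT_MAX_LEVELS 4
#define INVALID_GFN   ((gfn_t)-1)

/* Hypothetical per-vCPU record of the table gfns walked by the most
 * recent guest page fault, indexed by (level - 1). */
struct pt_gfn_cache {
    gfn_t gfns[PT_MAX_LEVELS];
};

/* Record table_gfn[level - 1] for levels min_level..max_level; every
 * other slot is set to INVALID_GFN so it can never match. */
static void cache_pt_gfns(struct pt_gfn_cache *c, const gfn_t *table_gfn,
                          int min_level, int max_level)
{
    assert(min_level >= 1 && max_level <= PT_MAX_LEVELS);

    for (int level = 0; level < PT_MAX_LEVELS; level++)
        c->gfns[level] = INVALID_GFN;
    for (int level = min_level; level <= max_level; level++)
        c->gfns[level - 1] = table_gfn[level - 1];
}

/* If a failed emulated write hits one of these gfns, unprotecting the
 * page cannot help, so the caller should fail instead of retrying. */
static bool write_hits_walked_pt(const struct pt_gfn_cache *c, gfn_t gfn)
{
    if (gfn == INVALID_GFN)
        return false;
    for (int level = 0; level < PT_MAX_LEVELS; level++)
        if (c->gfns[level] == gfn)
            return true;
    return false;
}
```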
 
 If I understood correctly, I do not think it is simpler than the way in this
 patch.
 
 There is the change to apply the idea:
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c431b33..2163de8 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -502,6 +502,8 @@ struct kvm_vcpu_arch {
   u64 msr_val;
   struct gfn_to_hva_cache data;
   } pv_eoi;
 +
 + gfn_t pt_gfns[4];
  };
 
  struct kvm_lpage_info {
 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index 0453fa0..ac4210f 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -523,6 +523,18 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
   return false;
  }
 
 +static void FNAME(cache_pt_gfns)(struct kvm_vcpu *vcpu, struct guest_walker *walker)
 +{
 + int level;
 +
 + /* Reset all gfns to -1, so we can detect the levels which are not used in the guest. */
 + for (level = 0; level < 4; level++)
 + vcpu->arch.pt_gfns[level] = (gfn_t)(-1);
 +
 + for (level = walker->level; level <= walker->max_level; level++)
 + vcpu->arch.pt_gfns[level - 1] = walker->table_gfn[level - 1];
 +}
 +
  /*
   * Page fault handler.  There are several causes for a page fault:
   *   - there is no shadow pte for the guest pte
 @@ -576,6 +588,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
   return 0;
   }
 
 +  FNAME(cache_pt_gfns)(vcpu, &walker);
 +
   if (walker.level >= PT_DIRECTORY_LEVEL)
   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
  || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 

Re: [RFC PATCH 0/4] MSI affinity for assigned devices

2013-01-07 Thread Alex Williamson
On Mon, 2013-01-07 at 20:14 +, Krishna J wrote:
 Hi Alex,
  MSI routing updates aren't currently handled by pci-assign or
  vfio-pci (when using KVM acceleration), which means that trying to
  set interrupt SMP affinity in the guest has no effect unless MSI is
  completely disabled and re-enabled.  This series fixes this for both
  device assignment backends using similar schemes.  We store the last
  MSIMessage programmed to KVM and do updates to the MSI route when it
  changes.  pci-assign takes a little bit of refactoring to make this
  happen cleanly.  Thanks,
 
 I am using the MSI affinity for assigned devices patches 1 to 4. I have
 set up the guest such that VCPU0 is pinned to PCPU1, VCPU1 is pinned to
 PCPU2, VCPU2 is pinned to PCPU3 and VCPU3 is pinned to PCPU4. I do
 this with taskset after the guest boots. I then start generating
 interrupts affined to VCPU3. I see all the interrupts directly
 delivered to VCPU3. Now I do the same test but with the interrupt affined
 to VCPU2. Although the interrupts are delivered to VCPU2, there are a lot
 of Rescheduling interrupts on VCPU3. I have checked the
 smp_affinity and it is updated to VCPU2.
 I would like your feedback on this use case and what the impact
 might be.
             CPU0       CPU1       CPU2       CPU3
    0:        211          0          0          0   IO-APIC-edge      timer
    4:      60940          0          0          0   IO-APIC-edge      serial
    8:         65          0          0          0   IO-APIC-edge      rtc0
    9:          0          0          0          0   IO-APIC-fasteoi   acpi
   40:          0          0          0          0   PCI-MSI-edge      virtio1-config
   41:       1910          0          0          0   PCI-MSI-edge      virtio1-requests
   42:          0          0          0          0   PCI-MSI-edge      virtio0-config
   43:        127          0          0          0   PCI-MSI-edge      virtio0-input
   44:          1          0          0          0   PCI-MSI-edge      virtio0-output
   45:          1          0       3377      11194   PCI-MSI-edge      FPGA_DEV
  NMI:          0          0          0          0   Non-maskable interrupts
  LOC:     225880     231572     223670     223612   Local timer interrupts
  SPU:          0          0          0          0   Spurious interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  IWI:          0          0          0          0   IRQ work interrupts
  RTR:          0          0          0          0   APIC ICR read retries
  RES:         14         20         21       3398   Rescheduling interrupts  <--- Many RES interrupts!!
  CAL:          0         14         14         16   Function call interrupts
  TLB:          0          0          0          0   TLB shootdowns

I don't know, but I'll fix the line wrap for anyone else that wants to
have a look.  The count looks roughly similar to the number of
interrupts to VCPU2.  Is your application somehow tied to VCPU3?
Thanks,

Alex



KVM: VMX: fix incorrect cached cpl value with real/v8086 modes (v3)

2013-01-07 Thread Marcelo Tosatti

CPL is always 0 in real mode, and always 3 in virtual 8086 mode.

Using values other than those can cause failures on operations that
check CPL.
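The rule stated above can be written as a small pure function. This is a sketch, not the VMX code: it omits the cached-CPL handling that the patch is actually about, follows the patch in taking the protected-mode CPL from the low two bits of the CS selector, and guest_cpl() is a hypothetical name. X86_EFLAGS_VM is bit 17 of RFLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_VM (1u << 17)  /* virtual-8086 mode flag in RFLAGS */

/* Real mode: CPL 0. Virtual-8086 mode (protected mode with EFLAGS.VM):
 * CPL 3. Otherwise: the RPL field of the CS selector. */
static int guest_cpl(bool protected_mode, uint32_t rflags, uint16_t cs_sel)
{
    if (!protected_mode)
        return 0;
    if (rflags & X86_EFLAGS_VM)
        return 3;
    return cs_sel & 3;
}
```

Note that a real-mode CS selector with nonzero low bits (for example 0x000b) still yields CPL 0, which is exactly the case where reading CS alone would give the wrong answer.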

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 55dfc37..dd2a85c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1696,7 +1696,6 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
__set_bit(VCPU_EXREG_RFLAGS, (ulong *)&vcpu->arch.regs_avail);
-   __clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
to_vmx(vcpu)->rflags = rflags;
if (to_vmx(vcpu)->rmode.vm86_active) {
to_vmx(vcpu)->rmode.save_rflags = rflags;
@@ -3110,7 +3109,6 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
vmcs_writel(CR0_READ_SHADOW, cr0);
vmcs_writel(GUEST_CR0, hw_cr0);
vcpu->arch.cr0 = cr0;
-   __clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
 }
 
 static u64 construct_eptp(unsigned long root_hpa)
@@ -3220,8 +3218,10 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
 }
 
-static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
+static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
if (!is_protmode(vcpu))
return 0;
 
@@ -3229,13 +3229,6 @@ static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
 (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */
return 3;
 
-   return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3;
-}
-
-static int vmx_get_cpl(struct kvm_vcpu *vcpu)
-{
-   struct vcpu_vmx *vmx = to_vmx(vcpu);
-
/*
 * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
 * fail; use the cache instead.
@@ -3246,7 +3239,7 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 
if (!test_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail)) {
__set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
-   vmx->cpl = __vmx_get_cpl(vcpu);
+   vmx->cpl = vmx_read_guest_seg_selector(vmx, VCPU_SREG_CS) & 3;
}
 
return vmx->cpl;


Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-07 Thread Marcelo Tosatti
On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote:
  ioapic_write (or any other ioapic update)
  lock()
  perform update
  make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
  unlock()
  
  (*) Similarly to TLB flush.
  
  The advantage is that all work becomes vcpu local. The end result
  is much simpler code.
 What complexity will it remove?

Synchronization between multiple CPUs (except the KVM_REQ_ bit
processing, which is infrastructure shared by other parts of KVM).

We agreed that performance is a non-issue here.
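A minimal sketch of the request-bit pattern proposed above, using C11 atomics in place of the kernel's bitops: the ioapic writer only sets a bit on every vCPU, and each vCPU test-and-clears it before guest entry, so the EOI bitmap is only ever rebuilt by the vCPU that owns it. The struct, the helpers and REQ_UPDATE_EOI_BITMAP are hypothetical; KVM's own helper also kicks vCPUs that are currently running in guest mode, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_UPDATE_EOI_BITMAP 0   /* hypothetical request number */

struct vcpu {
    atomic_ulong requests;        /* pending request bits */
    unsigned eoi_updates;         /* times this vCPU rebuilt its bitmap */
};

static void vcpu_init(struct vcpu *v)
{
    atomic_init(&v->requests, 0);
    v->eoi_updates = 0;
}

/* Writer side (e.g. under the ioapic lock): flag every vCPU instead of
 * touching their state directly. */
static void make_all_vcpus_request(struct vcpu *vcpus, size_t n, int req)
{
    for (size_t i = 0; i < n; i++)
        atomic_fetch_or(&vcpus[i].requests, 1UL << req);
}

/* Test-and-clear of one request bit. */
static bool check_request(struct vcpu *v, int req)
{
    unsigned long bit = 1UL << req;
    return (atomic_fetch_and(&v->requests, ~bit) & bit) != 0;
}

/* Called by each vCPU on its own entry path. */
static void vcpu_prepare_to_enter(struct vcpu *v)
{
    if (check_request(v, REQ_UPDATE_EOI_BITMAP))
        v->eoi_updates++;         /* stand-in for recomputing the bitmap */
}
```

Because the bit is cleared atomically, a second entry without a new request does no work; several requests raised before one entry also collapse into a single rebuild.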



Re: [PATCH] KVM: mmu: remove unused trace event

2013-01-07 Thread Marcelo Tosatti
On Tue, Dec 25, 2012 at 02:34:06PM +0200, Gleb Natapov wrote:
 trace_kvm_mmu_delay_free_pages() is no longer used.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h

Applied, thanks.



Re: [PATCH v5 0/7] s390: Host support for channel I/O.

2013-01-07 Thread Marcelo Tosatti
On Thu, Dec 20, 2012 at 03:32:05PM +0100, Cornelia Huck wrote:
 Hi,
 
 here's the next iteration of the host patches to support channel
 I/O against kvm/next.
 
 Changes from v4 are on the style side; mainly using defines instead
 of magic numbers and using helper functions for decoding instructions.
 
 Patches 1 and 2 are new (and can be applied independently of the
 channel I/O patches); some things Alex pointed out in the patches
 apply to existing code as well.
 
 Please consider for kvm/next.
 
 Cornelia Huck (7):
   KVM: s390: Constify intercept handler tables.
   KVM: s390: Decoding helper functions.
   KVM: s390: Support for I/O interrupts.
   KVM: s390: Add support for machine checks.
   KVM: s390: In-kernel handling of I/O instructions.
   KVM: s390: Base infrastructure for enabling capabilities.
   KVM: s390: Add support for channel I/O instructions.

Applied, thanks.



[PATCH] vfio-pci: [NOT FOR COMMIT] Add support for legacy MMIO & I/O port towards VGA support

2013-01-07 Thread Alex Williamson
Create two new legacy regions, one for MMIO space below 1MB and
another for 64k of I/O port space.  For devices of PCI class VGA
these ranges will be exposed and allow direct access to the device
at the PCI defined VGA addresses, 0xa, 0x3b0, 0x3c0.  VFIO
makes use of the host VGA arbiter to manage host chipset config
to route each access to the correct device.
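How userspace reaches these regions: vfio-pci encodes the region index in the upper bits of the file offset used for read()/write() on the device fd. The sketch mirrors the VFIO_PCI_INDEX_TO_OFFSET() family from drivers/vfio/pci/vfio_pci_private.h and the region count this patch reports from VFIO_DEVICE_GET_INFO. The numeric index of the new legacy regions is not visible in this excerpt, so the 9 used in the usage note is an assumption; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Region index lives in the top bits of the device fd offset. */
#define VFIO_PCI_OFFSET_SHIFT 40
#define VFIO_PCI_INDEX_TO_OFFSET(index) ((uint64_t)(index) << VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_OFFSET_TO_INDEX(off)   ((off) >> VFIO_PCI_OFFSET_SHIFT)
#define VFIO_PCI_OFFSET_MASK (((uint64_t)1 << VFIO_PCI_OFFSET_SHIFT) - 1)

#define VFIO_PCI_CONFIG_REGION_INDEX 7      /* BAR0-5 = 0-5, ROM = 6 */
#define PCI_CLASS_DISPLAY_VGA        0x0300 /* base class 03, subclass 00 */

/* BARs, ROM and config space, plus the two legacy regions for devices
 * whose 24-bit class code (class, subclass, prog-if) is VGA. */
static uint32_t vfio_num_regions(uint32_t pci_class)
{
    uint32_t n = VFIO_PCI_CONFIG_REGION_INDEX + 1;

    if ((pci_class >> 8) == PCI_CLASS_DISPLAY_VGA)
        n += 2;
    return n;
}

/* File offset userspace would pread()/pwrite() for byte `addr` of
 * region `index`. */
static uint64_t vfio_region_offset(uint32_t index, uint64_t addr)
{
    return VFIO_PCI_INDEX_TO_OFFSET(index) | (addr & VFIO_PCI_OFFSET_MASK);
}
```

For example, if the legacy I/O port region had index 9, vfio_region_offset(9, 0x3c0) would be the offset for port 0x3c0, and a VGA device (class 0x030000) would report 10 regions instead of 8.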

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci.c |   74 ---
 drivers/vfio/pci/vfio_pci_private.h |6 +
 drivers/vfio/pci/vfio_pci_rdwr.c|  170 +++
 include/uapi/linux/vfio.h   |3 +
 4 files changed, 197 insertions(+), 56 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b28e66c..8a09c33 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -223,9 +223,14 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
-   info.num_regions = VFIO_PCI_NUM_REGIONS;
+   info.num_regions = VFIO_PCI_CONFIG_REGION_INDEX + 1;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
+   if ((vdev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
+   info.flags |= VFIO_DEVICE_FLAGS_VGA;
+   info.num_regions += 2;
+   }
+
return copy_to_user((void __user *)arg, &info, minsz);
 
} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
@@ -285,6 +290,26 @@ static long vfio_pci_ioctl(void *device_data,
info.flags = VFIO_REGION_INFO_FLAG_READ;
break;
}
+   case VFIO_PCI_LEGACY_MMIO_REGION_INDEX:
+   if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+   return -EINVAL;
+
+   info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+   info.size = 1024 * 1024;
+   info.flags = VFIO_REGION_INFO_FLAG_READ |
+VFIO_REGION_INFO_FLAG_WRITE;
+
+   break;
+   case VFIO_PCI_LEGACY_IOPORT_REGION_INDEX:
+   if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+   return -EINVAL;
+
+   info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+   info.size = 64 * 1024;
+   info.flags = VFIO_REGION_INFO_FLAG_READ |
+VFIO_REGION_INFO_FLAG_WRITE;
+
+   break;
default:
return -EINVAL;
}
@@ -376,14 +401,25 @@ static ssize_t vfio_pci_read(void *device_data, char __user *buf,
if (index >= VFIO_PCI_NUM_REGIONS)
return -EINVAL;
 
-   if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+   switch (index) {
+   case VFIO_PCI_CONFIG_REGION_INDEX:
return vfio_pci_config_readwrite(vdev, buf, count, ppos, false);
-   else if (index == VFIO_PCI_ROM_REGION_INDEX)
-   return vfio_pci_mem_readwrite(vdev, buf, count, ppos, false);
-   else if (pci_resource_flags(pdev, index) & IORESOURCE_IO)
-   return vfio_pci_io_readwrite(vdev, buf, count, ppos, false);
-   else if (pci_resource_flags(pdev, index) & IORESOURCE_MEM)
+   case VFIO_PCI_ROM_REGION_INDEX:
return vfio_pci_mem_readwrite(vdev, buf, count, ppos, false);
+   case VFIO_PCI_LEGACY_MMIO_REGION_INDEX:
+   return vfio_pci_legacy_mem_readwrite(vdev, buf, count,
+ppos, false);
+   case VFIO_PCI_LEGACY_IOPORT_REGION_INDEX:
+   return vfio_pci_legacy_io_readwrite(vdev, buf, count,
+   ppos, false);
+   default:
+   if (pci_resource_flags(pdev, index) & IORESOURCE_IO)
+   return vfio_pci_io_readwrite(vdev, buf, count,
+ppos, false);
+   if (pci_resource_flags(pdev, index) & IORESOURCE_MEM)
+   return vfio_pci_mem_readwrite(vdev, buf, count,
+ ppos, false);
+   }
 
return -EINVAL;
 }
@@ -398,17 +434,25 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
if (index >= VFIO_PCI_NUM_REGIONS)
return -EINVAL;
 
-   if (index == VFIO_PCI_CONFIG_REGION_INDEX)
+   switch (index) {
+   case VFIO_PCI_CONFIG_REGION_INDEX:
return vfio_pci_config_readwrite(vdev, (char __user *)buf,
 count, ppos, true);
-   else if (index == VFIO_PCI_ROM_REGION_INDEX)
+   case VFIO_PCI_ROM_REGION_INDEX:
return -EINVAL;
-   else if 

[PATCH 0/1] vfio-pci: Towards VGA support

2013-01-07 Thread Alex Williamson
vfio makes a nice interface to start looking at supporting VGA devices
assigned to virtual machines (i.e., userspace drivers) because we can so
easily add additional ranges for a device.  In this patch we add
legacy MMIO (below 1MB) and I/O port (64k) to devices with PCI class
code VGA.  We can then use the kernel VGA arbiter service to change
chipset routing for each access to the VGA ranges defined in the PCI
spec.  The rest of the region space not used by VGA is left
inaccessible until we add a future feature that needs some other
legacy range.
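For reference, the VGA ranges in question are the ones a PCI-to-PCI bridge forwards when its VGA Enable bit is set: memory 0xA0000-0xBFFFF and I/O ports 0x3B0-0x3BB and 0x3C0-0x3DF. A sketch of the corresponding address checks, which a region handler could use to reject the rest of the 1MB MMIO and 64k I/O port windows (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy VGA frame buffer window. */
static bool is_vga_mmio(uint64_t addr)
{
    return addr >= 0xa0000 && addr <= 0xbffff;
}

/* Legacy VGA register ranges (monochrome 0x3b0-0x3bb, color/shared
 * 0x3c0-0x3df). */
static bool is_vga_ioport(uint16_t port)
{
    return (port >= 0x3b0 && port <= 0x3bb) ||
           (port >= 0x3c0 && port <= 0x3df);
}
```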

There's also a qemu userspace companion series to this which learns
how to look for this new feature flag and setup ranges.  Together
they get a step closer to supporting vfio-based VGA assignment, but
it doesn't yet work.  I'm posting in this broken state both for
archival purposes as well as the hope that someone has ideas of what
might be missing or be able to pick up and run with this code.

Some cards are able to get through execution of their VGA BIOS with
these patches, but none that I've seen sync the monitor to VGA text
mode from seabios.  With a hack in qemu for a card specific backdoor
on a Radeon HD5450 I've been able to get syslinux graphics mode to
work and Windows will use it during normal bootup.  I have no idea
what might be missing for VGA text mode.  Thanks,

Alex

---

Alex Williamson (1):
  vfio-pci: [NOT FOR COMMIT] Add support for legacy MMIO & I/O port towards VGA support


 drivers/vfio/pci/vfio_pci.c |   74 ---
 drivers/vfio/pci/vfio_pci_private.h |6 +
 drivers/vfio/pci/vfio_pci_rdwr.c|  170 +++
 include/uapi/linux/vfio.h   |3 +
 4 files changed, 197 insertions(+), 56 deletions(-)


[PATCH 0/3] Towards vfio-based VGA device assignment

2013-01-07 Thread Alex Williamson
This is the companion series to the vfio-pci kernel series for VGA
support.  Combined, these don't do as much as you'd hope.  Patch 1
is simply a header update for the matching vfio kernel changes.
Patch 2 is the meat of the changes, enabling vfio-pci to claim access
to VGA ranges and pass them through to the kernel.  The last patch
is a hard coded hack specific to my system and only known to be
needed on a Radeon HD5450 where there seems to be a backdoor for the
VGA BIOS to find the physical address of the device (physical device
is at 0x4000, virtual device is at 0xc000).

With this and the kernel patch, some devices are able to get through
VGA bios execution.  The HD5450 can even sync the monitor and show the
correct thing on the screen if you run something that uses VGA graphics
mode.  Seabios seems to think VBE works, but for some reason VGA text
mode doesn't work, the monitor turns off.  So, like the kernel side,
I'm posting these for archival purposes and with hopes that someone
may have some ideas on what's still missing.  Thanks,

Alex

---

Alex Williamson (3):
  qemu: [NOT FOR COMMIT] Update linux headers for vfio VGA
  vfio-pci: [NOT FOR COMMIT] Add support for VGA MMIO and I/O port access
  vfio-pci: [NOT FOR COMMIT] Hack around HD5450 I/O port backdoor


 hw/vfio_pci.c|  182 ++
 linux-headers/asm-powerpc/kvm.h  |   86 
 linux-headers/asm-powerpc/kvm_para.h |   13 +-
 linux-headers/linux/kvm.h|   21 +++-
 linux-headers/linux/kvm_para.h   |6 +
 linux-headers/linux/vfio.h   |9 +-
 linux-headers/linux/virtio_config.h  |6 +
 linux-headers/linux/virtio_ring.h|6 +
 8 files changed, 305 insertions(+), 24 deletions(-)


[PATCH 1/3] qemu: [NOT FOR COMMIT] Update linux headers for vfio VGA

2013-01-07 Thread Alex Williamson
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 linux-headers/asm-powerpc/kvm.h  |   86 ++
 linux-headers/asm-powerpc/kvm_para.h |   13 +++--
 linux-headers/linux/kvm.h|   21 ++--
 linux-headers/linux/kvm_para.h   |6 +-
 linux-headers/linux/vfio.h   |9 ++--
 linux-headers/linux/virtio_config.h  |6 +-
 linux-headers/linux/virtio_ring.h|6 +-
 7 files changed, 124 insertions(+), 23 deletions(-)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 1bea4d8..2fba8a6 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -221,6 +221,12 @@ struct kvm_sregs {
 
__u32 dbsr; /* KVM_SREGS_E_UPDATE_DBSR */
__u32 dbcr[3];
+   /*
+* iac/dac registers are 64bit wide, while this API
+* interface provides only lower 32 bits on 64 bit
+* processors. ONE_REG interface is added for 64bit
+* iac/dac registers.
+*/
__u32 iac[4];
__u32 dac[2];
__u32 dvc[2];
@@ -325,6 +331,86 @@ struct kvm_book3e_206_tlb_params {
__u32 reserved[8];
 };
 
+/* For KVM_PPC_GET_HTAB_FD */
+struct kvm_get_htab_fd {
+   __u64   flags;
+   __u64   start_index;
+   __u64   reserved[2];
+};
+
+/* Values for kvm_get_htab_fd.flags */
+#define KVM_GET_HTAB_BOLTED_ONLY   ((__u64)0x1)
+#define KVM_GET_HTAB_WRITE ((__u64)0x2)
+
+/*
+ * Data read on the file descriptor is formatted as a series of
+ * records, each consisting of a header followed by a series of
+ * `n_valid' HPTEs (16 bytes each), which are all valid.  Following
+ * those valid HPTEs there are `n_invalid' invalid HPTEs, which
+ * are not represented explicitly in the stream.  The same format
+ * is used for writing.
+ */
+struct kvm_get_htab_header {
+   __u32   index;
+   __u16   n_valid;
+   __u16   n_invalid;
+};
+
 #define KVM_REG_PPC_HIOR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x1)
+#define KVM_REG_PPC_IAC1   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x2)
+#define KVM_REG_PPC_IAC2   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x3)
+#define KVM_REG_PPC_IAC3   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x4)
+#define KVM_REG_PPC_IAC4   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x5)
+#define KVM_REG_PPC_DAC1   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x6)
+#define KVM_REG_PPC_DAC2   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x7)
+#define KVM_REG_PPC_DABR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8)
+#define KVM_REG_PPC_DSCR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9)
+#define KVM_REG_PPC_PURR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa)
+#define KVM_REG_PPC_SPURR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb)
+#define KVM_REG_PPC_DAR    (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc)
+#define KVM_REG_PPC_DSISR  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd)
+#define KVM_REG_PPC_AMR    (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xe)
+#define KVM_REG_PPC_UAMOR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xf)
+
+#define KVM_REG_PPC_MMCR0  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
+#define KVM_REG_PPC_MMCR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
+#define KVM_REG_PPC_MMCRA  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+
+#define KVM_REG_PPC_PMC1   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
+#define KVM_REG_PPC_PMC2   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
+#define KVM_REG_PPC_PMC3   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1a)
+#define KVM_REG_PPC_PMC4   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1b)
+#define KVM_REG_PPC_PMC5   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1c)
+#define KVM_REG_PPC_PMC6   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1d)
+#define KVM_REG_PPC_PMC7   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1e)
+#define KVM_REG_PPC_PMC8   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1f)
+
+/* 32 floating-point registers */
+#define KVM_REG_PPC_FPR0   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x20)
+#define KVM_REG_PPC_FPR(n) (KVM_REG_PPC_FPR0 + (n))
+#define KVM_REG_PPC_FPR31  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x3f)
+
+/* 32 VMX/Altivec vector registers */
+#define KVM_REG_PPC_VR0    (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x40)
+#define KVM_REG_PPC_VR(n)  (KVM_REG_PPC_VR0 + (n))
+#define KVM_REG_PPC_VR31   (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x5f)
+
+/* 32 double-width FP registers for VSX */
+/* High-order halves overlap with FP regs */
+#define KVM_REG_PPC_VSR0   (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x60)
+#define KVM_REG_PPC_VSR(n) (KVM_REG_PPC_VSR0 + (n))
+#define KVM_REG_PPC_VSR31  (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x7f)
+
+/* FP and vector status/control registers */
+#define KVM_REG_PPC_FPSCR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80)
+#define KVM_REG_PPC_VSCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81)
+
+/* Virtual 

[PATCH 2/3] vfio-pci: [NOT FOR COMMIT] Add support for VGA MMIO and I/O port access

2013-01-07 Thread Alex Williamson
With this, some VGA cards can make it through VGA BIOS init, but I
have yet to see one sync the monitor in VGA text mode.  Only tested
with -vga none.  This adds a new option to vfio-pci, vga=on, which
enables legacy VGA ranges.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/vfio_pci.c |  173 +
 1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 94c61ab..846e8de 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -59,6 +59,15 @@ typedef struct VFIOBAR {
 uint8_t nr; /* cache the BAR number for debug */
 } VFIOBAR;
 
+typedef struct VFIOLegacyIO {
+off_t fd_offset;
+int fd;
+MemoryRegion mem;
+off_t region_offset;
+size_t size;
+uint32_t flags;
+} VFIOLegacyIO;
+
 typedef struct VFIOINTx {
 bool pending; /* interrupt pending */
 bool kvm_accel; /* set when QEMU bypass through KVM enabled */
@@ -126,10 +135,15 @@ typedef struct VFIODevice {
 int nr_vectors; /* Number of MSI/MSIX vectors currently in use */
 int interrupt; /* Current interrupt type */
 VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
+VFIOLegacyIO vga[3]; /* 0xa0000, 0x3b0, 0x3c0 */
 PCIHostDeviceAddress host;
 QLIST_ENTRY(VFIODevice) next;
 struct VFIOGroup *group;
+uint32_t features;
+#define VFIO_FEATURE_ENABLE_VGA_BIT 0
+#define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
 bool reset_works;
+bool has_vga;
 } VFIODevice;
 
 typedef struct VFIOGroup {
@@ -958,6 +972,87 @@ static const MemoryRegionOps vfio_bar_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static void vfio_legacy_write(void *opaque, hwaddr addr,
+  uint64_t data, unsigned size)
+{
+VFIOLegacyIO *io = opaque;
+union {
+uint8_t byte;
+uint16_t word;
+uint32_t dword;
+uint64_t qword;
+} buf;
+off_t offset = io->fd_offset + io->region_offset + addr;
+
+switch (size) {
+case 1:
+buf.byte = data;
+break;
+case 2:
+buf.word = cpu_to_le16(data);
+break;
+case 4:
+buf.dword = cpu_to_le32(data);
+break;
+default:
+hw_error("vfio: unsupported write size, %d bytes\n", size);
+break;
+}
+
+if (pwrite(io->fd, &buf, size, offset) != size) {
+error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m\n",
+ __func__, io->region_offset + addr, data, size);
+}
+
+DPRINTF("%s(0x%"HWADDR_PRIx", 0x%"PRIx64", %d)\n",
+__func__, io->region_offset + addr, data, size);
+}
+
+static uint64_t vfio_legacy_read(void *opaque, hwaddr addr, unsigned size)
+{
+VFIOLegacyIO *io = opaque;
+union {
+uint8_t byte;
+uint16_t word;
+uint32_t dword;
+uint64_t qword;
+} buf;
+uint64_t data = 0;
+off_t offset = io->fd_offset + io->region_offset + addr;
+
+if (pread(io->fd, &buf, size, offset) != size) {
+error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m\n",
+ __func__, io->region_offset + addr, size);
+return (uint64_t)-1;
+}
+
+switch (size) {
+case 1:
+data = buf.byte;
+break;
+case 2:
+data = le16_to_cpu(buf.word);
+break;
+case 4:
+data = le32_to_cpu(buf.dword);
+break;
+default:
+hw_error("vfio: unsupported read size, %d bytes\n", size);
+break;
+}
+
+DPRINTF("%s(0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
+__func__, io->region_offset + addr, size, data);
+
+return data;
+}
+
+static const MemoryRegionOps vfio_legacy_ops = {
+.read = vfio_legacy_read,
+.write = vfio_legacy_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
 /*
  * PCI config space
  */
@@ -1498,6 +1593,27 @@ static void vfio_map_bars(VFIODevice *vdev)
 for (i = 0; i < PCI_ROM_SLOT; i++) {
 vfio_map_bar(vdev, i);
 }
+
+if (vdev->has_vga && (vdev->features & VFIO_FEATURE_ENABLE_VGA)) {
+memory_region_init_io(&vdev->vga[0].mem, &vfio_legacy_ops,
+  &vdev->vga[0], "vfio-vga-mmio@0xa0000",
+  0xc0000 - 0xa0000);
+memory_region_add_subregion_overlap(pci_address_space(&vdev->pdev),
+0xa0000, &vdev->vga[0].mem, 1);
+memory_region_set_coalescing(&vdev->vga[0].mem);
+
+memory_region_init_io(&vdev->vga[1].mem, &vfio_legacy_ops,
+  &vdev->vga[1], "vfio-vga-io@0x3b0",
+  0x3bc - 0x3b0);
+memory_region_add_subregion_overlap(pci_address_space_io(&vdev->pdev),
+0x3b0, &vdev->vga[1].mem, 1);
+
+memory_region_init_io(&vdev->vga[2].mem, &vfio_legacy_ops,
+  &vdev->vga[2], "vfio-vga-io@0x3c0",
+  0x3e0 - 0x3c0);
+

[PATCH 3/3] vfio-pci: [NOT FOR COMMIT] Hack around HD5450 I/O port backdoor

2013-01-07 Thread Alex Williamson
This is a hack specific to my system which I haven't even attempted
to generalize yet.  The ATI/AMD Radeon HD5450 VGA BIOS appears to have
a backdoor to determine the physical address of the device.  It reads
a value matching the top byte of the I/O port BAR from a register in
VGA I/O port space, then uses in/out to that address during BIOS
execution.  On my setup the I/O port BAR is at 0x4000 physically and
emulated for the guest at 0xc000.  So I simply look for this access
and replace 0x40 with 0xc0.  That's enough for it to get through BIOS
init, but it's still only partially functional (no VGA text mode).

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/vfio_pci.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index 846e8de..5db076f 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1041,6 +1041,15 @@ static uint64_t vfio_legacy_read(void *opaque, hwaddr addr, unsigned size)
 break;
 }
 
+/* XXX - Complete hardcoded hack, need to figure out how common this is and
+ * come up with a device quirk and match host phys to guest phys.  This is
+ * only known to be needed for an ATI/AMD Radeon HD5450 which stores the
+ * upper byte of the I/O port address in this unused VGA I/O port register.
+ */
+if (io->region_offset == 0x3c0 && addr == 3 && size == 1 && data == 0x40) {
+data = 0xc0;
+}
+
 DPRINTF("%s(0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
 __func__, io->region_offset + addr, size, data);
 



Re: [PATCH] KVM: MMU: simplify folding of dirty bit into accessed_dirty

2013-01-07 Thread Marcelo Tosatti
On Thu, Dec 27, 2012 at 02:44:58PM +0200, Gleb Natapov wrote:
 MMU code tries to avoid if()s that the HW cannot predict reliably by using
 bitwise operations to streamline code execution, but in the case of dirty bit
 folding this gives us nothing, since write_fault is checked right before
 the folding code. Let's just piggyback onto the if() to make the code clearer.
 
 Signed-off-by: Gleb Natapov g...@redhat.com

Applied, thanks.



Re: FreeBSD-amd64 fails to start with SMP on qemu-kvm

2013-01-07 Thread Marcelo Tosatti
On Mon, Jan 07, 2013 at 06:13:22PM +0100, Artur Samborski wrote:
 Hello,
 
 When I try to run FreeBSD-amd64 on more than 1 vcpu in qemu-kvm
 (Fedora Core 17), e.g. to run FreeBSD-9.0-RELEASE-amd64 with:
 
 qemu-kvm -m 1024m -cpu host -smp 2 -cdrom
 /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso
 
 it freezes KVM with:
 
 KVM internal error. Suberror: 1
 emulation failure
 RAX=80b0d4c0 RBX=0009f000 RCX=c080
 RDX=
 RSI=d238 RDI= RBP=
 RSP=
 R8 = R9 = R10=
 R11=
 R12= R13= R14=
 R15=
 RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =   f300 DPL=3 DS16 [-WA]
 CS =0008   00209900 DPL=0 CS64 [--A]
 SS =9f00 0009f000  f300 DPL=3 DS16 [-WA]
 DS =0018   00c09300 DPL=0 DS   [-WA]
 FS =   f300 DPL=3 DS16 [-WA]
 GS =   f300 DPL=3 DS16 [-WA]
 LDT=   8200 DPL=0 LDT
 TR =   8b00 DPL=0 TSS64-busy
 GDT= 0009f080 0020
 IDT=  
 CR0=8011 CR2= CR3=0009c000 CR4=0030
 DR0= DR1= DR2=
 DR3=
 DR6=0ff0 DR7=0400
 EFER=0501
 Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80
 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00 99 20 00 ff ff 00 00

Artur,

Can you check whether 

https://patchwork-mail.kernel.org/patch/1942681/

fixes your problem?

TIA



Re: [RESEND PATCH] pci-assign: Enable MSIX on device to match guest

2013-01-07 Thread Marcelo Tosatti
On Mon, Jan 07, 2013 at 06:01:19PM +0200, Michael S. Tsirkin wrote:
 On Sun, Jan 06, 2013 at 09:30:31PM -0700, Alex Williamson wrote:
  When a guest enables MSIX on a device we evaluate the MSIX vector
  table, typically find no unmasked vectors and don't switch the device
  to MSIX mode.  This generally works fine and the device will be
  switched once the guest enables and therefore unmasks a vector.
  Unfortunately some drivers enable MSIX, then use interfaces to send
  commands between VF and PF or PF and firmware that act based on the host
  state of the device.  These therefore may break when MSIX is managed
  lazily.  This change re-enables the previous test used to enable MSIX
  (see qemu-kvm a6b402c9), which basically guesses whether a vector
  will be used based on the data field of the vector table.
  
  Cc: qemu-sta...@nongnu.org
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  Acked-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  Michael has now ack'd this patch as the correct initial first step,
  so I'm resending with that included.  I'm actually not sure what the
  expected upstream path is for this file now that it's part of qemu.
  There's no entry for hw/kvm/* in MAINTAINERS nor anything specifically
  for this file.  Is kvm still upstream for this, through the uq branch
  or is it qemu for anything not specifically part of a kvm interface?
  Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree,
  any path is fine by me.  Thanks,
  
  Alex
 
 I can merge this if there are no other takers.

Go for it.

 
   hw/kvm/pci-assign.c |   17 +++--
   1 file changed, 15 insertions(+), 2 deletions(-)
  
  diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
  index 8ee9428..896cfe8 100644
  --- a/hw/kvm/pci-assign.c
  +++ b/hw/kvm/pci-assign.c
  @@ -1031,6 +1031,19 @@ static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
   return (entry->ctrl & cpu_to_le32(0x1)) != 0;
   }
   
  +/*
  + * When MSI-X is first enabled the vector table typically has all the
  + * vectors masked, so we can't use that as the obvious test to figure out
  + * how many vectors to initially enable.  Instead we look at the data field
  + * because this is what worked for pci-assign for a long time.  This makes
  + * sure the physical MSI-X state tracks the guest's view, which is important
  + * for some VF/PF and PF/fw communication channels.
  + */
  +static bool assigned_dev_msix_skipped(MSIXTableEntry *entry)
  +{
  +return !entry->data;
  +}
  +
   static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
   {
   AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  @@ -1041,7 +1054,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
   
   /* Get the usable entry number for allocating */
   for (i = 0; i < adev->msix_max; i++, entry++) {
  -if (assigned_dev_msix_masked(entry)) {
  +if (assigned_dev_msix_skipped(entry)) {
   continue;
   }
   entries_nr++;
  @@ -1070,7 +1083,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
   for (i = 0; i < adev->msix_max; i++, entry++) {
   adev->msi_virq[i] = -1;
   
  -if (assigned_dev_msix_masked(entry)) {
  +if (assigned_dev_msix_skipped(entry)) {
   continue;
   }
   


Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

2013-01-07 Thread mmogilvi
On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov g...@redhat.com wrote:
 On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
 Reading the spec, it is clear that most modes normally leave the IRQ
 output line high, and only pulse it low to generate a leading edge.
 Especially the most commonly used mode 2.
 
 The KVM i8254 model does not try to emulate the duration of the pulse at
 all, so just swap the high/low settings to leave it high most of
 the time.
 
 This fix is a prerequisite to improving the i8259 model to handle
 the trailing edge of an interrupt request as indicated in its spec:
 If it gets a trailing edge of an IRQ line before it starts to service
 the interrupt, the request should be canceled.
 
 See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
 or search the net for 23124406.pdf.
 
 Risks:
 
 There is a risk that migrating a running guest between versions
 with and without this patch will lose or gain a single timer
 interrupt during the migration process.  The only case where
 Can you elaborate on how exactly this can happen? I do not see it.
 

KVM 8254: In the corrected model, when the count expires, the model
briefly pulses output low and then high again, with the low to high
transition being what triggers the interrupt.  In the old model,
when the count expires, the model expects the output line
to already be low, and briefly pulses it high (triggering the
interrupt) and then low again.  But if the line was already
high (because it migrated from the corrected model),
this won't generate a new leading edge (low to high) and won't
trigger a new interrupt (the first post-back-migration pulse turns
into a simple trailing edge instead of a pulse).

Unless there is something I'm missing?

The qemu 8254 model actually models each edge at independent
clock ticks instead of combining both into a very brief pulse at one time.
I've found it handy to draw out old and new timing diagrams on paper
(for each mode), and then carefully think about what happens with respect
to levels and edges when you transition back and forth between old and
new models at various points in the timing cycle.  (Note that I've spent
more time examining the qemu models than the kvm models.)

 this is likely to be serious is probably losing a single-shot (mode 4)
 interrupt, but if my understanding of how things work is good, then
 that should only be possible if a whole slew of conditions are
 all met:
 
  1. The guest is configured to run in a tickless mode (like
 modern Linux).
  2. The guest is for some reason still using the i8254 rather
 than something more modern like an HPET.  (The combination
 of 1 and 2 should be rare.)
 This is not so rare. For performance reasons it is better not to have
 HPET at all.  In fact, -no-hpet is how I would advise anyone to run qemu.

In a later email you mention that Linux prefers a timer in the APIC.
I don't know much about the APIC (Advanced Programmable Interrupt
Controller), and wasn't even aware it had its own timer.

The big question is whether we can safely just fix the i825* models, or whether
we need something more subtle to avoid breaking commonly used guests
like modern Linux (support both corrected and old models,
or only fix IRQ2 instead of all IRQs, or similar subtlety).

 
  3. The migration is going from a fixed version back to the
 old version.  (Not sure how common this is, but it should
 be rarer than migrating from old to new.)
  4. There are not going to be any timely events/interrupts
 (keyboard, network, process sleeps, etc) that cause the guest
 to reset the PIT mode 4 one-shot counter soon enough.
 
 This combination should be rare enough that more complicated
 solutions are not worth the effort.
 
 Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
 ---
  arch/x86/kvm/i8254.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index c1d30b2..cd4ec60 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
  }
  spin_unlock(&ps->inject_lock);
  if (inject) {
 -kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 +/* Clear previous interrupt, then create a rising
 + * edge to request another interrupt, and leave it at
 + * level=1 until time to inject another one.
 + */
  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
 +kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
  
  /*
   * Provides NMI watchdog support via Virtual Wire mode.
 -- 
 1.7.10.2.484.gcd07cc5
 
 --
   Gleb.


Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes

2013-01-07 Thread mmogilvi
On Mon, 7 Jan 2013 14:04:03 +0200, Gleb Natapov g...@redhat.com wrote:
 On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote:
 Make pit_get_out() consistent with the spec.  Currently pit_get_out()
 doesn't affect IRQ0, but it can be read by the guest in other ways.
 This makes it consistent with proposed changes in qemu's i8254 model
 as well.
 
 See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
 or search the net for 23124406.pdf.
 
 Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
 ---
  arch/x86/kvm/i8254.c | 44 ++--
  1 file changed, 34 insertions(+), 10 deletions(-)
 
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index cd4ec60..fd38938 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
  
  WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
  
 +/* FIXME: Add some way to represent a paused timer and return
 + *   the paused-at counter value, to better model gate pausing,
 + *   wait until next CLK pulse to load counter logic, etc.
 + */
  t = kpit_elapsed(kvm, c, channel);
  d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
  
 @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
  counter = (c->count - d) & 0xffff;
  break;
  case 3:
 -/* XXX: may be incorrect for odd counts */
 -counter = c->count - (mod_64((2 * d), c->count));
 +counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
  break;
  default:
  counter = c->count - mod_64(d, c->count);
 @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
  switch (c->mode) {
  default:
  case 0:
 -out = (d >= c->count);
 -break;
  case 1:
 -out = (d < c->count);
 +out = (d >= c->count);
  break;
  case 2:
 -out = ((mod_64(d, c->count) == 0) && (d != 0));
 +out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
  break;
  case 3:
 -out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
 +out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0);
  break;
  case 4:
  case 5:
 -out = (d == c->count);
 +out = (d != c->count);
  break;
  }
  
 @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
  
  /*
   * The largest possible initial count is 0; this is equivalent
 - * to 2^16 for binary counting and 10^4 for BCD counting.
 + * to pow(2,16) for binary counting and pow(10,4) for BCD counting.
   */
  if (val == 0)
  val = 0x10000;
 @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
  
  if (channel != 0) {
  ps->channels[channel].count_load_time = ktime_get();
 +
 +/* In gate-triggered one-shot modes,
 + * indirectly model some pit_get_out()
 + * cases by setting the load time way
 + * back until gate-triggered.
 + * (Generally only affects reading status
 + * from channel 2 speaker,
 + * due to hard-wired gates on other
 + * channels.)
 + *
 + * FIXME: This might be redesigned if a paused
 + * timer state is added for pit_get_count().
 + */
 +if (ps->channels[channel].mode == 1 ||
 +ps->channels[channel].mode == 5) {
 +u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ);
 +ps->channels[channel].count_load_time =
 +ktime_sub(ps->channels[channel].count_load_time,
 +  ns_to_ktime(delta));
 I do not understand what you are trying to do here. Do you assume that
 the trigger will happen 2 clocks after the counter is loaded?
 

Modes 1 and 5 are single-shot, and they do not start counting until GATE
is triggered, potentially well after the count is loaded.  So this is
attempting to model the "start of countdown has not been triggered"
state as being mostly identical to the "already triggered and also
expired some number of clocks (2) ago" state.

It might be clearer to have a way to explicitly model a
paused countdown, but such a mechanism doesn't currently exist.

Note that modeling modes 1 and 5 is fairly low priority,
because channel 0's GATE line is generally hard-wired such that
GATE edges/triggers are impossible.  But it may still be
somewhat relevant to the PC speaker channel, or if someone
might want to use this in a model of non-PC hardware.

 +}
  return;
  }
  
 @@ -383,7 +404,6 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
   * mode 1 is one shot, mode 2 is period, otherwise del timer */
  switch 

RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support

2013-01-07 Thread Zhang, Yang Z
Marcelo Tosatti wrote on 2013-01-08:
 On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote:
 ioapic_write (or any other ioapic update)
 lock()
 perform update
 make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
 unlock()
 
 (*) Similarly to TLB flush.
 
 The advantage is that all work becomes vcpu local. The end result
 is much simpler code.
 What complexity will it remove?
 
 Synchronization between multiple CPUs (except the KVM_REQ_ bit
 processing, which is infrastructure shared by other parts of KVM).
 
 We agreed that performance is non issue here.
The current logic is this:
ioapic_write
lock()
perform update
make request on each vcpu
kick each vcpu
unlock()

The only difference is the way the request is made, and the complex part is
performing the update. With your suggestion, we still need to do the update.
Why do you think it is much simpler?

Best regards,
Yang



[kvm:linux-next 16/16] arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift'

2013-01-07 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
head:   908e7d7999bcce70ac52e7f390a8f5cbc55948de
commit: 908e7d7999bcce70ac52e7f390a8f5cbc55948de [16/16] KVM: MMU: simplify folding of dirty bit into accessed_dirty
config: make ARCH=x86_64 allmodconfig

All warnings:

In file included from arch/x86/kvm/mmu.c:3482:0:
arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic':
arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable]
In file included from arch/x86/kvm/mmu.c:3486:0:
arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic':
arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable]

vim +/shift +154 arch/x86/kvm/paging_tmpl.h

8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  138          walker->ptes[level] = pte;
8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  139      }
8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  140      return 0;
8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  141  }
8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  142  
ac79c978 drivers/kvm/paging_tmpl.h  Avi Kivity       2007-01-05  143  /*
ac79c978 drivers/kvm/paging_tmpl.h  Avi Kivity       2007-01-05  144   * Fetch a guest pte for a guest virtual address
ac79c978 drivers/kvm/paging_tmpl.h  Avi Kivity       2007-01-05  145   */
1e301feb arch/x86/kvm/paging_tmpl.h Joerg Roedel     2010-09-10  146  static int FNAME(walk_addr_generic)(struct guest_walker *walker,
1e301feb arch/x86/kvm/paging_tmpl.h Joerg Roedel     2010-09-10  147                      struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
33770780 arch/x86/kvm/paging_tmpl.h Xiao Guangrong   2010-09-28  148                      gva_t addr, u32 access)
6aa8b732 drivers/kvm/paging_tmpl.h  Avi Kivity       2006-12-10  149  {
8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16  150      int ret;
42bf3f0a drivers/kvm/paging_tmpl.h  Avi Kivity       2007-10-17  151      pt_element_t pte;
b7233635 arch/x86/kvm/paging_tmpl.h Borislav Petkov  2011-05-30  152      pt_element_t __user *uninitialized_var(ptep_user);
cea0f0e7 drivers/kvm/paging_tmpl.h  Avi Kivity       2007-01-05  153      gfn_t table_gfn;
b514c30f arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-16 @154      unsigned index, pt_access, pte_access, accessed_dirty, shift;
42bf3f0a drivers/kvm/paging_tmpl.h  Avi Kivity       2007-10-17  155      gpa_t pte_gpa;
134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01  156      int offset;
134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01  157      const int write_fault = access & PFERR_WRITE_MASK;
134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01  158      const int user_fault  = access & PFERR_USER_MASK;
134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01  159      const int fetch_fault = access & PFERR_FETCH_MASK;
134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01  160      u16 errcode = 0;
13d22b6a arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-12  161      gpa_t real_gpa;
13d22b6a arch/x86/kvm/paging_tmpl.h Avi Kivity       2012-09-12  162      gfn_t gfn;

---
0-DAY kernel build testing backend Open Source Technology Center
Fengguang Wu, Yuanhan Liu  Intel Corporation


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-07 Thread Rusty Russell
Paolo Bonzini pbonz...@redhat.com writes:
 On 07/01/2013 01:02, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 On 02/01/2013 06:03, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 The virtqueue_add_buf function has two limitations:

 1) it requires the caller to provide all the buffers in a single call;

 2) it does not support chained scatterlists: the buffers must be
 provided as an array of struct scatterlist;

 Chained scatterlists are a horrible interface, but that doesn't mean we
 shouldn't support them if there's a need.

 I think I once even had a patch which passed two chained sgs, rather
 than a combo sg and two length numbers.  It's very old, but I've pasted
 it below.

 Duplicating the implementation by having another interface is pretty
 nasty; I think I'd prefer the chained scatterlists, if that's optimal
 for you.

 Unfortunately, that cannot work because not all architectures support
 chained scatterlists.
 
 WHAT?  I can't figure out what an arch needs to do to support this?

 It needs to use the iterator functions in its DMA driver.

But we don't care for virtio.

 All archs we care about support them, though, so I think we can ignore
 this issue for now.

 Kind of... In principle all QEMU-supported arches can use virtio, and
 the speedup can be quite useful.  And there is no Kconfig symbol for SG
 chains that I can use to disable virtio-scsi on unsupported arches. :/

Well, we #error if it's not supported.  Then the lazy architectures can
fix it.

Cheers,
Rusty.


Re: [PATCH 0/4] ARM updates for kvmtool

2013-01-07 Thread Christoffer Dall
On Mon, Jan 7, 2013 at 1:14 PM, Will Deacon will.dea...@arm.com wrote:
   - virtio mmio fixes to deal with guest page sizes != 4k (in
   preparation for AArch64, which I will post separately).
 - .dtb dumping via the lkvm command line
 - Support for PSCI firmware as a replacement to the spin-table
   based SMP boot code

 The last option was implemented after discussion on the linux-arm-kernel
 list when adding support for the mach-virt platform.

I completely missed that; it would have been nice if the kvmarm list
had been cc'ed on those discussions.

 I hope to upstream
 the kernel-side part of the implementation for 3.9 and expect the kvm
 bits to follow once that has been merged.

 All feedback welcome.

Very cool. I'm looking forward to trying this out; hopefully I'll find
cycles this week.

-Christoffer


[PATCH v5 1/5] KVM: MMU: fix Dirty bit missed if CR0.WP = 0

2013-01-07 Thread Xiao Guangrong
If the write-fault access is from supervisor mode and CR0.WP is not set on
the vcpu, kvm will fix it by adjusting the pte access: it sets the W bit on
the pte and clears the U bit. This is the only case in which kvm can change
the pte access from read-only to writable.

Unfortunately, the pte access is also the access of the 'direct' shadow page
table, i.e. direct sp.role.access = pte_access, so we will create a writable
spte entry in a read-only shadow page table. As a result, the Dirty bit is
not tracked when two guest ptes point to the same large page. Note that this
has no impact other than on the Dirty bit, since cr0.wp is encoded into
sp.role.

It can be fixed by adjusting the pte access before establishing the shadow
page table. After that, no mmu-specific code remains in the common function,
so two parameters can be dropped from set_spte().

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/mmu.c |   47 ---
 arch/x86/kvm/paging_tmpl.h |   30 +++
 2 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 01d7c2a..2a3c890 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2342,8 +2342,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 }

 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-   unsigned pte_access, int user_fault,
-   int write_fault, int level,
+   unsigned pte_access, int level,
gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync, bool host_writable)
 {
@@ -2378,9 +2377,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= (u64)pfn << PAGE_SHIFT;

-   if ((pte_access & ACC_WRITE_MASK)
-   || (!vcpu->arch.mmu.direct_map && write_fault
-       && !is_write_protection(vcpu) && !user_fault)) {
+   if (pte_access & ACC_WRITE_MASK) {

/*
 * There are two cases:
@@ -2399,19 +2396,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

-   if (!vcpu->arch.mmu.direct_map
-       && !(pte_access & ACC_WRITE_MASK)) {
-   spte &= ~PT_USER_MASK;
-   /*
-* If we converted a user page to a kernel page,
-* so that the kernel can write to it when cr0.wp=0,
-* then we should prevent the kernel from executing it
-* if SMEP is enabled.
-*/
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-   spte |= PT64_NX_MASK;
-   }
-
/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection
@@ -2442,18 +2426,15 @@ done:

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 unsigned pt_access, unsigned pte_access,
-int user_fault, int write_fault,
-int *emulate, int level, gfn_t gfn,
-pfn_t pfn, bool speculative,
-bool host_writable)
+int write_fault, int *emulate, int level, gfn_t gfn,
+pfn_t pfn, bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;

-   pgprintk("%s: spte %llx access %x write_fault %d"
-        " user_fault %d gfn %llx\n",
+   pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
 __func__, *sptep, pt_access,
-write_fault, user_fault, gfn);
+write_fault, gfn);

if (is_rmap_spte(*sptep)) {
/*
@@ -2477,9 +2458,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
was_rmapped = 1;
}

-   if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
- level, gfn, pfn, speculative, true,
- host_writable)) {
+   if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
+ true, host_writable)) {
if (write_fault)
*emulate = 1;
kvm_mmu_flush_tlb(vcpu);
@@ -2571,10 +2551,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;

for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, ACC_ALL,
-access, 0, 0, NULL,
-sp->role.level, gfn,
-page_to_pfn(pages[i]), true, true);
+   mmu_set_spte(vcpu, start, ACC_ALL, access, 0, NULL,
+sp->role.level, gfn, page_to_pfn(pages[i]),
+true, true);

return 0;
 }
@@ -2636,8 +2615,8 

[PATCH v5 2/5] KVM: MMU: fix infinite fault access retry

2013-01-07 Thread Xiao Guangrong
We have two issues in the current code:
- if the target gfn is used as its own page table, the guest will refault and
  then kvm will use a small page size to map it. We need two #PFs to fix its
  shadow page table.

- sometimes, say when an exception is triggered during a vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before going into the page fault path; this causes an
  infinite loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access -> ...

To fix these, we can adjust the page size early if the target gfn is used as a
page table.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/mmu.c |   13 -
 arch/x86/kvm/paging_tmpl.h |   35 ++-
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a3c890..54fc61e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2380,15 +2380,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (pte_access & ACC_WRITE_MASK) {

/*
-* There are two cases:
-* - the one is other vcpu creates new sp in the window
-*   between mapping_level() and acquiring mmu-lock.
-* - the another case is the new sp is created by itself
-*   (page-fault path) when guest uses the target gfn as
-*   its page table.
-* Both of these cases can be fixed by allowing guest to
-* retry the access, it will refault, then we can establish
-* the mapping by using small page.
+* Other vcpu creates new sp in the window between
+* mapping_level() and acquiring mmu-lock. We can
+* allow guest to retry the access, the mapping can
+* be fixed if guest refault.
 */
if (level > PT_PAGE_TABLE_LEVEL &&
has_wrprotected_page(vcpu->kvm, gfn, level))
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7c575e7..67b390d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -487,6 +487,38 @@ out_gpte_changed:
return 0;
 }

+ /*
+ * To see whether the mapped gfn can write its page table in the current
+ * mapping.
+ *
+ * It is the helper function of FNAME(page_fault). When guest uses large page
+ * size to map the writable gfn which is used as current page table, we should
+ * force kvm to use small page size to map it because new shadow page will be
+ * created when kvm establishes shadow page table that stop kvm using large
+ * page size. Do it early can avoid unnecessary #PF and emulation.
+ *
+ * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
+ * since the PDPT is always shadowed, that means, we can not use large page
+ * size to map the gfn which is used as PDPT.
+ */
+static bool
+FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
+ struct guest_walker *walker, int user_fault)
+{
+   int level;
+   gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+
+   if (!(walker->pte_access & ACC_WRITE_MASK ||
+ (!is_write_protection(vcpu) && !user_fault)))
+   return false;
+
+   for (level = walker->level; level <= walker->max_level; level++)
+   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
+   return true;
+
+   return false;
+}
+
 /*
  * Page fault handler.  There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -541,7 +573,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
}

if (walker.level >= PT_DIRECTORY_LEVEL)
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
+   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
+  || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
else
force_pt_level = 1;
if (!force_pt_level) {
-- 
1.7.7.6



[PATCH v5 3/5] KVM: x86: clean up reexecute_instruction

2013-01-07 Thread Xiao Guangrong
A little cleanup for reexecute_instruction; also use gpa_to_gfn in
retry_instruction.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1c9c834..08cacd9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4761,19 +4761,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
if (tdp_enabled)
return false;

+   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
+   if (gpa == UNMAPPED_GVA)
+   return true; /* let cpu generate fault */
+
/*
 * if emulation was due to access to shadowed page table
 * and it failed try to unshadow page and re-enter the
 * guest to let CPU execute the instruction.
 */
-   if (kvm_mmu_unprotect_page_virt(vcpu, gva))
+   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
return true;

-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
-
/*
 * Do not retry the unhandleable instruction if it faults on the
 * readonly host memory, otherwise it will goto a infinite loop:
@@ -4828,7 +4827,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
if (!vcpu->arch.mmu.direct_map)
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-   kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));

return true;
 }
-- 
1.7.7.6



[PATCH v5 4/5] KVM: x86: let reexecute_instruction work for tdp

2013-01-07 Thread Xiao Guangrong
Currently, reexecute_instruction refuses to retry any instruction if
tdp is enabled. If nested npt is used, the emulation may be caused by a
shadow page, which can be fixed by dropping the shadow page. The only
case in which tdp cannot retry the instruction is an access fault
on an error pfn.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c |   61 ---
 1 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08cacd9..6f13e03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4753,25 +4753,25 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
return r;
 }

-static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
+static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2)
 {
-   gpa_t gpa;
+   gpa_t gpa = cr2;
pfn_t pfn;

-   if (tdp_enabled)
-   return false;
-
-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
+   if (!vcpu->arch.mmu.direct_map) {
+   /*
+* Write permission should be allowed since only
+* write access need to be emulated.
+*/
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-   /*
-* if emulation was due to access to shadowed page table
-* and it failed try to unshadow page and re-enter the
-* guest to let CPU execute the instruction.
-*/
-   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
-   return true;
+   /*
+* If the mapping is invalid in guest, let cpu retry
+* it to generate fault.
+*/
+   if (gpa == UNMAPPED_GVA)
+   return true;
+   }

/*
 * Do not retry the unhandleable instruction if it faults on the
@@ -4780,12 +4780,37 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
 * instruction -> ...
 */
pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-   if (!is_error_noslot_pfn(pfn)) {
-   kvm_release_pfn_clean(pfn);
+
+   /*
+* If the instruction failed on the error pfn, it can not be fixed,
+* report the error to userspace.
+*/
+   if (is_error_noslot_pfn(pfn))
+   return false;
+
+   kvm_release_pfn_clean(pfn);
+
+   /* The instructions are well-emulated on direct mmu. */
+   if (vcpu->arch.mmu.direct_map) {
+   unsigned int indirect_shadow_pages;
+
+   spin_lock(&vcpu->kvm->mmu_lock);
+   indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
+   spin_unlock(&vcpu->kvm->mmu_lock);
+
+   if (indirect_shadow_pages)
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+
return true;
}

-   return false;
+   /*
+* if emulation was due to access to shadowed page table
+* and it failed try to unshadow page and re-enter the
+* guest to let CPU execute the instruction.
+*/
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+   return true;
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
-- 
1.7.7.6



[PATCH v5 5/5] KVM: x86: improve reexecute_instruction

2013-01-07 Thread Xiao Guangrong
The current reexecute_instruction cannot reliably detect a failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, one such case is nested write protection: the page we want to
write is used as a PDE that chains to itself. In this case, we should
stop the emulation and report the case to userspace.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_host.h |7 +++
 arch/x86/kvm/paging_tmpl.h  |   27 ---
 arch/x86/kvm/x86.c  |8 +++-
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c431b33..d6ab8d2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -502,6 +502,13 @@ struct kvm_vcpu_arch {
u64 msr_val;
struct gfn_to_hva_cache data;
} pv_eoi;
+
+   /*
+* Indicate whether the access faults on its page table in guest
+* which is set when fix page fault and used to detect unhandeable
+* instruction.
+*/
+   bool write_fault_to_shadow_pgtable;
 };

 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 67b390d..df50560 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -497,26 +497,34 @@ out_gpte_changed:
  * created when kvm establishes shadow page table that stop kvm using large
  * page size. Do it early can avoid unnecessary #PF and emulation.
  *
+ * @write_fault_to_shadow_pgtable will return true if the fault gfn is
+ * currently used as its page table.
+ *
  * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
  * since the PDPT is always shadowed, that means, we can not use large page
  * size to map the gfn which is used as PDPT.
  */
 static bool
 FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
- struct guest_walker *walker, int user_fault)
+ struct guest_walker *walker, int user_fault,
+ bool *write_fault_to_shadow_pgtable)
 {
int level;
gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+   bool self_changed = false;

if (!(walker->pte_access & ACC_WRITE_MASK ||
  (!is_write_protection(vcpu) && !user_fault)))
return false;

-   for (level = walker->level; level <= walker->max_level; level++)
-   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
-   return true;
+   for (level = walker->level; level <= walker->max_level; level++) {
+   gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
+
+   self_changed |= !(gfn & mask);
+   *write_fault_to_shadow_pgtable |= !gfn;
+   }

-   return false;
+   return self_changed;
 }

 /*
@@ -544,7 +552,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
int level = PT_PAGE_TABLE_LEVEL;
int force_pt_level;
unsigned long mmu_seq;
-   bool map_writable;
+   bool map_writable, is_self_change_mapping;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

@@ -572,9 +580,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;
}

+   vcpu->arch.write_fault_to_shadow_pgtable = false;
+
+   is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
+ &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
+
if (walker.level >= PT_DIRECTORY_LEVEL)
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-  || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
+  || is_self_change_mapping;
else
force_pt_level = 1;
if (!force_pt_level) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f13e03..2957012 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4810,7 +4810,13 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2)
 * guest to let CPU execute the instruction.
 */
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
-   return true;
+
+   /*
+* If the access faults on its page table, it can not
+* be fixed by unprotecting shadow page and it should
+* be reported to userspace.
+*/
+   return !vcpu->arch.write_fault_to_shadow_pgtable;
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
-- 
1.7.7.6


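The check in FNAME(is_self_change_mapping) comes down to XOR-ing the faulting gfn with the gfn of each guest page table used in the walk: if the difference lies entirely within the large-page mask, a large mapping would also cover a page table, and if the difference is zero the access hits the page table itself. Below is a userspace sketch of only that loop, assuming a simplified, hypothetical walker struct; it omits the write-permission precheck the patch performs first and, like the patch, expects the caller to clear the output flag beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define SKETCH_MAX_LEVELS 4

/* Hypothetical stand-in for struct guest_walker: table_gfn[i] is the gfn
 * of the guest page table used at level i + 1 during the walk. */
struct walker_sketch {
	int level;      /* level of the final guest mapping */
	int max_level;  /* top level of the walk */
	gfn_t gfn;      /* gfn being accessed */
	gfn_t table_gfn[SKETCH_MAX_LEVELS];
};

/* pages_per_hpage plays the role of KVM_PAGES_PER_HPAGE(walker->level),
 * e.g. 512 for a 2MB page with 4KB base pages. */
static bool self_change_mapping(const struct walker_sketch *w,
				gfn_t pages_per_hpage,
				bool *write_fault_to_shadow_pgtable)
{
	gfn_t mask = ~(pages_per_hpage - 1);
	bool self_changed = false;
	int level;

	assert(pages_per_hpage != 0 &&
	       (pages_per_hpage & (pages_per_hpage - 1)) == 0);
	assert(w->level >= 1 && w->max_level <= SKETCH_MAX_LEVELS);

	for (level = w->level; level <= w->max_level; level++) {
		gfn_t diff = w->gfn ^ w->table_gfn[level - 1];

		/* Same large page as a page table: force a small mapping. */
		self_changed |= !(diff & mask);
		/* Exactly the page-table page itself. */
		*write_fault_to_shadow_pgtable |= !diff;
	}
	return self_changed;
}
```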

[no subject]

2013-01-07 Thread Michael S. Tsirkin
On Sun, Jan 06, 2013 at 02:36:13PM +0800, Asias He wrote:
 This drops the cmd completion list spin lock and makes the cmd
 completion queue lock-less.
 
 Signed-off-by: Asias He as...@redhat.com


Nicholas, any feedback?

 ---
  drivers/vhost/tcm_vhost.c | 46 +-
  drivers/vhost/tcm_vhost.h |  2 +-
  2 files changed, 14 insertions(+), 34 deletions(-)
 
 diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
 index b20df5c..3720604 100644
 --- a/drivers/vhost/tcm_vhost.c
 +++ b/drivers/vhost/tcm_vhost.c
 @@ -47,6 +47,7 @@
  #include <linux/vhost.h>
  #include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
  #include <linux/virtio_scsi.h>
 +#include <linux/llist.h>
  
  #include "vhost.c"
  #include "vhost.h"
 @@ -64,8 +65,7 @@ struct vhost_scsi {
   struct vhost_virtqueue vqs[3];
  
   struct vhost_work vs_completion_work; /* cmd completion work item */
 - struct list_head vs_completion_list;  /* cmd completion queue */
 - spinlock_t vs_completion_lock;/* protects s_completion_list */
 + struct llist_head vs_completion_list; /* cmd completion queue */
  };
  
  /* Local pointer to allocated TCM configfs fabric module */
 @@ -301,9 +301,7 @@ static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd)
  {
   struct vhost_scsi *vs = tv_cmd->tvc_vhost;
  
 - spin_lock_bh(&vs->vs_completion_lock);
 - list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
 - spin_unlock_bh(&vs->vs_completion_lock);
 + llist_add(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
  
   vhost_work_queue(&vs->dev, &vs->vs_completion_work);
  }
 @@ -347,27 +345,6 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
   kfree(tv_cmd);
  }
  
 -/* Dequeue a command from the completion list */
 -static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(
 - struct vhost_scsi *vs)
 -{
 - struct tcm_vhost_cmd *tv_cmd = NULL;
 -
 - spin_lock_bh(&vs->vs_completion_lock);
 - if (list_empty(&vs->vs_completion_list)) {
 - spin_unlock_bh(&vs->vs_completion_lock);
 - return NULL;
 - }
 -
 - list_for_each_entry(tv_cmd, &vs->vs_completion_list,
 - tvc_completion_list) {
 - list_del(&tv_cmd->tvc_completion_list);
 - break;
 - }
 - spin_unlock_bh(&vs->vs_completion_lock);
 - return tv_cmd;
 -}
 -
  /* Fill in status and signal that we are done processing this command
   *
   * This is scheduled in the vhost work queue so we are called with the owner
 @@ -377,12 +354,18 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
  {
   struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
   vs_completion_work);
 + struct virtio_scsi_cmd_resp v_rsp;
   struct tcm_vhost_cmd *tv_cmd;
 + struct llist_node *llnode;
 + struct se_cmd *se_cmd;
 + int ret;
  
 - while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs))) {
 - struct virtio_scsi_cmd_resp v_rsp;
 - struct se_cmd *se_cmd = tv_cmd->tvc_se_cmd;
 - int ret;
 + llnode = llist_del_all(&vs->vs_completion_list);
 + while (llnode) {
 + tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd,
 +  tvc_completion_list);
 + llnode = llist_next(llnode);
 + se_cmd = tv_cmd->tvc_se_cmd;
  
   pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
   tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
 @@ -426,7 +409,6 @@ static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
   pr_err("Unable to allocate struct tcm_vhost_cmd\n");
   return ERR_PTR(-ENOMEM);
   }
 - INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
   tv_cmd->tvc_tag = v_req->tag;
   tv_cmd->tvc_task_attr = v_req->task_attr;
   tv_cmd->tvc_exp_data_len = exp_data_len;
 @@ -859,8 +841,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
   return -ENOMEM;
  
   vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
 - INIT_LIST_HEAD(&s->vs_completion_list);
 - spin_lock_init(&s->vs_completion_lock);
  
   s->vqs[VHOST_SCSI_VQ_CTL].handle_kick = vhost_scsi_ctl_handle_kick;
   s->vqs[VHOST_SCSI_VQ_EVT].handle_kick = vhost_scsi_evt_handle_kick;
 diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h
 index 7e87c63..47ee80b 100644
 --- a/drivers/vhost/tcm_vhost.h
 +++ b/drivers/vhost/tcm_vhost.h
 @@ -34,7 +34,7 @@ struct tcm_vhost_cmd {
   /* Sense buffer that will be mapped into outgoing status */
   unsigned char tvc_sense_buf[TRANSPORT_SENSE_BUFFER];
   /* Completed commands list, serviced from vhost worker thread */
 - struct list_head tvc_completion_list;
 + struct llist_node tvc_completion_list;
  };
  
  struct tcm_vhost_nexus {
 -- 
 1.7.11.7

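The patch replaces the spinlock-protected list with the kernel's lock-less llist: producers push with an atomic compare-and-swap, and the single consumer detaches the whole list with one atomic exchange. A userspace approximation of that pattern using C11 atomics is sketched below; the lnode/lhead types and the lpush/lpop_all names are hypothetical stand-ins, not the <linux/llist.h> API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Producer side (cf. llist_add): safe against concurrent producers and a
 * concurrent lpop_all(); on CAS failure `first` is refreshed and we retry. */
static void lpush(struct lnode *node, struct lhead *head)
{
	struct lnode *first = atomic_load(&head->first);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));
}

/* Consumer side (cf. llist_del_all): detach every queued node at once. */
static struct lnode *lpop_all(struct lhead *head)
{
	assert(head != NULL);
	return atomic_exchange(&head->first, NULL);
}
```

Because the list is a stack, the detached chain comes back newest-first; a consumer that walks it directly, as the quoted vhost_scsi_complete_cmd_work() loop does, completes commands in reverse order of arrival within each batch.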
Re: [PATCH 11/12] virtio-net: migration support for multiqueue

2013-01-07 Thread Michael S. Tsirkin
On Fri, Dec 28, 2012 at 06:32:03PM +0800, Jason Wang wrote:
 This patch adds migration support for multiqueue virtio-net. The version is
 bumped to 12.
 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  hw/virtio-net.c |   45 +++--
  1 files changed, 35 insertions(+), 10 deletions(-)
 
 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index aaeef1b..ca4b804 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -21,7 +21,7 @@
  #include "virtio-net.h"
  #include "vhost_net.h"
  
 -#define VIRTIO_NET_VM_VERSION    11
 +#define VIRTIO_NET_VM_VERSION    12

Please don't; use a subsection instead.

  #define MAC_TABLE_ENTRIES    64
  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
 @@ -1058,16 +1058,18 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
  
  static void virtio_net_save(QEMUFile *f, void *opaque)
  {
 +int i;
  VirtIONet *n = opaque;
 -VirtIONetQueue *q = &n->vqs[0];
  
 -/* At this point, backend must be stopped, otherwise
 - * it might keep writing to memory. */
 -assert(!q->vhost_started);
 +for (i = 0; i < n->max_queues; i++) {
 +/* At this point, backend must be stopped, otherwise
 + * it might keep writing to memory. */
 +assert(!n->vqs[i].vhost_started);
 +}
  virtio_save(&n->vdev, f);
  
  qemu_put_buffer(f, n->mac, ETH_ALEN);
 -qemu_put_be32(f, q->tx_waiting);
 +qemu_put_be32(f, n->vqs[0].tx_waiting);
  qemu_put_be32(f, n->mergeable_rx_bufs);
  qemu_put_be16(f, n->status);
  qemu_put_byte(f, n->promisc);
 @@ -1083,13 +1085,17 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
  qemu_put_byte(f, n->nouni);
  qemu_put_byte(f, n->nobcast);
  qemu_put_byte(f, n->has_ufo);
 +qemu_put_be16(f, n->max_queues);

The above is specified by the user, so it seems unnecessary in the migration stream.

The fields below should only be put if relevant: check that the host feature bit is
set and/or max_queues > 1.

 +qemu_put_be16(f, n->curr_queues);
 +for (i = 1; i < n->curr_queues; i++) {
 +qemu_put_be32(f, n->vqs[i].tx_waiting);
 +}
  }
  
  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
  {
  VirtIONet *n = opaque;
 -VirtIONetQueue *q = &n->vqs[0];
 -int ret, i;
 +int ret, i, link_down;
  
  if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
  return -EINVAL;
 @@ -1100,7 +1106,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
  }
  
  qemu_get_buffer(f, n->mac, ETH_ALEN);
 -q->tx_waiting = qemu_get_be32(f);
 +n->vqs[0].tx_waiting = qemu_get_be32(f);
  
  virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
  
 @@ -1170,6 +1176,22 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
  }
  }
  
 +if (version_id >= 12) {
 +if (n->max_queues != qemu_get_be16(f)) {
 +error_report("virtio-net: different max_queues ");
 +return -1;
 +}
 +
 +n->curr_queues = qemu_get_be16(f);
 +for (i = 1; i < n->curr_queues; i++) {
 +n->vqs[i].tx_waiting = qemu_get_be32(f);
 +}
 +}
 +
 +virtio_net_set_queues(n);
 +/* Must do this again, since we may have more than one active queues. */

s/queues/queue/

Also, I don't understand why it's here.
It seems that virtio has a VM running callback,
and that will invoke virtio_net_set_status after the VM is loaded.
No?


 +virtio_net_set_status(&n->vdev, n->status);
 +
  /* Find the first multicast entry in the saved MAC filter */
  for (i = 0; i < n->mac_table.in_use; i++) {
  if (n->mac_table.macs[i * ETH_ALEN] & 1) {
 @@ -1180,7 +1202,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
  
  /* nc.link_down can't be migrated, so infer link_down according
   * to link status bit in n->status */
 -qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
 +link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
 +for (i = 0; i < n->max_queues; i++) {
 +qemu_get_subqueue(n->nic, i)->link_down = link_down;
 +}
  
  return 0;
  }
 -- 
 1.7.1

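Michael's review asks that max_queues stay out of the stream and that the multiqueue fields be written only when relevant. The sketch below illustrates just that conditional-emission idea with a hypothetical big-endian byte buffer; struct stream, put_be16(), put_be32() and save_multiqueue_state() are invented for illustration and do not model QEMU's QEMUFile or its vmstate subsection machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte sink standing in for a migration stream. */
struct stream {
	uint8_t buf[256];
	size_t len;
};

static void put_be16(struct stream *s, uint16_t v)
{
	assert(s->len + 2 <= sizeof(s->buf));
	s->buf[s->len++] = (uint8_t)(v >> 8);
	s->buf[s->len++] = (uint8_t)(v & 0xff);
}

static void put_be32(struct stream *s, uint32_t v)
{
	assert(s->len + 4 <= sizeof(s->buf));
	for (int shift = 24; shift >= 0; shift -= 8)
		s->buf[s->len++] = (uint8_t)(v >> shift);
}

/* Emit per-queue state only when multiqueue is configured.  max_queues
 * itself is not written: both sides get it from the user's options. */
static void save_multiqueue_state(struct stream *s, uint16_t max_queues,
				  uint16_t curr_queues,
				  const uint32_t *tx_waiting)
{
	if (max_queues <= 1)
		return;

	put_be16(s, curr_queues);
	/* Queue 0's tx_waiting is already in the legacy part of the stream. */
	for (uint16_t i = 1; i < curr_queues; i++)
		put_be32(s, tx_waiting[i]);
}
```

With this shape a single-queue device adds nothing to the stream, which is the property a subsection would also preserve.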

Re: [PATCH 07/12] virtio: introduce virtio_queue_del()

2013-01-07 Thread Michael S. Tsirkin
On Fri, Dec 28, 2012 at 06:31:59PM +0800, Jason Wang wrote:
 Some devices (such as virtio-net) need the ability to destroy or re-order the
 virtqueues; this patch adds a helper to do this.
 
 Signed-off-by: Jason Wang jasowang

Actually it's del_queue, unlike what the subject says :)

 ---
  hw/virtio.c |9 +
  hw/virtio.h |2 ++
  2 files changed, 11 insertions(+), 0 deletions(-)
 
 diff --git a/hw/virtio.c b/hw/virtio.c
 index f40a8c5..bc3c9c3 100644
 --- a/hw/virtio.c
 +++ b/hw/virtio.c
 @@ -700,6 +700,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
  return &vdev->vq[i];
  }
  
 +void virtio_del_queue(VirtIODevice *vdev, int n)
 +{
 +if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
 +abort();
 +}
 +
 +vdev->vq[n].vring.num = 0;
 +}
 +
  void virtio_irq(VirtQueue *vq)
  {
  trace_virtio_irq(vq);
 diff --git a/hw/virtio.h b/hw/virtio.h
 index 7c17f7b..f6cb0f9 100644
 --- a/hw/virtio.h
 +++ b/hw/virtio.h
 @@ -138,6 +138,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
  void (*handle_output)(VirtIODevice *,
VirtQueue *));
  
 +void virtio_del_queue(VirtIODevice *vdev, int n);
 +
  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
  unsigned int len);
  void virtqueue_flush(VirtQueue *vq, unsigned int count);
 -- 
 1.7.1

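The helper marks a queue as deleted simply by zeroing vring.num. The toy model below shows the effect, assuming (as the quoted virtio_add_queue() context suggests) that adding a queue takes the first slot whose ring size is 0, so a deleted slot becomes reusable. All names and the slot count are hypothetical, and this sketch returns -1 when no slot is free rather than treating that as fatal.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_QUEUE_MAX 64   /* stand-in for VIRTIO_PCI_QUEUE_MAX */

struct vq_sketch {
	int num;   /* ring size; 0 means the slot is unused */
};

struct vdev_sketch {
	struct vq_sketch vq[SKETCH_QUEUE_MAX];
};

/* Mirrors the quoted virtio_del_queue(): bounds check, then mark unused. */
static void del_queue(struct vdev_sketch *vdev, int n)
{
	if (n < 0 || n >= SKETCH_QUEUE_MAX)
		abort();
	vdev->vq[n].num = 0;
}

/* Take the first unused slot; returns its index, or -1 if none is free. */
static int add_queue(struct vdev_sketch *vdev, int queue_size)
{
	assert(queue_size > 0);
	for (int i = 0; i < SKETCH_QUEUE_MAX; i++) {
		if (vdev->vq[i].num == 0) {
			vdev->vq[i].num = queue_size;
			return i;
		}
	}
	return -1;
}
```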

Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogi...@miniinfo.net wrote:
 On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov g...@redhat.com wrote:
  On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
  Reading the spec, it is clear that most modes normally leave the IRQ
  output line high, and only pulse it low to generate a leading edge.
  Especially the most commonly used mode 2.
  
  The KVM i8254 model does not try to emulate the duration of the pulse at
  all, so just swap the high/low settings to leave it high most of
  the time.
  
  This fix is a prerequisite to improving the i8259 model to handle
  the trailing edge of an interrupt request as indicated in its spec:
  If it gets a trailing edge of an IRQ line before it starts to service
  the interrupt, the request should be canceled.
  
  See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
  or search the net for 23124406.pdf.
  
  Risks:
  
  There is a risk that migrating a running guest between versions
  with and without this patch will lose or gain a single timer
  interrupt during the migration process.  The only case where
  Can you elaborate on how exactly this can happen? I do not see it.
  
 
 KVM 8254: In the corrected model, when the count expires, the model
 briefly pulses output low and then high again, with the low to high
 transition being what triggers the interrupt.  In the old model,
 when the count expires, the model expects the output line
 to already be low, and briefly pulses it high (triggering the
 interrupt) and then low again.  But if the line was already
 high (because it migrated from the corrected model),
 this won't generate a new leading edge (low to high) and won't
 trigger a new interrupt (the first post-back-migration pulse turns
 into a simple trailing edge instead of a pulse).
 
 Unless there is something I'm missing?
 
No, I missed that pic->last_irr/ioapic->irr will be migrated as 1. But
this means that the next interrupt after migration from new to old will
always be lost.  What about clearing the pit bit from last_irr/irr before
migration? That should not affect new-new migration and should fix the
new-old one. The only problem is that we may need to consult the irq routing
table to know how the pit is connected to the ioapic.

I still do not see how we can gain one interrupt.

 The qemu 8254 model actually models each edge at independent
 clock ticks instead of combining both into a very brief pulse at one time.
 I've found it handy to draw out old and new timing diagrams on paper
 (for each mode), and then carefully think about what happens with respect
 to levels and edges when you transition back and forth between old and
 new models at various points in the timing cycle.  (Note I've spent more
 time examining the qemu models rather than the kvm models.)
 
Yes, drawing it definitely helps :)

  this is likely to be serious is probably losing a single-shot (mode 4)
  interrupt, but if my understanding of how things work is good, then
  that should only be possible if a whole slew of conditions are
  all met:
  
   1. The guest is configured to run in a tickless mode (like
  modern Linux).
   2. The guest is for some reason still using the i8254 rather
  than something more modern like an HPET.  (The combination
  of 1 and 2 should be rare.)
  This is not so rare. For performance reasons it is better not to have
  HPET at all.  In fact, -no-hpet is how I would advise anyone to run qemu.
 
 In a later email you mention that Linux prefers a timer in the APIC.
 I don't know much about the APIC (advanced programmable interrupt controller),
 and wasn't even aware it had its own timer.
 
 The big question is if we can safely just fix the i825* models, or if
 we need something more subtle to avoid breaking commonly used guests
 like modern Linux (support both corrected and old models,
 or only fix IRQ2 instead of all IRQs, or similar subtlety).
Migration may happen while the guest is running firmware. Who knows what
firmware is doing. If the fix is as easy as I described above, we should go
for it.

 
  
   3. The migration is going from a fixed version back to the
  old version.  (Not sure how common this is, but it should
  be rarer than migrating from old to new.)
   4. There are not going to be any timely events/interrupts
  (keyboard, network, process sleeps, etc) that cause the guest
  to reset the PIT mode 4 one-shot counter soon enough.
  
  This combination should be rare enough that more complicated
  solutions are not worth the effort.
  
  Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
  ---
   arch/x86/kvm/i8254.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
  
  diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
  index c1d30b2..cd4ec60 100644
  --- a/arch/x86/kvm/i8254.c
  +++ b/arch/x86/kvm/i8254.c
  @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
 }
 spin_unlock(&ps->inject_lock);
 if (inject) {
  -  kvm_set_irq(kvm, 

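The lost-interrupt scenario discussed above can be reproduced with a toy edge detector: an input that counts only low-to-high transitions, driven by the old pulse shape (idle low, pulse high) and the corrected one (idle high, pulse low). The model and names below are hypothetical and ignore the PIC/IOAPIC details; they only show why the first old-model expiry after a new-to-old migration produces no rising edge.

```c
#include <assert.h>

/* Hypothetical edge-triggered input: latches a request only on 0 -> 1. */
struct edge_input {
	int level;       /* current line level, 0 or 1 */
	int interrupts;  /* rising edges seen so far */
};

static void set_line(struct edge_input *in, int level)
{
	assert(level == 0 || level == 1);
	if (!in->level && level)
		in->interrupts++;
	in->level = level;
}

/* Old KVM model: line idles low; expiry pulses it high, then low. */
static void old_model_expire(struct edge_input *in)
{
	set_line(in, 1);
	set_line(in, 0);
}

/* Corrected model: line idles high; expiry pulses it low, then high. */
static void new_model_expire(struct edge_input *in)
{
	set_line(in, 0);
	set_line(in, 1);
}
```

Clearing the latched level before migration, as suggested in the reply above, corresponds to resetting level to 0 in this model, after which the first old-model expiry again produces an edge.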
Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes

2013-01-07 Thread Gleb Natapov
On Mon, Jan 07, 2013 at 06:35:47PM -0600, mmogi...@miniinfo.net wrote:
 On Mon, 7 Jan 2013 14:04:03 +0200, Gleb Natapov g...@redhat.com wrote:
  On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote:
  Make pit_get_out() consistent with the spec.  Currently pit_get_out()
  doesn't affect IRQ0, but it can be read by the guest in other ways.
  This makes it consistent with proposed changes in qemu's i8254 model
  as well.
  
  See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
  or search the net for 23124406.pdf.
  
  Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
  ---
   arch/x86/kvm/i8254.c | 44 ++--
   1 file changed, 34 insertions(+), 10 deletions(-)
  
  diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
  index cd4ec60..fd38938 100644
  --- a/arch/x86/kvm/i8254.c
  +++ b/arch/x86/kvm/i8254.c
  @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
   
 WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
   
  +  /* FIXME: Add some way to represent a paused timer and return
  +   *   the paused-at counter value, to better model gate pausing,
  +   *   wait until next CLK pulse to load counter logic, etc.
  +   */
 t = kpit_elapsed(kvm, c, channel);
 d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
   
  @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
 counter = (c->count - d) & 0xffff;
 break;
 case 3:
  -  /* XXX: may be incorrect for odd counts */
  -  counter = c->count - (mod_64((2 * d), c->count));
  +  counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
 break;
 default:
 counter = c->count - mod_64(d, c->count);
  @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
 switch (c->mode) {
 default:
 case 0:
  -  out = (d >= c->count);
  -  break;
 case 1:
  -  out = (d < c->count);
  +  out = (d >= c->count);
 break;
 case 2:
  -  out = ((mod_64(d, c->count) == 0) && (d != 0));
  +  out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
 break;
 case 3:
  -  out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
  +  out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0);
 break;
 case 4:
 case 5:
  -  out = (d == c->count);
  +  out = (d != c->count);
 break;
 }
   
  @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
   
 /*
  * The largest possible initial count is 0; this is equivalent
  -   * to 216 for binary counting and 104 for BCD counting.
  +   * to pow(2,16) for binary counting and pow(10,4) for BCD counting.
  */
 if (val == 0)
 val = 0x10000;
  @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
   
 if (channel != 0) {
 ps->channels[channel].count_load_time = ktime_get();
  +
  +  /* In gate-triggered one-shot modes,
  +   * indirectly model some pit_get_out()
  +   * cases by setting the load time way
  +   * back until gate-triggered.
  +   * (Generally only affects reading status
  +   * from channel 2 speaker,
  +   * due to hard-wired gates on other
  +   * channels.)
  +   *
  +   * FIXME: This might be redesigned if a paused
  +   * timer state is added for pit_get_count().
  +   */
   +  if (ps->channels[channel].mode == 1 ||
   +  ps->channels[channel].mode == 5) {
   +  u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ);
   +  ps->channels[channel].count_load_time =
   +  ktime_sub(ps->channels[channel].count_load_time,
   +  ns_to_ktime(delta));
  I do not understand what you are trying to do here. Do you assume that
  the trigger will happen 2 clocks after the counter is loaded?
  
 
 Modes 1 and 5 are single-shot, and they do not start counting until GATE
 is triggered, potentially well after the count is loaded.  So this is
 attempting to model the "start of countdown has not been triggered"
 state as being mostly identical to the "already triggered and also
 expired some number of clocks (2) ago" state.
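To make the backdating trick concrete, here is a small stand-alone C sketch (not KVM code) that replays the arithmetic of the quoted pit_load_count() hunk and then applies the patched mode 1 and mode 5 OUT rules. It assumes an input clock of 1193181 Hz for KVM_PIT_FREQ, uses a local muldiv64_sketch() in place of the kernel's muldiv64(), and assumes no further time passes between loading the count and reading OUT. backdated_ticks(), out_mode1() and out_mode5() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for KVM_PIT_FREQ and NSEC_PER_SEC. */
#define SKETCH_PIT_FREQ_HZ   1193181u
#define SKETCH_NSEC_PER_SEC  1000000000u

/* Stand-in for the kernel's muldiv64(a, b, c) == a * b / c (truncating).
 * Plain 64-bit arithmetic is enough for the magnitudes used here. */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    return a * b / c;
}

/* Ticks that pit_get_count()/pit_get_out() would compute right after a
 * load whose load time was moved back by (count + extra) ticks: the
 * backdate is converted to ns, then the elapsed ns back to ticks.
 * Both conversions truncate. */
static uint64_t backdated_ticks(uint32_t count, uint32_t extra)
{
    assert(count >= 1 && count <= 0x10000);
    uint64_t delta_ns = muldiv64_sketch((uint64_t)count + extra,
                                        SKETCH_NSEC_PER_SEC,
                                        SKETCH_PIT_FREQ_HZ);
    return muldiv64_sketch(delta_ns, SKETCH_PIT_FREQ_HZ,
                           SKETCH_NSEC_PER_SEC);
}

/* Patched OUT rules for the gate-triggered one-shot modes. */
static int out_mode1(uint64_t d, uint32_t count) { return d >= count; }
static int out_mode5(uint64_t d, uint32_t count) { return d != count; }
```

With count = 0x10000 and the assumed clock, backdating by count + 2 ticks yields d = 0x10001, so both modes read OUT high, the level a real 8254 shows before the first trigger. Backdating by only count + 1 ticks yields d = 0x10000 after the two truncations, and mode 5 would then read OUT low. That rounding margin is one plausible reason for the "+2"; the thread itself only says "(2) ago".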
 
So this is still not accurate. This assumes that the guest loads the counter
and then immediately triggers the gate. If the guest does something else for
a long time between loading the counter and triggering the gate, the result
will still not be accurate.

 It might be clearer to have a way to explicitly model a
 paused countdown, but such a mechanism doesn't currently exist.
If it is worth doing, it is worth doing right. It should not be hard, e.g. by
setting channels[channel].count_load_time on trigger instead of during count
loading.

 
 Note that modeling modes 1 and 5 is fairly low priority,
 

Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit

2013-01-07 Thread Scott Wood

On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
@@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,

set_guest_esr(vcpu, vcpu->arch.queued_esr);
if (update_dear == true)
set_guest_dear(vcpu, vcpu->arch.queued_dear);
+   if (update_epr == true) {
+   kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
+   /* Indicate that we want to recheck requests */
+   allowed = 2;
+   }


We shouldn't need allowed = 2 anymore.

-Scott
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] KVM: PPC: BookE: Add EPR user space support v3

2013-01-07 Thread Alexander Graf
The FSL MPIC implementation contains a feature called external proxy
facility which allows for interrupts to be acknowledged in the MPIC
as soon as a core accepts its pending external interrupt.

This patch set implements all the necessary pieces to support this
from the kernel space side.
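On the user-space side, handling the exit this series adds amounts to running one acknowledge cycle in the emulated interrupt controller and returning the vector. The C sketch below is a hypothetical model, not QEMU code: the sketch_* types stand in for struct kvm_run and the KVM_EXIT_EPR exit reason from <linux/kvm.h>, and the controller is simplified to a 32-bit pending bitmap in which a lower vector number wins, whereas a real MPIC uses per-source priorities.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for KVM_EXIT_EPR / struct kvm_run (hypothetical values). */
enum { SKETCH_EXIT_EPR = 1, SKETCH_EXIT_OTHER = 2 };

struct sketch_run {
    uint32_t exit_reason;
    struct { uint32_t epr; } epr;   /* mirrors the kvm_run 'epr' member */
};

/* Hypothetical user-space interrupt controller state. */
struct sketch_mpic {
    uint32_t pending;     /* bit v set: vector v is pending */
    uint32_t in_service;  /* bit v set: vector v has been acknowledged */
};

#define SKETCH_SPURIOUS_VECTOR 0xffffu  /* hypothetical "nothing pending" */

/* One acknowledge cycle: take the highest-priority pending vector
 * (the lowest number in this sketch) and mark it in service. */
static uint32_t sketch_mpic_ack(struct sketch_mpic *m)
{
    for (uint32_t v = 0; v < 32; v++) {
        uint32_t bit = 1u << v;
        if (m->pending & bit) {
            m->pending &= ~bit;
            m->in_service |= bit;
            return v;
        }
    }
    return SKETCH_SPURIOUS_VECTOR;
}

/* Dispatch after KVM_RUN returns; returns 1 if the exit was handled. */
static int sketch_handle_exit(struct sketch_run *run, struct sketch_mpic *m)
{
    assert(run && m);
    if (run->exit_reason != SKETCH_EXIT_EPR)
        return 0;
    /* Put the acknowledged vector into 'epr'; the kernel completes the
     * operation when user space re-enters with KVM_RUN. */
    run->epr.epr = sketch_mpic_ack(m);
    return 1;
}
```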

v1 -> v2:

  - do an explicit requests check rather than play with return values
  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:

  - remove leftover 'allowed==2' logic

Alexander Graf (3):
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync

Mihai Caraman (1):
  KVM: PPC: BookE: Allow irq deliveries to inject requests

 Documentation/virtual/kvm/api.txt   |   41 +-
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/include/asm/kvm_ppc.h  |    9 +++
 arch/powerpc/include/uapi/asm/kvm.h |    6 -
 arch/powerpc/kvm/booke.c            |   40 +-
 arch/powerpc/kvm/booke_emulate.c    |    3 ++
 arch/powerpc/kvm/powerpc.c          |   10
 include/linux/kvm_host.h            |    1 +
 include/uapi/linux/kvm.h            |    6 +
 9 files changed, 114 insertions(+), 4 deletions(-)



[PATCH 3/4] KVM: PPC: BookE: Implement EPR exit

2013-01-07 Thread Alexander Graf
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:

  - remove leftover 'allowed==2' logic
---
 Documentation/virtual/kvm/api.txt   |   40 +-
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/include/asm/kvm_ppc.h  |    9 +++
 arch/powerpc/kvm/booke.c            |   14 +++-
 arch/powerpc/kvm/powerpc.c          |   10
 include/linux/kvm_host.h            |    1 +
 include/uapi/linux/kvm.h            |    6 +
 7 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9cf591d..66bf7cf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2238,8 +2238,8 @@ executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
-NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR
-  and KVM_EXIT_PAPR the corresponding
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
+  KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after userspace
 has re-entered the kernel with KVM_RUN.  The kernel side will first finish
 incomplete operations and then check for pending signals.  Userspace
@@ -2342,6 +2342,25 @@ The possible hypercalls are defined in the Power Architecture Platform
 Requirements (PAPR) document available from www.power.org (free
 developer registration required to access it).
 
+   /* KVM_EXIT_EPR */
+   struct {
+   __u32 epr;
+   } epr;
+
+On FSL BookE PowerPC chips, the interrupt controller has a fast path
+interrupt acknowledge path to the core. When the core successfully
+delivers an interrupt, it automatically populates the EPR register with
+the interrupt vector number and acknowledges the interrupt inside
+the interrupt controller.
+
+If the interrupt controller lives in user space, we need to do
+the interrupt acknowledge cycle through it to fetch the next
+to-be-delivered interrupt vector using this exit.
+
+It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
+external interrupt has just been delivered into the guest. User space
+should put the acknowledged interrupt vector into the 'epr' field.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -2463,3 +2482,20 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
where num_sets is the tlb_sizes[] value divided by the tlb_ways[] value.
  - The tsize field of mas1 shall be set to 4K on TLB0, even though the
hardware ignores this value for TLB0.
+
+6.4 KVM_CAP_PPC_EPR
+
+Architectures: ppc
+Parameters: args[0] defines whether the proxy facility is active
+Returns: 0 on success; -1 on error
+
+This capability enables or disables the delivery of interrupts through the
+external proxy facility.
+
+When enabled (args[0] != 0), every time the guest gets an external interrupt
+delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
+to receive the topmost interrupt vector.
+
+When disabled (args[0] == 0), behavior is as if this facility is unsupported.
+
+When this capability is enabled, KVM_EXIT_EPR can occur.
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ab49c6c..8a72d59 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -520,6 +520,8 @@ struct kvm_vcpu_arch {
u8 sane;
u8 cpu_type;
u8 hcall_needed;
+   u8 epr_enabled;
+   u8 epr_needed;
 
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5f5f69a..493630e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -264,6 +264,15 @@ static inline void kvm_linear_init(void)
 {}
 #endif
 
+static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
+{
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GEPR, epr);
+#elif defined(CONFIG_BOOKE)
+   vcpu->arch.epr = epr;
+#endif
+}
+
 int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
 

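For completeness, a minimal user-space sketch of toggling the capability documented in section 6.4 above, assuming kernel headers that define KVM_CAP_PPC_EPR and an already created vCPU file descriptor. set_epr_exits() is a hypothetical helper; struct kvm_enable_cap and the KVM_ENABLE_CAP vCPU ioctl are the existing interface, and args[0] carries the on/off flag as the documentation text specifies.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable (enable != 0) or disable (enable == 0) KVM_EXIT_EPR exits on one
 * vCPU. Returns 0 on success, or -1 with errno set, as ioctl() does. */
static int set_epr_exits(int vcpu_fd, int enable)
{
    struct kvm_enable_cap cap;

    memset(&cap, 0, sizeof(cap));   /* zero flags and the unused args */
    cap.cap = KVM_CAP_PPC_EPR;
    cap.args[0] = enable ? 1 : 0;   /* args[0] selects on/off */
    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```

Once enabled, the run loop should expect exit_reason == KVM_EXIT_EPR and fill run->epr.epr with the acknowledged vector before calling KVM_RUN again.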