Re: [PATCH 1/4] KVM: Provide function for VCPU lookup by id

2015-11-19 Thread David Hildenbrand
> On 11/19/2015 02:13 PM, Paolo Bonzini wrote:
> > 
> > 
> > On 19/11/2015 13:55, David Hildenbrand wrote:
> >>>>
> >>>> I'm pretty sure this will cause confusion in the future!
> >>>> ==> Could you maybe name the new function something like
> >>>> "kvm_lookup_vcpu_by_id" or "kvm_get_vcpu_by_id" instead?
> >> Had that in a previous version but decided to name it that way ... hmm ...
> >> other opinions?
> > 
> > Having _by_id at the end of the name makes sense indeed.
> 
> David,
> I will fix up the function name myself to kvm_get_vcpu_by_id. Ok?

Sure, thanks a lot!

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] KVM: Provide function for VCPU lookup by id

2015-11-19 Thread David Hildenbrand
> 
> Any chance that you name this function differently? Otherwise we've got
> two functions that sound very similar and also have similar prototypes:
> 
> - kvm_get_vcpu(struct kvm *kvm, int i)
> - kvm_lookup_vcpu(struct kvm *kvm, int id)
> 
> I'm pretty sure this will cause confusion in the future!
> ==> Could you maybe name the new function something like
> "kvm_lookup_vcpu_by_id" or "kvm_get_vcpu_by_id" instead?

Had that in a previous version but decided to name it that way ... hmm ...
other opinions?

> 
> Also a short comment before the function would be nice to make the
> reader aware of the difference (e.g. when hot-plugging) between the
> vcpu-id and array-id.

Also not sure as the header directly includes the (for my eyes ;) ) easy code.
I would agree for a pure prototype. Other opinions?

> 
> Otherwise: +1 for finally having a proper common function for this!
> 

Thanks for having a look Thomas!

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: disable halt_poll_ns as default for s390x

2015-09-18 Thread David Hildenbrand
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand 
---
 arch/arm/include/asm/kvm_host.h | 1 +
 arch/arm64/include/asm/kvm_host.h   | 1 +
 arch/mips/include/asm/kvm_host.h| 1 +
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/s390/include/asm/kvm_host.h| 1 +
 arch/x86/include/asm/kvm_host.h | 1 +
 virt/kvm/kvm_main.c | 4 ++--
 7 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..780499f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -39,6 +39,7 @@
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #define KVM_HAVE_ONE_REG
+#define KVM_HALT_POLL_NS_DEFAULT 50
 
 #define KVM_VCPU_MAX_FEATURES 2
 
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 415938d..bac73c8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,6 +39,7 @@
 #define KVM_USER_MEM_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_HALT_POLL_NS_DEFAULT 50
 
 #include 
 #include 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index e8c8d9d..3b8aca4 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -61,6 +61,7 @@
 #define KVM_PRIVATE_MEM_SLOTS  0
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_HALT_POLL_NS_DEFAULT 50
 
 
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 98eebbf..4e51268 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #endif
+#define KVM_HALT_POLL_NS_DEFAULT 50
 
 /* These values are internal and can be increased later */
 #define KVM_NR_IRQCHIPS  1
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3d012e0..0cb5aad 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -35,6 +35,7 @@
  */
 #define KVM_NR_IRQCHIPS 1
 #define KVM_IRQCHIP_NUM_PINS 4096
+#define KVM_HALT_POLL_NS_DEFAULT 0
 
 #define SIGP_CTRL_C0x80
 #define SIGP_CTRL_SCN_MASK 0x3f
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c12e845..a66a98c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -40,6 +40,7 @@
 
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
+#define KVM_HALT_POLL_NS_DEFAULT 50
 
 #define KVM_IRQCHIP_NUM_PINS  KVM_IOAPIC_NUM_PINS
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eb4c9d2..a447c8c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,8 +66,8 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-/* halt polling only reduces halt latency by 5-7 us, 500us is enough */
-static unsigned int halt_poll_ns = 50;
+/* Architectures should define their poll value according to the halt latency 
*/
+static unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
 
 /* Default doubles per-vcpu halt_poll_ns. */
-- 
2.3.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: add halt_attempted_poll to VCPU stats

2015-09-17 Thread David Hildenbrand
> Am 15.09.2015 um 18:27 schrieb Paolo Bonzini:
> > This new statistic can help diagnosing VCPUs that, for any reason,
> > trigger bad behavior of halt_poll_ns autotuning.
> > 
> > For example, say halt_poll_ns = 48, and wakeups are spaced exactly
> > like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
> > 10+20+40+80+160+320+480 = 1110 microseconds out of every
> > 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
> > is consuming about 30% more CPU than it would use without
> > polling.  This would show as an abnormally high number of
> > attempted polling compared to the successful polls.
> > 
> > Cc: Christian Borntraeger  > Cc: David Matlack 
> > Signed-off-by: Paolo Bonzini 
> 
> Acked-by: Christian Borntraeger 
> 
> yes, this will help to detect some bad cases, but not all.
> 
> PS: 
> upstream maintenance keeps me really busy at the moment :-)
> I am looking into a case right now, where auto polling goes 
> completely nuts on my system:
> 
> guest1: 8vcpusguest2: 1 vcpu
> iperf with 25 process (-P25) from guest1 to guest2.
> 
> I/O interrupts on s390 are floating (pending on all CPUs) so on 
> ALL VCPUs that go to sleep, polling will consider any pending
> network interrupt as successful poll. So with auto polling the
> guest consumes up to 5 host CPUs without auto polling only 1.
> Reducing  halt_poll_ns to 10 seems to work (goes back to 
> 1 cpu).
> 
> The proper way might be to feedback the result of the
> interrupt dequeue into the heuristics. Don't know yet how
> to handle that properly.
> 
> Christian

I think the main problem is that we have two different kinds of wakeups, and
they can't be properly reported for now. "runnability" says nothing about the
"reason".

a) "forced wakeup" - "local workload"
- "local" interrupt/timer pending
- signal
- (request bits, nested irqs ... for some archs)
-> Impacts halt_poll_ns

b) "trial wakeup" - "floating" workload
-> floating interrupts
Another vcpu might be faster and dequeue the floating interrupt.
However, if we have a high I/O load, we want all VCPUs to reduce their
halt_poll_ns value. Special cases would also be:
- Only one VCPU in the system
- Only one VCPU running
- Only one VCPU that is enabled for this type of interrupt
-> Impacts halt_poll_ns only partially

So kvm_arch_vcpu_runnable() returns true for multiple
VCPUs, although only one might get the interrupt. If we could change that
internally (one VCPU reserving that interrupt), we might get this working out of
the box. As kvm_arch_vcpu_runnable is on the hot path and also called from other
VCPUs, this isn't that trivial. Will play with it. Until then, I'll prepare a
patch to disable it for s390x, just as Paolo suggested.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/10] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-04-13 Thread David Hildenbrand
> On Tue, Mar 31, 2015 at 04:08:02PM +0100, Alex Bennée wrote:
> > This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
> > ioctl. Currently any operation flag will return EINVAL. Actual
> > functionality will be added with further patches.
> > 
> > Signed-off-by: Alex Bennée .
> > 
> > ---
> > v2
> >   - simplified form of the ioctl (stuff will go into setup_debug)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt 
> > b/Documentation/virtual/kvm/api.txt
> > index b112efc..06c5064 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2604,7 +2604,7 @@ handled.
> >  4.87 KVM_SET_GUEST_DEBUG
> >  
> >  Capability: KVM_CAP_SET_GUEST_DEBUG
> > -Architectures: x86, s390, ppc
> > +Architectures: x86, s390, ppc, arm64
> >  Type: vcpu ioctl
> >  Parameters: struct kvm_guest_debug (in)
> >  Returns: 0 on success; -1 on error
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 5560f74..445933d 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -183,6 +183,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> > ext)
> > case KVM_CAP_ARM_PSCI:
> > case KVM_CAP_ARM_PSCI_0_2:
> > case KVM_CAP_READONLY_MEM:
> > +   case KVM_CAP_SET_GUEST_DEBUG:
> > r = 1;
> > break;
> 
> shouldn't you wait with advertising this capability until you've
> implemented support for it?
> 

I think this would work for now, however it's not very practical
- in the end one has to sense which debug flags are actually supported.

Question is if he wants to add initial support and extend functionality and
flags with each patch or enable the whole set of features in one shot at the
end.

Doing the latter seems more practicable to me (especially as the debug features
are added in the same patch series).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/10] KVM: arm64: guest debug, add SW break point support

2015-04-02 Thread David Hildenbrand
> This adds support for SW breakpoints inserted by userspace.
> 
> We do this by trapping all BKPT exceptions in the
> hypervisor (MDCR_EL2_TDE). The kvm_debug_exit_arch carries the address
> of the exception. If user-space doesn't know of the breakpoint then we
> have a guest inserted breakpoint and the hypervisor needs to start again
> and deliver the exception to guest.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - update to use new exit struct
>   - tweak for C setup
>   - do our setup in debug_setup/clear code
>   - fixed up comments
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 06c5064..17d4f9c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2626,7 +2626,7 @@ when running. Common control bits are:
>  The top 16 bits of the control field are architecture specific control
>  flags which can include the following:
> 
> -  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
> +  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
>- KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
>- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
>- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ea8b0e..d3bc8dc 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -304,7 +304,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_arm_set_running_vcpu(NULL);
>  }
> 
> -#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE)
> +#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE|KVM_GUESTDBG_USE_SW_BP)
> 
>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg)
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8a29d0b..cff0475 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -45,11 +45,18 @@ void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMCR);
>   vcpu->arch.mdcr_el2 |= (MDCR_EL2_TDRA | MDCR_EL2_TDOSA);
> 
> + /* Trap debug register access? */

This should probably go to the earlier patch.

>   if (!vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
>   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>   else
>   vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDA;
> 
> + /* Trap breakpoints? */
> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
> + else
> + vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDE;

Again, a candidate for clear_debug?

> +
>  }
> 
>  void kvm_arch_clear_debug(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 524fa25..ed1bbb4 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -82,6 +82,37 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   return 1;
>  }
> 
> +/**
> + * kvm_handle_debug_exception - handle a debug exception instruction

"kvm_handle_guest_debug"

> + *
> + * @vcpu:the vcpu pointer
> + * @run: access to the kvm_run structure for results
> + *
> + * We route all debug exceptions through the same handler as we
> + * just need to report the PC and the HSR values to userspace.
> + * Userspace may decide to re-inject the exception and deliver it to
> + * the guest if it wasn't for the host to deal with.
> + */
> +static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> + u32 hsr = kvm_vcpu_get_hsr(vcpu);
> +
> + run->exit_reason = KVM_EXIT_DEBUG;
> + run->debug.arch.hsr = hsr;
> +
> + switch (hsr >> ESR_ELx_EC_SHIFT) {
> + case ESR_ELx_EC_BKPT32:
> + case ESR_ELx_EC_BRK64:
> + run->debug.arch.pc = *vcpu_pc(vcpu);
> + break;
> + default:
> + kvm_err("%s: un-handled case hsr: %#08x\n",
> + __func__, (unsigned int) hsr);

Don't you want to fail hard in this case? This might result in many messages.
returning 0 feels wrong.

> + break;
> + }
> + return 0;
> +}
> +
>  static exit_handle_fn arm_exit_handlers[] = {
>   [ESR_ELx_EC_WFx]= kvm_handle_wfx,
>   [ESR_ELx_EC_CP15_32]= kvm_handle_cp15_32,
> @@ -96,6 +127,8 @@ static exit_handle_fn arm_exit_handlers[] = {
>   [ESR_ELx_EC_SYS64]  = kvm_handle_sys_reg,
>   [ESR_ELx_EC_IABT_LOW]   = kvm_handle_guest_abort,
>   [ESR_ELx_EC_DABT_LOW]   = kvm_handle_guest_abort,
> + [ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
> + [ESR_ELx_EC_BRK64]  = kvm_handle_guest_debug,
>  };
> 
>  static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)


David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-

Re: [PATCH v2 05/10] KVM: arm: introduce kvm_arch_setup/clear_debug()

2015-04-01 Thread David Hildenbrand
> This is a precursor for later patches which will need to do more to
> setup debug state before entering the hyp.S switch code. The existing
> functionality for setting mdcr_el2 has been moved out of hyp.S and now
> uses the value kept in vcpu->arch.mdcr_el2.
> 
> This also moves the conditional setting of the TDA bit from the hyp code
> into the C code.
> 
> Signed-off-by: Alex Bennée 
> 
>  create mode 100644 arch/arm64/kvm/debug.c
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 41008cd..8c01c97 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -242,5 +242,7 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> +static inline void kvm_arch_setup_debug(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_clear_debug(struct kvm_vcpu *vcpu) {}

Do you really want to call these functions "kvm_arch.." although they are not
defined for other arch and not triggered by common code?

kvm_setup ... or kvm_arm_setup...

> 
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 445933d..7ea8b0e 100644
> --- a/arch/arm/kvm/arm.c
[...]
> +#include 
> +
> +#include 
> +#include 
> +
> +/**
> + * kvm_arch_setup_debug - set-up debug related stuff
> + *
> + * @vcpu:the vcpu pointer
> + *
> + * This is called before each entry in to the hypervisor to setup any
> + * debug related registers. Currently this just ensures we will trap
> + * access to:
> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> + *  - Debug ROM Address (MDCR_EL2_TDRA)
> + *  - Power down debug registers (MDCR_EL2_TDOSA)
> + *
> + * Additionally the hypervisor lazily saves/restores the debug
> + * register state. If it is not currently doing so (arch.debug_flags)
> + * then we also need to ensure we trap if the guest messes with them
> + * so we know we need to save them.
> + */
> +
> +void kvm_arch_setup_debug(struct kvm_vcpu *vcpu)
> +{
> + vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | MDCR_EL2_TPMCR);
> + vcpu->arch.mdcr_el2 |= (MDCR_EL2_TDRA | MDCR_EL2_TDOSA);

I'd put that into a single assignment.

> +
> + if (!vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
> + else
> + vcpu->arch.mdcr_el2 &= ~MDCR_EL2_TDA;
> +
> +}
> +
> +void kvm_arch_clear_debug(struct kvm_vcpu *vcpu)
> +{
> + /* Nothing to do yet */

Wonder if that would be the right place to clear MDCR_EL2_TDA unconditionally?
But see my comment below.

> +}
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index 5befd01..be92bfe1 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -768,17 +768,8 @@
>   mov x2, #(1 << 15)  // Trap CP15 Cr=15
>   msr hstr_el2, x2
> 
> - mrs x2, mdcr_el2
> - and x2, x2, #MDCR_EL2_HPMN_MASK
> - orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> - orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
> -
> - // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
> - // if not dirty.
> - ldr x3, [x0, #VCPU_DEBUG_FLAGS]
> - tbnzx3, #KVM_ARM64_DEBUG_DIRTY_SHIFT, 1f
> - orr x2, x2,  #MDCR_EL2_TDA
> -1:

*neither an assembler nor arm expert*
The existing code always cleared all bits except MDCR_EL2_HPMN_MASK. So they
remained set.

We would now always overwrite these bits with the values from 
vcpu->arch.mdcr_el2, is that okay?

> + // Monitor Debug Config - see kvm_arch_setup_debug()
> + ldr x2, [x0, #VCPU_MDCR_EL2]
>   msr mdcr_el2, x2
>  .endm
> 



David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/10] KVM: arm: guest debug, define API headers

2015-04-01 Thread David Hildenbrand

> > Looks good to me!
> 
> Is that a Reviewed-by?

Now it is :)

Reviewed-by: David Hildenbrand 

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 04/10] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-04-01 Thread David Hildenbrand
> This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
> ioctl. Currently any operation flag will return EINVAL. Actual

Well it won't return -EINVAL if you push in KVM_GUESTDBG_ENABLE or 0.

"Any unsupported flag will return -EINVAL. For now, only KVM_GUESTDBG_ENABLE is
supported, although it won't have any effects."

> functionality will be added with further patches.
> 
> Signed-off-by: Alex Bennée .
> 
> ---
> v2
>   - simplified form of the ioctl (stuff will go into setup_debug)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index b112efc..06c5064 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2604,7 +2604,7 @@ handled.
>  4.87 KVM_SET_GUEST_DEBUG
> 
>  Capability: KVM_CAP_SET_GUEST_DEBUG
> -Architectures: x86, s390, ppc
> +Architectures: x86, s390, ppc, arm64
>  Type: vcpu ioctl
>  Parameters: struct kvm_guest_debug (in)
>  Returns: 0 on success; -1 on error
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 5560f74..445933d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -183,6 +183,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_ARM_PSCI:
>   case KVM_CAP_ARM_PSCI_0_2:
>   case KVM_CAP_READONLY_MEM:
> + case KVM_CAP_SET_GUEST_DEBUG:
>   r = 1;
>   break;
>   case KVM_CAP_COALESCED_MMIO:
> @@ -303,10 +304,21 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_arm_set_running_vcpu(NULL);
>  }
> 
> +#define KVM_GUESTDBG_VALID (KVM_GUESTDBG_ENABLE)

That makes me rather think that it is another flag.

We(s390x) use VALID_GUESTDBG_FLAGS, what about that or KVM_GUESTDBG_VALID_MASK?

> +
>  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>   struct kvm_guest_debug *dbg)
>  {
> - return -EINVAL;
> + if (dbg->control & KVM_GUESTDBG_ENABLE) {
> + if (dbg->control & ~KVM_GUESTDBG_VALID)
> + return -EINVAL;

I'd move that check directly to the start of the function and bail out on any
unsupported flag.

> +
> + vcpu->guest_debug = dbg->control;
> + } else {
> + /* If not enabled clear all flags */
> + vcpu->guest_debug = 0;
> + }
> + return 0;
>  }
> 
> 

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/10] KVM: arm: guest debug, define API headers

2015-04-01 Thread David Hildenbrand
> This commit defines the API headers for guest debugging. There are two
> architecture specific debug structures:
> 
>   - kvm_guest_debug_arch, allows us to pass in HW debug registers
>   - kvm_debug_exit_arch, signals the exact debug exit and pc
> 
> The type of debugging being used is control by the architecture specific

s/control/controlled/

> control bits of the kvm_guest_debug->control flags in the ioctl
> structure.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>- expose hsr and pc directly to user-space
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 3ef77a4..6ee70a0 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -100,10 +100,24 @@ struct kvm_sregs {
>  struct kvm_fpu {
>  };
> 
> +/*
> + * See ARM ARM D7.3: Debug Registers

Maybe drop one ARM

> + *
> + * The control registers are architecturally defined as 32 bits but are
> + * stored as 64 bit values along side the value registers and aligned
> + * with the rest 64 bit registers in the normal CPU context.
> + */
> +#define KVM_ARM_NDBG_REGS 16
>  struct kvm_guest_debug_arch {
> + __u64 dbg_bcr[KVM_ARM_NDBG_REGS];
> + __u64 dbg_bvr[KVM_ARM_NDBG_REGS];
> + __u64 dbg_wcr[KVM_ARM_NDBG_REGS];
> + __u64 dbg_wvr[KVM_ARM_NDBG_REGS];
>  };
> 
>  struct kvm_debug_exit_arch {
> + __u64 pc;
> + __u32 hsr;
>  };
> 
>  struct kvm_sync_regs {
> @@ -207,4 +221,11 @@ struct kvm_arch_memory_slot {
> 
>  #endif
> 
> +/*
> + * Architecture related debug defines - upper 16 bits of
> + * kvm_guest_debug->control
> + */
> +#define KVM_GUESTDBG_USE_SW_BP   __KVM_GUESTDBG_USE_SW_BP
> +#define KVM_GUESTDBG_USE_HW_BP   __KVM_GUESTDBG_USE_HW_BP
> +
>  #endif /* __ARM_KVM_H__ */

Looks good to me!

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/10] KVM: add commentary for kvm_debug_exit_arch struct

2015-04-01 Thread David Hildenbrand
> Bring into line with the commentary for the other structures and their
> KVM_EXIT_* cases.

s/commentary/comments/ in the subject and description. Unless you want to add a
lengthy discussion :)

> 
> Signed-off-by: Alex Bennée 
> 
> ---
> 
> v2
>   - add comments for other exit types
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 8055706..5eedf84 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -226,6 +226,7 @@ struct kvm_run {
>   __u32 count;
>   __u64 data_offset; /* relative to kvm_run start */
>   } io;
> + /* KVM_EXIT_DEBUG */
>   struct {
>   struct kvm_debug_exit_arch arch;
>   } debug;
> @@ -274,6 +275,7 @@ struct kvm_run {
>   __u32 data;
>   __u8  is_write;
>   } dcr;
> + /* KVM_EXIT_INTERNAL_ERROR */
>   struct {
>   __u32 suberror;
>   /* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
> @@ -284,6 +286,7 @@ struct kvm_run {
>   struct {
>   __u64 gprs[32];
>   } osi;
> + /* KVM_EXIT_PAPR_HCALL */
>   struct {
>   __u64 nr;
>   __u64 ret;

Looks good to me.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-12-03 Thread David Hildenbrand

> Applied with a rewritten commit message:
> 
> KVM: don't check for PF_VCPU when yielding
> 
> kvm_enter_guest() has to be called with preemption disabled and will
> set PF_VCPU.  Current code takes PF_VCPU as a hint that the VCPU thread
> is running and therefore needs no yield.
> 
> However, the check on PF_VCPU is wrong on s390, where preemption
> has to stay enabled on s390 in order to correctly process page faults.
> Thus, s390 reenables preemption and starts to execute the guest.
> The thread might be scheduled out between kvm_enter_guest() and
> kvm_exit_guest(), resulting in PF_VCPU being set but not being run.
> When this happens, the opportunity for directed yield is missed.
> 
> However, this check is done already in kvm_vcpu_on_spin before calling
> kvm_vcpu_yield_loop:
> 
> if (!ACCESS_ONCE(vcpu->preempted))
> continue;
> 
> so the check on PF_VCPU is superfluous in general, and this patch 
> removes it.
> 
> Signed-off-by: David Hildenbrand 
> Signed-off-by: Paolo Bonzini 
> 

Perfect, thanks!

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] assign each vcpu an owning thread and improve yielding

2014-12-03 Thread David Hildenbrand
> 
> 
> On 03/12/2014 13:12, David Hildenbrand wrote:
> >> This series improves yielding on architectures that cannot disable 
> >> preemption
> >> while entering the guest and makes the creating thread of a VCPU the owning
> >> thread and therefore the yield target when yielding to that VCPU.
> >>
> >> We should focus on the case creating thread == executing thread and 
> >> therefore
> >> remove the complicated handling of PIDs involving synchronize_rcus.
> >>
> >> This way we can speed up the creation of VCPUs and directly yield to the
> >> executing vcpu threads.
> >>
> >> Please note that - in theory - all VCPU ioctls should be triggered from 
> >> the same
> >> VCPU thread, so changing threads is not a scenario we should optimize.
> >>
> >>
> >> David Hildenbrand (2):
> >>   KVM: don't check for PF_VCPU when yielding
> >>   KVM: thread creating a vcpu is the owner of that vcpu
> >>
> >>  include/linux/kvm_host.h |  1 +
> >>  virt/kvm/kvm_main.c  | 22 ++
> >>  2 files changed, 3 insertions(+), 20 deletions(-)
> >>
> > 
> > Hi Paolo,
> > 
> > would be good if you could have a look at these patches.
> 
> Sure.
> 
> I think patch 1 is fine and I am applying it.  For patch 2, what about
> moving the ->pid assignment in the KVM_RUN case of kvm_vcpu_ioctl?

Thanks Paolo!

Well, do we have any known user that relies on this thread-switching in case
of KVM_RUN? If yes, I am totally with you.

If not I'd prefer to get this code completely out, as it contains some
unnecessary complexity. And maintaining such code that already had a couple of
bugs in it without any benefit doesn't make much sense. What do you think?

David

> 
> Paolo
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] assign each vcpu an owning thread and improve yielding

2014-12-03 Thread David Hildenbrand
> This series improves yielding on architectures that cannot disable preemption
> while entering the guest and makes the creating thread of a VCPU the owning
> thread and therefore the yield target when yielding to that VCPU.
> 
> We should focus on the case creating thread == executing thread and therefore
> remove the complicated handling of PIDs involving synchronize_rcus.
> 
> This way we can speed up the creation of VCPUs and directly yield to the
> executing vcpu threads.
> 
> Please note that - in theory - all VCPU ioctls should be triggered from the 
> same
> VCPU thread, so changing threads is not a scenario we should optimize.
> 
> 
> David Hildenbrand (2):
>   KVM: don't check for PF_VCPU when yielding
>   KVM: thread creating a vcpu is the owner of that vcpu
> 
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c  | 22 ++
>  2 files changed, 3 insertions(+), 20 deletions(-)
> 

Hi Paolo,

would be good if you could have a look at these patches.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-12-01 Thread David Hildenbrand
> On 11/28/2014 04:28 PM, Christian Borntraeger wrote:
> > Am 28.11.2014 um 11:08 schrieb Raghavendra KT:
> >> Was able to test the patch, here is the result: I have not tested with
> >> bigger VMs though. Results make it difficult to talk about any side
> >> effect of
> >> patch if any.
> >
> > Thanks a log.
> >
> > If our assumption is correct, then this patch should have no side effect on 
> > x86. Do you have any confidence guess if the numbers below mean: no-change 
> > vs. regression vs improvement?
> >
> 
> I am seeing very small improvement in <= 1x commit cases
> and for >1x overcommit, a very slight regression. But considering the
> test environment noises, I do not see much effect from the
> patch.
> 
> But I admit, I have not explored deeply about,
> 1. assumption of preempted approximately equals PF_VCPU case logic,

PF_VCPU is only a hint whether the target vcpu is executing the guest.
If preemption is off or !s390, PF_VCPU means that the target vcpu is running
and can't be preempted.

Although for preemption on and s390, this statement is false. Therefore this
check is not always right.

> 2. whether it helps for any future usages of yield_to against current
> sole usage of virtualization.
> 
> 
> 
Thanks for your test!

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-11-26 Thread David Hildenbrand
> This change is a trade-off.
> PRO: This patch would improve the case of preemption on s390. This is 
> probably a corner case as most distros have preemption off anyway.
> CON: The downside is that kvm_vcpu_yield_to is called also from 
> kvm_vcpu_on_spin. Here we want to avoid the scheduler overhead for a wrong 
> decision.   

Won't most of that part be covered by:
if (!ACCESS_ONCE(vcpu->preempted))

vcpu->preempted is only set when scheduled out involuntarily. It is cleared
when scheduled in. s390 sets it manually, to speed up waking up a vcpu.

So when our task is scheduled in (an PF_VCPU is set), this check will already
avoid scheduler overhead in kvm_vcpu_on_spin() or am I missing something?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 0/2] assign each vcpu an owning thread and improve yielding

2014-11-25 Thread David Hildenbrand
This series improves yielding on architectures that cannot disable preemption
while entering the guest and makes the creating thread of a VCPU the owning
thread and therefore the yield target when yielding to that VCPU.

We should focus on the case creating thread == executing thread and therefore
remove the complicated handling of PIDs involving synchronize_rcus.

This way we can speed up the creation of VCPUs and directly yield to the
executing vcpu threads.

Please note that - in theory - all VCPU ioctls should be triggered from the same
VCPU thread, so changing threads is not a scenario we should optimize.


David Hildenbrand (2):
  KVM: don't check for PF_VCPU when yielding
  KVM: thread creating a vcpu is the owner of that vcpu

 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 22 ++
 2 files changed, 3 insertions(+), 20 deletions(-)

-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 2/2] KVM: thread creating a vcpu is the owner of that vcpu

2014-11-25 Thread David Hildenbrand
Currently, we allow changing the PID of a VCPU. This PID is used to
identify the thread to yield to if we want to yield to this specific
VCPU.

In practice (e.g. QEMU), the thread creating and executing the VCPU remains
always the same. Temporarily exchanging the PID (e.g. because an ioctl is
triggered from a wrong thread) doesn't really make sense.

The PID is exchanged and a synchronize_rcu() is called. When the executing
thread tries to run the VCPU again, another synchronize_rcu() happens.

If a yield to that VCPU is triggered while the PID of the wrong thread is 
active,
the wrong thread might receive a yield, but this will most likely not
help the executing thread at all. The executing thread won't have a higher
priority after the wrong thread has finished with the ioctl. The wrong thread
will even receive yields afterwards that were targeted to the executing vcpu,
until the executing VCPU has replaced the PID on the next ioctl - doesn't feel
correct to me.

This patch makes the creating thread the owning thread, and therefore the only
valid yield candidate (especially because VCPU ioctls are - in theory - only
valid when triggered from the owning thread - old user space versions may not
stick to this rule). This should also speed up the initial start of all VCPUs,
when the PID is assigned for the first time.

Should be backwards compatible - if there is any old user space version out
there that doesn't stick to the creating == executing thread rule, yields will
not work as intended.

Signed-off-by: David Hildenbrand 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 18 ++
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index aa56894..f1fe655 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -245,6 +245,7 @@ struct kvm_vcpu {
int fpu_active;
int guest_fpu_loaded, guest_xcr0_loaded;
wait_queue_head_t wq;
+   /* the pid owning this vcpu - target for vcpu yields */
struct pid *pid;
int sigset_active;
sigset_t sigset;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 184f52e..4ba7810 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -124,15 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)
 
if (mutex_lock_killable(&vcpu->mutex))
return -EINTR;
-   if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
-   /* The thread running this VCPU changed. */
-   struct pid *oldpid = vcpu->pid;
-   struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
-   rcu_assign_pointer(vcpu->pid, newpid);
-   if (oldpid)
-   synchronize_rcu();
-   put_pid(oldpid);
-   }
cpu = get_cpu();
preempt_notifier_register(&vcpu->preempt_notifier);
kvm_arch_vcpu_load(vcpu, cpu);
@@ -220,7 +211,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, 
unsigned id)
vcpu->cpu = -1;
vcpu->kvm = kvm;
vcpu->vcpu_id = id;
-   vcpu->pid = NULL;
+   vcpu->pid = get_task_pid(current, PIDTYPE_PID);
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
 
@@ -1771,15 +1762,10 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
 
 int kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
-   struct pid *pid;
struct task_struct *task = NULL;
int ret = 0;
 
-   rcu_read_lock();
-   pid = rcu_dereference(target->pid);
-   if (pid)
-   task = get_pid_task(pid, PIDTYPE_PID);
-   rcu_read_unlock();
+   task = get_pid_task(target->pid, PIDTYPE_PID);
if (!task)
return ret;
ret = yield_to(task, 1);
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 1/2] KVM: don't check for PF_VCPU when yielding

2014-11-25 Thread David Hildenbrand
As some architectures (e.g. s390) can't disable preemption while
entering/leaving the guest, they won't receive the yield in all situations.

kvm_enter_guest() has to be called with preemption_disabled and will set
PF_VCPU. After that point e.g. s390 reenables preemption and starts to execute 
the
guest. The thread might therefore be scheduled out between kvm_enter_guest() and
kvm_exit_guest(), resulting in PF_VCPU being set but not being run.

Please note that preemption has to stay enabled in order to correctly process
page faults on s390.

Current code takes PF_VCPU as a hint that the VCPU thread is running and
therefore needs no yield. yield_to() checks whether the target thread is 
running,
so let's use the inbuilt functionality to make it independent of PF_VCPU and
preemption.

Signed-off-by: David Hildenbrand 
---
 virt/kvm/kvm_main.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b45330..184f52e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1782,10 +1782,6 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
rcu_read_unlock();
if (!task)
return ret;
-   if (task->flags & PF_VCPU) {
-   put_task_struct(task);
-   return ret;
-   }
ret = yield_to(task, 1);
put_task_struct(task);
 
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] blk-mq: Avoid race condition with uninitialized requests

2014-09-18 Thread David Hildenbrand
This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249.

Test is still pending.

David Hildenbrand (1):
  blk-mq: Avoid race condition with uninitialized requests

 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] blk-mq: Avoid race condition with uninitialized requests

2014-09-18 Thread David Hildenbrand
This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized pointers when
racing with the creation of a request.

Also move the reset of cmd_flags for the initializing code to the point where a
request is freed. So we will never end up with pending flush request indicators
that might trigger dereferences of invalid pointers in blk_mq_timeout_check().

Cc: sta...@vger.kernel.org
Signed-off-by: David Hildenbrand 
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 383ea0c..eed6340 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -203,7 +203,6 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int 
rw)
if (tag != BLK_MQ_TAG_FAIL) {
rq = data->hctx->tags->rqs[tag];
 
-   rq->cmd_flags = 0;
if (blk_mq_tag_busy(data->hctx)) {
rq->cmd_flags = REQ_MQ_INFLIGHT;
atomic_inc(&data->hctx->nr_active);
@@ -258,6 +257,7 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx 
*hctx,
 
if (rq->cmd_flags & REQ_MQ_INFLIGHT)
atomic_dec(&hctx->nr_active);
+   rq->cmd_flags = 0;
 
clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
blk_mq_put_tag(hctx, tag, &ctx->last_tag);
@@ -1404,6 +1404,8 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct 
blk_mq_tag_set *set,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
tags->rqs[i] = p;
+   tags->rqs[i]->atomic_flags = 0;
+   tags->rqs[i]->cmd_flags = 0;
if (set->ops->init_request) {
if (set->ops->init_request(set->driver_data,
tags->rqs[i], hctx_idx, i,
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
> On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe  wrote:
> >
> > Another way would be to ensure that the timeout handler doesn't touch hw_ctx
> > or tag_sets that aren't fully initialized yet. But I think this is
> > safer/cleaner.
> 
> That may not be easy or enough to check if hw_ctx/tag_sets are
> fully initialized if you mean all requests have been used one time.
> 
> On Wed, Sep 17, 2014 at 10:11 PM, David Hildenbrand
> > I was playing with a simple patch that just sets cmd_flags and action_flags 
> > to
> 
> What is action_flags?

atomic_flags, sorry :)

Otherwise e.g. REQ_ATOM_STARTED could already be set due to the randomness. I
am not sure if this is really necessary, or if it is completely shielded by the
tag-handling code, but seemed to be clean for me to do it (and I remember it
not being set within blk_mq_rq_ctx_init).

> 
> > 0. That should already be sufficient to hinder blk_mq_tag_to_rq and the 
> > calling
> > method to do the wrong thing.
> 
> Yes, clearing rq->cmd_flags should be enough.
> 
> And looks better to move rq initialization to __blk_mq_free_request()
> too, otherwise timeout still may see old cmd_flags and rq->q before
> rq's new initialization.

Yes, __blk_mq_free_request() should also reset at least rq->cmd_flags, and I
think we can remove the initialization from  __blk_mq_alloc_request().

David

> 
> 
> Thanks,

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
> On Wed, 17 Sep 2014 14:00:34 +0200
> David Hildenbrand  wrote:
> 
> > > >>> Does anyone have an idea?
> > > >>> The request itself is completely filled with cc
> > > >>
> > > >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> > > >> valid, and rq->q shouldn't have been changed even though it was
> > > >> double free or double allocation.
> > > >>
> > > >>> I am currently asking myself if blk_mq_map_request should protect 
> > > >>> against softirq here but I cant say for sure,as I have never looked 
> > > >>> into that code before.
> > > >>
> > > >> No, it needn't the protection.
> > > >>
> > > >> Thanks,
> > > >>
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > 
> > 
> > Digging through the code, I think I found a possible cause:
> > 
> > tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
> > blk-mq.c:blk_mq_init_rq_map()).
> 
> Yes, it may cause problem when the request is allocated at the 1st time,
> and timeout handler may comes just after the allocation and before its
> initialization, then oops triggered because of garbage data in the request. 
> 
> --
> From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
> From: Ming Lei 
> Date: Wed, 17 Sep 2014 21:00:34 +0800
> Subject: [PATCH] blk-mq: initialize request before the 1st allocation
> 
> Otherwise the request can be accessed from timeout handler
> just after its 1st allocation from tag pool and before
> initialization in blk_mq_rq_ctx_init(), so cause oops since
> the request is filled up with garbage data.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4aac826..d24673f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags 
> *tags, unsigned int tag)
>  {
>   struct request *rq = tags->rqs[tag];
>  
> + /* uninitialized request */
> + if (!rq->q || rq->tag == -1)
> + return rq;
> +
>   if (!is_flush_request(rq, tag))
>   return rq;
>  
> @@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct 
> blk_mq_tag_set *set,
>   left -= to_do * rq_size;
>   for (j = 0; j < to_do; j++) {
>   tags->rqs[i] = p;
> +
> + /* Avoiding early access from timeout handler */
> + tags->rqs[i]->tag = -1;
> + tags->rqs[i]->q = NULL;
> + tags->rqs[i]->cmd_flags = 0;

I was playing with a simple patch that just sets cmd_flags and action_flags to
0. That should already be sufficient to hinder blk_mq_tag_to_rq and the calling
method to do the wrong thing.

Will see the result tomorrow after testing.

> +
>   if (set->ops->init_request) {
>   if (set->ops->init_request(set->driver_data,
>   tags->rqs[i], hctx_idx, i,

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread David Hildenbrand
> >>> Does anyone have an idea?
> >>> The request itself is completely filled with cc
> >>
> >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> >> valid, and rq->q shouldn't have been changed even though it was
> >> double free or double allocation.
> >>
> >>> I am currently asking myself if blk_mq_map_request should protect against 
> >>> softirq here but I cant say for sure,as I have never looked into that 
> >>> code before.
> >>
> >> No, it needn't the protection.
> >>
> >> Thanks,
> >>
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

Digging through the code, I think I found a possible cause:

tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
blk-mq.c:blk_mq_init_rq_map()).

When a request is created:

1. __blk_mq_alloc_request() gets a free tag (thus e.g. removing it from
bitmap_tags)

2. __blk_mq_alloc_request() initializes is via blk_mq_rq_ctx_init(). The struct
is filled with life and rq->q is set.


When blk_mq_hw_ctx_check_timeout() is called:

1. blk_mq_tag_busy_iter() is used to call blk_mq_timeout_check() on all busy
tags.

2. This is done by collecting all free tags using bt_for_each_free() and
handing them to blk_mq_timeout_check(). This uses bitmap_tags.

3. blk_mq_timeout_check() calls  blk_mq_tag_to_rq() to get the rq.


Could we have a race between

- getting the tag (turning it busy) and initializing it and
- detecting a tag to be busy and trying to access it?


I haven't looked at the details yet. If so, we might either do some locking
(if there is existing infrastructure), or somehow mark a request as not being
initialized prior to accessing the data.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

2014-08-19 Thread David Hildenbrand
> >> Hmm. We want to not only reduce, we want them be zero.
> >> In addition to a reworked MP_STATE patch set, we might be able to change 
> >> the code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
> >> If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on 
> >> CPU creation, because we know that all kernel version will do an implicit 
> >> cpu reset on cpu creation anyway. Can you have a try on this as well when 
> >> reworking that code? We could then fix this rcu performance penalty 
> >> independent from getting rid of that ioctl.
> >>
> >> Christian
> >>
> > 
> > Already working on it, only one ioctl left on vcpu creation that is called
> > from wrong context, trying to hide from me. Restarts and resets are already
> 
> Maybe its the synchronize when the oldpid is 0? Can you check the patch that 
> I just sent?

Already got that in my code. Seems to be an architecture specific one called
from wrong context. (actually it is the third one being called
after SET_MP_STATE and SET_SIGNAL_MASK).

A few more minutes and I should have it :)

David

> 
> > blasting fast.
> > 
> > David
> > 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

2014-08-19 Thread David Hildenbrand
> On 19/08/14 14:14, David Hildenbrand wrote:
> >> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
> >>> Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls 
> >>> in
> >>> the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any 
> >>> reasons
> >>> why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
> >>> cpu_synchronize_all_post_reset())?
> >>
> >> No reason, feel free to post a patch for QEMU kvm-all.c.
> >> Documentation/virtual/kvm/api.txt clearly says:
> >>
> >>Only run vcpu ioctls from the same thread that was used to create the
> >>vcpu.
> >>
> >> Paolo
> >>
> > 
> > Thanks! A little more tweaking in the other parts of s390x resets
> > and we should be able to reduce the number of "wrong" ioctls (I think I 
> > found
> > most cases that are responsible for the performance degradation).
> 
> Hmm. We want to not only reduce, we want them be zero.
> In addition to a reworked MP_STATE patch set, we might be able to change the 
> code to call "KVM_S390_INITIAL_RESET" only from the cpu thread itself. 
> If that simplifies things, we could avoid doing KVM_S390_INITIAL_RESET on CPU 
> creation, because we know that all kernel version will do an implicit cpu 
> reset on cpu creation anyway. Can you have a try on this as well when 
> reworking that code? We could then fix this rcu performance penalty 
> independent from getting rid of that ioctl.
> 
> Christian
> 

Already working on it, only one ioctl left on vcpu creation that is called
from wrong context, trying to hide from me. Restarts and resets are already
blasting fast.

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

2014-08-19 Thread David Hildenbrand
> Il 19/08/2014 13:28, David Hildenbrand ha scritto:
> > Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
> > the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any 
> > reasons
> > why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
> > cpu_synchronize_all_post_reset())?
> 
> No reason, feel free to post a patch for QEMU kvm-all.c.
> Documentation/virtual/kvm/api.txt clearly says:
> 
>Only run vcpu ioctls from the same thread that was used to create the
>vcpu.
> 
> Paolo
> 

Thanks! A little more tweaking in the other parts of s390x resets
and we should be able to reduce the number of "wrong" ioctls (I think I found
most cases that are responsible for the performance degradation).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

2014-08-19 Thread David Hildenbrand
> Il 19/08/2014 10:38, Christian Borntraeger ha scritto:
> >> > The patch may be okay, but I'm worried that it might be hiding a bug in
> >> > QEMU.
> > On s390 we call "KVM_S390_INITIAL_RESET" from several reset functions, e.g. 
> > during 
> > CPU creation. This is the first hickup and the pid now points to the main 
> > thread.
> 
> Any reason to have a special ioctl instead of SET_REGS/SET_ONE_REG/...
> (via kvm_cpu_synchronize_state, which does the ioctls in the VCPU thread)?
> 
> Paolo

Looking at the code, kvm_cpu_synchronize_state() seems to do these ioctls in
the vcpu thread (e.g. comming from cpu_synchronize_all_states()), any reasons
why kvm_cpu_synchronize_post_reset() doesn't do the same (e.g. called from
cpu_synchronize_all_post_reset())?

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-31 Thread David Hildenbrand
> > We have
> > - wait (wait bit in PSW)
> > - disabled wait (wait bit and interrupt fencing in PSW)
> > - STOPPED (not related to PSW, state change usually handled via service 
> > processor or hypervisor)
> >
> > I think we have to differentiate between KVM/TCG. On KVM we always do in 
> > kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG 
> > has to take care of the normal wait as well.
> >
> >  From a first glimpse, a disabled wait and STOPPED look similar, but there 
> > are (important) differences, e.g. other CPUs get a different a different 
> > result from a SIGP SENSE. This makes a big difference, e.g. for Linux 
> > guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU 
> > is down on hotplug (and shutdown, kexec..) So I think we agree, that 
> > handling the cpu states natively makes sense.
> >
> > The question is now only how to model it correctly without breaking TCG/KVM 
> > and reuse as much common code as possible. Correct?
> >
> > Do I understand you correctly, that your collapsing of stopped and halted 
> > is only in the qemu coding sense, IOW maybe we could just modify 
> > kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
> > implementation does not support SMP anyway?
> 
> That works for me, yes.
> 
> 
> Alex
> 

I had a look at it yesterday and it seems like we can totally drop this patch:

1. TCG doesn't support multiple CPUs and the TCG SIGP implementation isn't
ready for proper STOP/START/SENSE. Testing for STOPPED cpus in cpu_has_work()
can be dropped. To be able to support TCG was the main reason for this patch -
as we don't want to do so for now, we can leave it as is. We can still decide
to support the cpu states later using a mechanism suggest by Alex
(interrupt_requests).

Even if cpu_has_work() would make cpu.c:cpu_thread_is_idle() return false,
kvm_arch_process_async_events() called by kvm-all.c:kvm_cpu_exec() would make
it go back to sleep. Therefore a stopped VCPU will never be able to run in the
KVM case (because it always has cs->halted = true).

2. The unhalt in kvm_arch_process_async_events is for a special case where a
VCPU is in disabled wait and receives e.g. a machine-check interrupt. These
might happen in the future, for now we will never see them (the only
way to get a vcpu out of disabled wait are SIGP RESTART/CPU RESET - so we
don't break anything at that point).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-29 Thread David Hildenbrand
> Il 28/07/2014 17:03, David Hildenbrand ha scritto:
> > Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
> > like things (SIGP START) AND a special interrupt (SIGP RESTART - which is 
> > like
> > a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two
> > paths that can lead to a state change.  All interrupt bits may be in any
> > combination (SIGP RESTART interrupts can't be masked out, nor can SIGP 
> > START be
> > denied).
> > 
> > The other thing may be that on s390, each vcpu (including itself) can put
> > another vcpu into the STOPPED state - I assume that this is different for 
> > x86 "
> > INIT_RECEIVED". For this reason we have to watch out for bad race conditions
> > (e.g. multiple vcpus working on another vcpu)...
> 
> You can do that in x86 by sending an INIT inter-processor interrupt.  A
> SIPI is ignored if the CPU is not in INIT_RECEIVED state.
> 
> Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
> implementation.
> 
> - an INIT cancels a previous SIPI;
> 
> - if both INIT and SIPI are sent, on real hardware you need to have a
> few hundred microseconds between them, but KVM will reliably process
> INIT before SIPI.
> 
> See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
> the races that can happen.
> 
> Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
> we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
> 

Thanks for the explanation Paolo!

Looks like from an interrupt point of view, the states have a lot in common.
The major thing that differs on s390 is probably the way these interrupts are
generated and what else they influence (all the power of the SIGP facility :)
+ special check-stop state that can't be left by an interrupt, only by SIGP
  CPU resets).

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-28 Thread David Hildenbrand
> > 
> > On 28.07.2014, at 16:16, David Hildenbrand  wrote:
> > 
> > >> 
> > >> On 10.07.14 15:10, Christian Borntraeger wrote:
> > >>> From: David Hildenbrand 
> > >>> 
> > >>> If a cpu is stopped, it must never be allowed to run and no interrupt 
> > >>> may wake it
> > >>> up. A cpu also has to be unhalted if it is halted and has work to do - 
> > >>> this
> > >>> scenario wasn't hit in kvm case yet, as only "disabled wait" is 
> > >>> processed within
> > >>> QEMU.
> > >>> 
> > >>> Signed-off-by: David Hildenbrand 
> > >>> Reviewed-by: Cornelia Huck 
> > >>> Reviewed-by: Christian Borntraeger 
> > >>> Signed-off-by: Christian Borntraeger 
> > >> 
> > >> This looks like it's something that generic infrastructure should take 
> > >> care of, no? How does this work for the other archs? They always get an 
> > >> interrupt on the transition between !has_work -> has_work. Why don't we 
> > >> get one for s390x?
> > >> 
> > >> 
> > >> Alex
> > >> 
> > >> 
> > > 
> > > Well, we have the special case on s390 as a CPU that is in the STOPPED or 
> > > the
> > > CHECK STOP state may never run - even if there is an interrupt. It's
> > > basically like this CPU has been switched off.
> > > 
> > > Imagine that it is tried to inject an interrupt into a stopped vcpu. It
> > > will kick the stopped vcpu and thus lead to a call to
> > > "kvm_arch_process_async_events()". We have to deny that this vcpu will 
> > > ever
> > > run as long as it is stopped. It's like a way to "suppress" the
> > > interrupt for such a transition you mentioned.
> > 
> > An interrupt kick usually just means we go back into the main loop. From 
> > there we check the interrupt bitmap which interrupt to handle. Check out 
> > the handling code here:
> > 
> >   
> > http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580
> > 
> > If you just check for the stopped state in here, do_interrupt() will never 
> > get called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
> > mistaken :).
> 
> So you would rather move the check out of has_work() into the main loop in
> cpu-exec.c and directly into kvm_arch_process_async_events()?
> 
> This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on 
> any
> CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to 
> run. Is okay?
> 
> Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although 
> we
> are idle (because we are idle when we are stopped)?
> 
> My qemu kvm knowledge is way better than the qemu emulation knowledge, so I
> appreciate any insights :)
> 
> > 
> > > 
> > > Later, another vcpu might decide to turn that vcpu back on (by e.g. 
> > > sending a
> > > SIGP START to that vcpu).
> > 
> > Yes, in that case that other CPU generates a signal (a different bit in 
> > interrupt_request) and the first CPU would see that it has to wake up and 
> > wake up.
> > 
> > > I am not sure if such a mechanism/scenario is applicable to any other 
> > > arch. They
> > > all seem to reset the cs->halted flag if they know they are able to run 
> > > (e.g.
> > > due to an interrupt) - they have no such thing as "stopped cpus", only
> > > "halted/waiting cpus".
> > 
> > There's not really much difference between the two. The only difference 
> > from a software point of view is that a "stopped" CPU has its external 
> > interrupt bits masked off, no?
> 
> Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
> like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
> a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two
> paths that can lead to a state change. All interrupt bits may be in any
> combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START 
> be
> denied).
> 
> The other thing may be that on s390, each vcpu (including itself) can put
> another vcpu into the STOPPED state - I assume that this is different for x86 
> "
> INIT_RECEIVED". For this reason we have to watch out for bad race conditions
> (e.g. multiple vcpus working on another vcpu)...

Ah, sorry, just to clearify, a vcpu always sets itself to STOPPED, its the other
vcpus that trigger it (= interrupt-like).

David

> 
> David
> 
> > 
> > 
> > Alex
> > 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-28 Thread David Hildenbrand
> 
> On 28.07.2014, at 16:16, David Hildenbrand  wrote:
> 
> >> 
> >> On 10.07.14 15:10, Christian Borntraeger wrote:
> >>> From: David Hildenbrand 
> >>> 
> >>> If a cpu is stopped, it must never be allowed to run and no interrupt may 
> >>> wake it
> >>> up. A cpu also has to be unhalted if it is halted and has work to do - 
> >>> this
> >>> scenario wasn't hit in kvm case yet, as only "disabled wait" is processed 
> >>> within
> >>> QEMU.
> >>> 
> >>> Signed-off-by: David Hildenbrand 
> >>> Reviewed-by: Cornelia Huck 
> >>> Reviewed-by: Christian Borntraeger 
> >>> Signed-off-by: Christian Borntraeger 
> >> 
> >> This looks like it's something that generic infrastructure should take 
> >> care of, no? How does this work for the other archs? They always get an 
> >> interrupt on the transition between !has_work -> has_work. Why don't we 
> >> get one for s390x?
> >> 
> >> 
> >> Alex
> >> 
> >> 
> > 
> > Well, we have the special case on s390 as a CPU that is in the STOPPED or 
> > the
> > CHECK STOP state may never run - even if there is an interrupt. It's
> > basically like this CPU has been switched off.
> > 
> > Imagine that it is tried to inject an interrupt into a stopped vcpu. It
> > will kick the stopped vcpu and thus lead to a call to
> > "kvm_arch_process_async_events()". We have to deny that this vcpu will ever
> > run as long as it is stopped. It's like a way to "suppress" the
> > interrupt for such a transition you mentioned.
> 
> An interrupt kick usually just means we go back into the main loop. From 
> there we check the interrupt bitmap which interrupt to handle. Check out the 
> handling code here:
> 
>   
> http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580
> 
> If you just check for the stopped state in here, do_interrupt() will never 
> get called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
> mistaken :).

So you would rather move the check out of has_work() into the main loop in
cpu-exec.c and directly into kvm_arch_process_async_events()?

This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any
CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to 
run. Is okay?

Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we
are idle (because we are idle when we are stopped)?

My qemu kvm knowledge is way better than the qemu emulation knowledge, so I
appreciate any insights :)

> 
> > 
> > Later, another vcpu might decide to turn that vcpu back on (by e.g. sending 
> > a
> > SIGP START to that vcpu).
> 
> Yes, in that case that other CPU generates a signal (a different bit in 
> interrupt_request) and the first CPU would see that it has to wake up and 
> wake up.
> 
> > I am not sure if such a mechanism/scenario is applicable to any other arch. 
> > They
> > all seem to reset the cs->halted flag if they know they are able to run 
> > (e.g.
> > due to an interrupt) - they have no such thing as "stopped cpus", only
> > "halted/waiting cpus".
> 
> There's not really much difference between the two. The only difference from 
> a software point of view is that a "stopped" CPU has its external interrupt 
> bits masked off, no?

Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
a "SIPI"++ as it performs a psw exchange - "NMI"). So we basically have two
paths that can lead to a state change. All interrupt bits may be in any
combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be
denied).

The other thing may be that on s390, each vcpu (including itself) can put
another vcpu into the STOPPED state - I assume that this is different for x86 "
INIT_RECEIVED". For this reason we have to watch out for bad race conditions
(e.g. multiple vcpus working on another vcpu)...

David

> 
> 
> Alex
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking "has_work"

2014-07-28 Thread David Hildenbrand
> 
> On 10.07.14 15:10, Christian Borntraeger wrote:
> > From: David Hildenbrand 
> >
> > If a cpu is stopped, it must never be allowed to run and no interrupt may 
> > wake it
> > up. A cpu also has to be unhalted if it is halted and has work to do - this
> > scenario wasn't hit in kvm case yet, as only "disabled wait" is processed 
> > within
> > QEMU.
> >
> > Signed-off-by: David Hildenbrand 
> > Reviewed-by: Cornelia Huck 
> > Reviewed-by: Christian Borntraeger 
> > Signed-off-by: Christian Borntraeger 
> 
> This looks like it's something that generic infrastructure should take 
> care of, no? How does this work for the other archs? They always get an 
> interrupt on the transition between !has_work -> has_work. Why don't we 
> get one for s390x?
> 
> 
> Alex
> 
> 

Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
"kvm_arch_process_async_events()". We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to "suppress" the
interrupt for such a transition you mentioned.

Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs->halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as "stopped cpus", only
"halted/waiting cpus".

David

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm

2014-07-10 Thread David Hildenbrand
> > This is the qemu part of kernel series "Let user space control the
> > cpu states"
> > 
> > Christian Borntraeger (1):
> >   update linux headers with with cpustate changes
> > 
> > David Hildenbrand (4):
> >   s390x/kvm: introduce proper states for s390 cpus
> >   s390x/kvm: proper use of the cpu states OPERATING and STOPPED
> >   s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
> >   s390x/kvm: propagate s390 cpu state to kvm
> > 
> >  hw/s390x/ipl.c|   2 +-
> >  hw/s390x/s390-virtio.c|  32 --
> >  linux-headers/linux/kvm.h |   7 ++-
> >  target-s390x/cpu.c| 106 
> > +++---
> >  target-s390x/cpu.h|  33 +--
> >  target-s390x/helper.c |  11 ++---
> >  target-s390x/kvm.c|  49 ++---
> >  trace-events  |   6 +++
> >  8 files changed, 179 insertions(+), 67 deletions(-)
> > 
> 
> Looks good to me
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

@all thought it was the final internal review :)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm

2014-07-10 Thread David Hildenbrand
> This is the qemu part of kernel series "Let user space control the
> cpu states"
> 
> Christian Borntraeger (1):
>   update linux headers with with cpustate changes
> 
> David Hildenbrand (4):
>   s390x/kvm: introduce proper states for s390 cpus
>   s390x/kvm: proper use of the cpu states OPERATING and STOPPED
>   s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
>   s390x/kvm: propagate s390 cpu state to kvm
> 
>  hw/s390x/ipl.c|   2 +-
>  hw/s390x/s390-virtio.c|  32 --
>  linux-headers/linux/kvm.h |   7 ++-
>  target-s390x/cpu.c| 106 
> +++---
>  target-s390x/cpu.h|  33 +--
>  target-s390x/helper.c |  11 ++---
>  target-s390x/kvm.c|  49 ++---
>  trace-events  |   6 +++
>  8 files changed, 179 insertions(+), 67 deletions(-)
> 

Looks good to me

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm

2014-07-10 Thread David Hildenbrand
> This is the qemu part of kernel series "Let user space control the
> cpu states"
> 
> Christian Borntraeger (1):
>   update linux headers with with cpustate changes
> 
> David Hildenbrand (4):
>   s390x/kvm: introduce proper states for s390 cpus
>   s390x/kvm: proper use of the cpu states OPERATING and STOPPED
>   s390x/kvm: test whether a cpu is STOPPED when checking "has_work"
>   s390x/kvm: propagate s390 cpu state to kvm
> 
>  hw/s390x/ipl.c|   2 +-
>  hw/s390x/s390-virtio.c|  32 --
>  linux-headers/linux/kvm.h |   7 ++-
>  target-s390x/cpu.c| 106 
> +++---
>  target-s390x/cpu.h|  33 +--
>  target-s390x/helper.c |  11 ++---
>  target-s390x/kvm.c|  49 ++---
>  trace-events  |   6 +++
>  8 files changed, 179 insertions(+), 67 deletions(-)
> 

Looks good to me.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html