Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Kosuke Tatsukawa
Paolo Bonzini wrote:
> On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>> async_pf_execute                 kvm_vcpu_block
>> 
>> spin_lock(&vcpu->async_pf.lock);
>> if (waitqueue_active(&vcpu->wq))
>> /* The CPU might reorder the test for
>>    the waitqueue up here, before
>>    prior writes complete */
>>                                  prepare_to_wait(&vcpu->wq, &wait,
>>                                      TASK_INTERRUPTIBLE);
>>                                  /* if (kvm_vcpu_check_block(vcpu) < 0) */
>>                                  /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
>>                                  ...
>>                                  return (vcpu->arch.mp_state ==
>>                                          KVM_MP_STATE_RUNNABLE &&
>>                                          !vcpu->arch.apf.halted)
>>                                      || !list_empty_careful(&vcpu->async_pf.done)
>>                                  ...
> 
> The new memory barrier isn't "paired" with any other, and in
> fact I think that the same issue exists on the other side: 
> list_empty_careful(>async_pf.done) may be reordered up,
> before the prepare_to_wait:

smp_store_mb(), called from set_current_state(), which is in turn called
from prepare_to_wait(), should prevent reordering such as the below from
happening.  wait_event*() also calls set_current_state() internally.


> spin_lock(&vcpu->async_pf.lock);
>                                  (vcpu->arch.mp_state ==
>                                      KVM_MP_STATE_RUNNABLE &&
>                                      !vcpu->arch.apf.halted)
>                                  || !list_empty_careful(&vcpu->async_pf.done)
>                                  ...
>                                  prepare_to_wait(&vcpu->wq, &wait,
>                                      TASK_INTERRUPTIBLE);
>                                  /* if (kvm_vcpu_check_block(vcpu) < 0) */
>                                  /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                  ...
>                                  return 0;
> list_add_tail(&apf->link,
>               &vcpu->async_pf.done);
> spin_unlock(&vcpu->async_pf.lock);
>                                  waited = true;
>                                  schedule();
> if (waitqueue_active(&vcpu->wq))
> 
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
> 
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

The problem on the waiter side occurs if add_wait_queue() or
__add_wait_queue() is directly called without memory barriers nor locks
protecting it.


> Thanks,
> 
> Paolo
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Peter Zijlstra
On Fri, Oct 09, 2015 at 10:45:32AM +0200, Paolo Bonzini wrote:
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
> 
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

Not having actually read or thought about the issue at hand, it's
perfectly valid to pair an smp_mb() with either spin_lock() or
spin_unlock().

IOW. MB <-> {ACQUIRE, RELEASE} is a valid pairing.


Re: [PATCH 03/15] arm64: Introduce helpers for page table levels

2015-10-09 Thread Suzuki K. Poulose

On 08/10/15 18:28, Catalin Marinas wrote:

On Thu, Oct 08, 2015 at 06:22:34PM +0100, Suzuki K. Poulose wrote:

On 08/10/15 15:45, Christoffer Dall wrote:

On Wed, Oct 07, 2015 at 10:26:14AM +0100, Marc Zyngier wrote:

I just had a chat with Catalin, who did shed some light on this.
It all has to do with rounding up. What you would like to have here is:

#define ARM64_HW_PGTABLE_LEVELS(va_bits) \
	DIV_ROUND_UP((va_bits) - PAGE_SHIFT, PAGE_SHIFT - 3)

where (va_bits - PAGE_SHIFT) is the total number of bits we deal
with during a page table walk, and (PAGE_SHIFT - 3) is the number
of bits we deal with per level.

The clue is in how DIV_ROUND_UP is written:

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

which gives you Suzuki's magic formula.

I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.


Thanks for the explanation, I vote for DIV_ROUND_UP too.


Btw, DIV_ROUND_UP is defined in linux/kernel.h; including that from the
required headers breaks the build. I could add a local definition of the same.


Or just keep the original magic formula and add the DIV_ROUND_UP one in
a comment.



OK, will keep proper documentation with the cryptic formula ;)

Suzuki



Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation

2015-10-09 Thread Marc Zyngier
On 09/10/15 09:10, Pavel Fedin wrote:
>  Hello!
> 
>> How do you reconcile the two? What's the point of trying to shoehorn a
>> different concept into the existing API?
> 
>  Keeping the number of #defines as small as possible, and trying to reuse 
> everything that can be reused.

Reusing existing code is a noble goal, but I think that having clear
abstractions and following the architecture trumps it completely.

Experience has shown that trying to be clever with userspace interfaces
comes and bites us in the rear sooner or later. Not enough bits, being
unable to represent valid architecture features, pointlessly complicated
code that is hard to maintain. Those are the things I care about today.

So a #define is completely free, additional code is still very cheap. My
brain cells are few, and I like to look at the code and get this warm
fuzzy feeling that it is doing the right thing.

Having separate interfaces for different devices is a very good way to
ensure the above.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-09 Thread Joerg Roedel
From: Joerg Roedel 

Currently we always write the next_rip of the shadow vmcb to
the guest's vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicates no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das 
Cc: Dirk Mueller 
Tested-by: Dirk Mueller 
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   |  4 +++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuid fn 0x8000_000a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x8000000a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2f9ed1f..4084b33 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2365,7 +2365,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (guest_cpuid_has_nrips(&svm->vcpu))
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
-- 
1.9.1



Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Paolo Bonzini


On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
> async_pf_execute                 kvm_vcpu_block
> 
> spin_lock(&vcpu->async_pf.lock);
> if (waitqueue_active(&vcpu->wq))
> /* The CPU might reorder the test for
>    the waitqueue up here, before
>    prior writes complete */
>                                  prepare_to_wait(&vcpu->wq, &wait,
>                                      TASK_INTERRUPTIBLE);
>                                  /* if (kvm_vcpu_check_block(vcpu) < 0) */
>                                  /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                  ...
>                                  return (vcpu->arch.mp_state ==
>                                          KVM_MP_STATE_RUNNABLE &&
>                                          !vcpu->arch.apf.halted)
>                                      || !list_empty_careful(&vcpu->async_pf.done)
>                                  ...

The new memory barrier isn't "paired" with any other, and in
fact I think that the same issue exists on the other side: 
list_empty_careful(>async_pf.done) may be reordered up,
before the prepare_to_wait:

spin_lock(&vcpu->async_pf.lock);
                                 (vcpu->arch.mp_state ==
                                     KVM_MP_STATE_RUNNABLE &&
                                     !vcpu->arch.apf.halted)
                                 || !list_empty_careful(&vcpu->async_pf.done)
                                 ...
                                 prepare_to_wait(&vcpu->wq, &wait,
                                     TASK_INTERRUPTIBLE);
                                 /* if (kvm_vcpu_check_block(vcpu) < 0) */
                                 /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
                                 ...
                                 return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                 waited = true;
                                 schedule();
if (waitqueue_active(&vcpu->wq))

So you need another smp_mb() after prepare_to_wait().  I'm not sure
if it's needed also for your original tty report, but I think it is
for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
without memory barrier in mei drivers").

I wonder if it makes sense to introduce smp_mb__before_spin_lock()
and smp_mb__after_spin_unlock().  On x86 the former could be a
simple compiler barrier, and on s390 both of them could.  But that
should be a separate patch.

Thanks,

Paolo


RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation

2015-10-09 Thread Pavel Fedin
 Hello!

> +  KVM_DEV_ARM_VGIC_CPU_SYSREGS
> +  Attributes:
> +The attr field of kvm_device_attr encodes two values:
> +bits:     | 63  ....  32 | 31  ....  16 | 15 ....  0 |
> +values:   |     mpidr    |      RES     |    instr   |

 One small clarification: do you really want it to be different from 
KVM_DEV_ARM_VGIC_GRP_CPU_REGS?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




RE: [PATCH v3] KVM: x86: INIT and reset sequences are different

2015-10-09 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2015-10-01:

Hi Paolo

Sorry for the late reply. I am just back from vacation.

> 
> 
> On 13/04/2015 13:34, Nadav Amit wrote:
>> x86 architecture defines differences between the reset and INIT
>> sequences. INIT does not initialize the FPU (including MMX, XMM, YMM,
>> etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC
>> arbitration ID and BSP.
>> 
>> References (from Intel SDM):
>> 
>> "If the MP protocol has completed and a BSP is chosen, subsequent INITs
>> (either to a specific processor or system wide) do not cause the MP
>> protocol to be repeated." [8.4.2: MP Initialization Protocol
>> Requirements and Restrictions]
>> 
>> [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]
>> 
>> "If the processor is reset by asserting the INIT# pin, the x87 FPU state is 
>> not
>> changed." [9.2: X87 FPU INITIALIZATION]
>> 
>> "The state of the local APIC following an INIT reset is the same as it is 
>> after
>> a power-up or hardware reset, except that the APIC ID and arbitration ID
>> registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
>> (“Wait-for-SIPI” State)]
>> 
>> Signed-off-by: Nadav Amit 
>> 
>> ---
>> 
>> v3:
>> 
>> - Leave EFER unchanged on INIT. Instead, set cr0 correctly so vmx_set_cr0
> would
>>   recognize that paging was changed from on to off and clear LMA.
> 
> I wonder if this change from v2 to v3 was correct.
> 
> It means that a 32-bit firmware cannot enter paging mode without
> clearing EFER.LME first (which it should not know about).
> 
> Yang, can you check what real hardware does to EFER on an INIT?  Perhaps
> it only clears EFER.LME (in addition of course to EFER.LMA, which is
> cleared as a side effect of writing CR0).

Sure, I will check it with our hardware expert.

> 
> Thanks,
> 
> Paolo


Best regards,
Yang



Re: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation

2015-10-09 Thread Marc Zyngier
On 09/10/15 08:30, Pavel Fedin wrote:
>  Hello!
> 
>> +  KVM_DEV_ARM_VGIC_CPU_SYSREGS
>> +  Attributes:
>> +The attr field of kvm_device_attr encodes two values:
>> +bits:     | 63  ....  32 | 31  ....  16 | 15 ....  0 |
>> +values:   |     mpidr    |      RES     |    instr   |
> 
>  One small clarification: do you really want it to be different from 
> KVM_DEV_ARM_VGIC_GRP_CPU_REGS?

One is memory mapped, the other one is a system register. One is
addressed by a linear index, the other one by affinity.

How do you reconcile the two? What's the point of trying to shoehorn a
different concept into the existing API?

M.
-- 
Jazz is not dead. It just smells funny...


RE: [PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation

2015-10-09 Thread Pavel Fedin
 Hello!

> How do you reconcile the two? What's the point of trying to shoehorn a
> different concept into the existing API?

 Keeping the number of #defines as small as possible, and trying to reuse
everything that can be reused.
 It's just my personal taste, nothing serious. :)  The question was just a
clarification, so this is not a problem.  By the way, I've just finished the
v5 patchset; now I need to rewrite qemu and test before posting it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH] genirq: Move irq_set_vcpu_affinity out of "#ifdef CONFIG_SMP"

2015-10-09 Thread Paolo Bonzini


On 08/10/2015 06:43, Wu, Feng wrote:
> Hi Thomas & Paolo,
> 
> What is your opinion about this patch? Thanks a lot!
> 
> Thanks,
> Feng
> 
>>> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
>>> index 1c58655..90b378d 100644
>>> --- a/kernel/irq/manage.c
>>> +++ b/kernel/irq/manage.c
>>> @@ -258,37 +258,6 @@ int irq_set_affinity_hint(unsigned int irq, const 
>>> struct cpumask *m)
>>>  }
>>>  EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
>>>
>>> -/**
>>> - * irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
>>> - * @irq: interrupt number to set affinity
>>> - * @vcpu_info: vCPU specific data
>>> - *
>>> - * This function uses the vCPU specific data to set the vCPU
>>> - * affinity for an irq. The vCPU specific data is passed from
>>> - * outside, such as KVM. One example code path is as below:
>>> - * KVM -> IOMMU -> irq_set_vcpu_affinity().
>>> - */
>>> -int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
>>> -{
>>> -   unsigned long flags;
>>> -   struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>>> -   struct irq_data *data;
>>> -   struct irq_chip *chip;
>>> -   int ret = -ENOSYS;
>>> -
>>> -   if (!desc)
>>> -   return -EINVAL;
>>> -
>>> -   data = irq_desc_get_irq_data(desc);
>>> -   chip = irq_data_get_irq_chip(data);
>>> -   if (chip && chip->irq_set_vcpu_affinity)
>>> -   ret = chip->irq_set_vcpu_affinity(data, vcpu_info);
>>> -   irq_put_desc_unlock(desc, flags);
>>> -
>>> -   return ret;
>>> -}
>>> -EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
>>> -
>>>  static void irq_affinity_notify(struct work_struct *work)
>>>  {
>>> struct irq_affinity_notify *notify =
>>> @@ -424,6 +393,37 @@ setup_affinity(struct irq_desc *desc, struct
>> cpumask *mask)
>>>  }
>>>  #endif
>>>
>>> +/**
>>> + * irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
>>> + * @irq: interrupt number to set affinity
>>> + * @vcpu_info: vCPU specific data
>>> + *
>>> + * This function uses the vCPU specific data to set the vCPU
>>> + * affinity for an irq. The vCPU specific data is passed from
>>> + * outside, such as KVM. One example code path is as below:
>>> + * KVM -> IOMMU -> irq_set_vcpu_affinity().
>>> + */
>>> +int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
>>> +{
>>> +   unsigned long flags;
>>> +   struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>>> +   struct irq_data *data;
>>> +   struct irq_chip *chip;
>>> +   int ret = -ENOSYS;
>>> +
>>> +   if (!desc)
>>> +   return -EINVAL;
>>> +
>>> +   data = irq_desc_get_irq_data(desc);
>>> +   chip = irq_data_get_irq_chip(data);
>>> +   if (chip && chip->irq_set_vcpu_affinity)
>>> +   ret = chip->irq_set_vcpu_affinity(data, vcpu_info);
>>> +   irq_put_desc_unlock(desc, flags);
>>> +
>>> +   return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
>>> +
>>>  void __disable_irq(struct irq_desc *desc)
>>>  {
>>> if (!desc->depth++)
>>>

Reviewed-by: Paolo Bonzini 

Paolo


[PATCH v2] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Kosuke Tatsukawa
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

async_pf_execute                 kvm_vcpu_block

spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                 prepare_to_wait(&vcpu->wq, &wait,
                                     TASK_INTERRUPTIBLE);
                                 /* if (kvm_vcpu_check_block(vcpu) < 0) */
                                 /*  if (kvm_arch_vcpu_runnable(vcpu)) { */
                                 ...
                                 return (vcpu->arch.mp_state ==
                                         KVM_MP_STATE_RUNNABLE &&
                                         !vcpu->arch.apf.halted)
                                     || !list_empty_careful(&vcpu->async_pf.done)
                                 ...
                                 return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                 waited = true;
                                 schedule();


The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa 
---
v2:
  - Fixed comment based on feedback from Paolo
v1:
  - https://lkml.org/lkml/2015/10/8/994
---
 virt/kvm/async_pf.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..a0999d7 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,12 @@ static void async_pf_execute(struct work_struct *work)
 
trace_kvm_async_pf_completed(addr, gva);
 
+   /*
+* Memory barrier is required here to make sure change to
+* vcpu->async_pf.done is visible from other CPUs.  This memory
+* barrier pairs with prepare_to_wait's set_current_state()
+*/
+   smp_mb();
	if (waitqueue_active(&vcpu->wq))
		wake_up_interruptible(&vcpu->wq);
 
-- 
1.7.1


Re: [PATCH v2] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Peter Zijlstra
On Fri, Oct 09, 2015 at 12:21:55PM +, Kosuke Tatsukawa wrote:

> +  * Memory barrier is required here to make sure change to
> +  * vcpu->async_pf.done is visible from other CPUs.  This memory
> +  * barrier pairs with prepare_to_wait's set_current_state()

That is not how memory barriers work; they don't 'make visible'. They
simply ensure order between operations.

    X = Y = 0

    CPU0                CPU1

    [S] X=1             [S] Y=1
        MB                  MB
    [L] y=Y             [L] x=X

    assert(x || y)

The issue of the memory barrier does not mean the store is visible, it
merely means that the load _must_ happen after the store (in the above
scenario).

This gives a guarantee that not both x and y can be 0. Because either
being 0, means the other has not yet executed and must therefore observe
your store.

Nothing more, nothing less.

So your comment is misleading at best.


Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Paolo Bonzini


On 09/10/2015 10:50, Peter Zijlstra wrote:
> Not having actually read or thought about the issue at hand, it's
> perfectly valid to pair an smp_mb() with either spin_lock() or
> spin_unlock().
> 
> IOW. MB <-> {ACQUIRE, RELEASE} is a valid pairing.

In this case it's an smp_mb() (store-load barrier) being paired with
another smp_mb(), so spin_unlock()'s release is not enough.

Paolo


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-09 Thread Paolo Bonzini


On 09/10/2015 11:51, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Currently we always write the next_rip of the shadow vmcb to
> the guest's vmcb when we emulate a vmexit. This could confuse
> the guest when its cpuid indicates no support for the
> next_rip feature.
> 
> Fix this by only propagating next_rip if the guest actually
> supports it.
> 
> Cc: Bandan Das 
> Cc: Dirk Mueller 
> Tested-by: Dirk Mueller 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/kvm/cpuid.h | 21 +
>  arch/x86/kvm/svm.c   |  4 +++-
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dd05b9c..effca1f 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
> *vcpu)
>   best = kvm_find_cpuid_entry(vcpu, 7, 0);
>   return best && (best->ebx & bit(X86_FEATURE_MPX));
>  }
> +
> +/*
> + * NRIPS is provided through cpuid fn 0x8000_000a.edx bit 3
> + */
> +#define BIT_NRIPS  3
> +
> +static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpuid_entry2 *best;
> +
> + best = kvm_find_cpuid_entry(vcpu, 0x8000000a, 0);
> +
> + /*
> +  * NRIPS is a scattered cpuid feature, so we can't use
> +  * X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
> +  * position 8, not 3).
> +  */
> + return best && (best->edx & bit(BIT_NRIPS));
> +}
> +#undef BIT_NRIPS
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2f9ed1f..4084b33 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2365,7 +2365,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>   nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
>   nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
>   nested_vmcb->control.exit_int_info_err = 
> vmcb->control.exit_int_info_err;
> - nested_vmcb->control.next_rip  = vmcb->control.next_rip;
> +
> + if (guest_cpuid_has_nrips(&svm->vcpu))

This could be a bit expensive to do on every vmexit.  Can you benchmark
it with kvm-unit-tests, or just cache the result in struct vcpu_svm?

Thanks,

Paolo

> + nested_vmcb->control.next_rip  = vmcb->control.next_rip;
>  
>   /*
>* If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-09 Thread Paolo Bonzini


On 08/10/2015 20:23, Radim Krčmář wrote:
> v2:
>  * rewritten [1/2] and
>  * refactored [2/2], all thanks to Paolo's comments
> 
> This problem is not fixed for split userspace part as I think that it
> would be better to solve that by excluding edge interrupts from
> eoi_exit_bitmap (see the next patch in kvm-list for discussion).

Nice patches, thanks.  Applying to kvm/queue.

Paolo


Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Paolo Bonzini


On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
> smp_store_mb(), called from set_current_state(), which is in turn called
> from prepare_to_wait(), should prevent reordering such as the below from
> happening.  wait_event*() also calls set_current_state() internally.

Ah, I missed that set_current_state has a memory barrier in it.  The
patch is okay, but please expand the comment to say that this memory
barrier pairs with prepare_to_wait's set_current_state().

Paolo


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-10-09 Thread Stefan Hajnoczi
On Wed, Oct 07, 2015 at 10:43:40PM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/07/2015 10:02 PM, Stefan Hajnoczi wrote:
> >On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> >>On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >>>On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >>>Have you thought about live migration?
> >>>
> >>>Are the contents of the NVDIMM migrated since they are registered as a
> >>>RAM region?
> >>
> >>Will fully test live migration and VM save before sending the V3 out. :)
> >
> >Hi,
> >What is the status of this patch series?
> 
> There is a huge change in v3; the patchset is ready now and is being tested.
> Will post it out (hopefully this week) after the long holiday in China. :)

Great, thanks!

Stefan


Re: [PATCH] KVM: x86: don't notify userspace IOAPIC on edge EOI

2015-10-09 Thread Paolo Bonzini


On 08/10/2015 20:34, Radim Krčmář wrote:
> 2015-10-08 20:30+0200, Radim Krčmář:
>> On real hardware, edge-triggered interrupts don't set a bit in TMR,
>> which means that IOAPIC isn't notified on EOI.  Do the same here.
>>
>> Staying in guest/kernel mode after edge EOI is what we want for most
>> devices.  If some bugs could be nicely worked around with edge EOI
>> notifications, we should invest in a better interface.
>>
>> Signed-off-by: Radim Krčmář 
>> ---
>>  Completely untested.
>>  
>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>> @@ -389,13 +389,15 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
>> *eoi_exit_bitmap)
>>  for (i = 0; i < nr_ioapic_pins; ++i) {
>>  hlist_for_each_entry(entry, &table->map[i], link) {
>>  u32 dest_id, dest_mode;
>> +bool level;
>>  
>>  if (entry->type != KVM_IRQ_ROUTING_MSI)
>>  continue;
>>  dest_id = (entry->msi.address_lo >> 12) & 0xff;
>>  dest_mode = (entry->msi.address_lo >> 2) & 0x1;
>> -if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
>> -dest_mode)) {
>> +level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
>> +if (level && kvm_apic_match_dest(vcpu, NULL, 0,
> 
> Describing that result is overkill -- I'll send v2 if the idea is
> accepted.

It's okay and it matches the other lines before.  I'm applying this too,
but I would appreciate Steve's Tested-by before moving it to kvm/next.


Paolo


Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Kosuke Tatsukawa
Paolo Bonzini wrote:
> On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
>> smp_store_mb(), called from set_current_state(), which is in turn called
>> from prepare_to_wait(), should prevent reordering such as the below from
>> happening.  wait_event*() also calls set_current_state() internally.
>
> Ah, I missed that set_current_state has a memory barrier in it.  The
> patch is okay, but please expand the comment to say that this memory
> barrier pairs with prepare_to_wait's set_current_state().

thank you for the comment.  I'll send a new version of the patch with
the modification.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com


Re: [PATCH v2] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c

2015-10-09 Thread Paolo Bonzini


On 09/10/2015 14:55, Peter Zijlstra wrote:
> On Fri, Oct 09, 2015 at 12:21:55PM +, Kosuke Tatsukawa wrote:
> 
>> + * Memory barrier is required here to make sure change to
>> + * vcpu->async_pf.done is visible from other CPUs.  This memory
>> + * barrier pairs with prepare_to_wait's set_current_state()
> 
> That is not how memory barriers work; they don't 'make visible'. They
> simply ensure order between operations.
> 
>     X = Y = 0
> 
>     CPU0                CPU1
> 
>     [S] X=1             [S] Y=1
>         MB                  MB
>     [L] y=Y             [L] x=X
> 
>     assert(x || y)
> 
> The issue of the memory barrier does not mean the store is visible, it
> merely means that the load _must_ happen after the store (in the above
> scenario).
> 
> This gives a guarantee that not both x and y can be 0. Because either
> being 0, means the other has not yet executed and must therefore observe
> your store.
> 
> Nothing more, nothing less.
> 
> So your comment is misleading at best.

Yeah, I will just reword the comment when applying.  Thanks Kosuke-san
for your contribution!

Paolo


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-09 Thread Paolo Bonzini


On 01/10/2015 12:28, Paolo Bonzini wrote:
> The format of the role word has changed through the years and the
> plugin was never updated; some VMX exit reasons were missing too.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tools/lib/traceevent/plugin_kvm.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/traceevent/plugin_kvm.c 
> b/tools/lib/traceevent/plugin_kvm.c
> index 88fe83dff7cd..18536f756577 100644
> --- a/tools/lib/traceevent/plugin_kvm.c
> +++ b/tools/lib/traceevent/plugin_kvm.c
> @@ -124,7 +124,10 @@ static const char *disassemble(unsigned char *insn, int 
> len, uint64_t rip,
>   _ER(WBINVD,  54)\
>   _ER(XSETBV,  55)\
>   _ER(APIC_WRITE,  56)\
> - _ER(INVPCID, 58)
> + _ER(INVPCID, 58)\
> + _ER(PML_FULL,62)\
> + _ER(XSAVES,  63)\
> + _ER(XRSTORS, 64)
>  
>  #define SVM_EXIT_REASONS \
>   _ER(EXIT_READ_CR0,  0x000)  \
> @@ -352,15 +355,18 @@ static int kvm_nested_vmexit_handler(struct trace_seq 
> *s, struct pevent_record *
>  union kvm_mmu_page_role {
>   unsigned word;
>   struct {
> - unsigned glevels:4;
>   unsigned level:4;
> + unsigned cr4_pae:1;
>   unsigned quadrant:2;
> - unsigned pad_for_nice_hex_output:6;
>   unsigned direct:1;
>   unsigned access:3;
>   unsigned invalid:1;
> - unsigned cr4_pge:1;
>   unsigned nxe:1;
> + unsigned cr0_wp:1;
> + unsigned smep_and_not_wp:1;
> + unsigned smap_and_not_wp:1;
> + unsigned pad_for_nice_hex_output:8;
> + unsigned smm:8;
>   };
>  };
>  
> @@ -385,15 +391,18 @@ static int kvm_mmu_print_role(struct trace_seq *s, 
> struct pevent_record *record,
>   if (pevent_is_file_bigendian(event->pevent) ==
>   pevent_is_host_bigendian(event->pevent)) {
>  
> - trace_seq_printf(s, "%u/%u q%u%s %s%s %spge %snxe",
> + trace_seq_printf(s, "%u q%u%s %s%s %spae %snxe %swp%s%s%s",
>role.level,
> -  role.glevels,
>role.quadrant,
>role.direct ? " direct" : "",
>access_str[role.access],
>role.invalid ? " invalid" : "",
> -  role.cr4_pge ? "" : "!",
> -  role.nxe ? "" : "!");
> +  role.cr4_pae ? "" : "!",
> +  role.nxe ? "" : "!",
> +  role.cr0_wp ? "" : "!",
> +  role.smep_and_not_wp ? " smep" : "",
> +  role.smap_and_not_wp ? " smap" : "",
> +  role.smm ? " smm" : "");
>   } else
>   trace_seq_printf(s, "WORD: %08x", role.word);
>  
> 

Ping?  Arnaldo, ok to include this patch in my tree?

Paolo


RE: [PATCH v2] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts

2015-10-09 Thread Pavel Fedin
 Hello!

> I reworked the commit message and applied this patch.

 During testing I discovered a problem with this patch and Andre's vITS
series. The problem is that compute_pending_for_cpu() does not know anything
about LPIs. Therefore, we can reset this bit even if some LPIs (and only
LPIs) are pending. This causes LPI loss.
 This confirms that clearing irq_pending_on_cpu anywhere other than
__kvm_vgic_flush_hwstate() is a bad idea. I would suggest going back to v1 of
the patch (without clearing this bit). We can add a clarifying description to
the commit message, like this:

--- cut ---
In some situations a level-sensitive IRQ disappears before it has been
processed. This is normal, and in this situation we lose the IRQ, just as
real hardware does. The aim of this patch is to handle this situation more
correctly. However, dist->irq_pending_on_cpu stays set until the vCPU
itself rechecks its status. Therefore, this bit does not guarantee that
something is pending at the given moment; it should be treated as an
attention flag, saying that something has happened on this vCPU and may
already have gone away, so a wakeup and status recheck are needed.
--- cut ---

 Would you be happy with this? An alternative would be to add a check for
pending LPIs, but wouldn't that be too complex for a simple problem?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




BUG: unable to handle kernel paging request with v4.3-rc4

2015-10-09 Thread Joerg Roedel
Hi Alex,

while playing around with attaching a 32bit PCI device to a guest via
VFIO I triggered this oops:

[  192.289917] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[  192.298245] BUG: unable to handle kernel paging request at 880224582608
[  192.306195] IP: [] 0x880224582608
[  192.312302] PGD 2026067 PUD 2029067 PMD 8002244001e3 
[  192.318589] Oops: 0011 [#1] PREEMPT SMP 
[  192.323363] Modules linked in: kvm_amd kvm vfio_pci vfio_iommu_type1 
vfio_virqfd vfio bnep bluetooth rfkill iscsi_ibft iscsi_boot_sysfs af_packet 
snd_hda_codec_via snd_hda_codec_generic snd_hda_codec_hdmi raid1 snd_hda_intel 
crct10dif_pclmul crc32_pclmul snd_hda_codec crc32c_intel ghash_clmulni_intel 
snd_hwdep snd_hda_core snd_pcm snd_timer aesni_intel aes_x86_64 md_mod 
glue_helper lrw gf128mul ablk_helper be2net snd serio_raw cryptd sp5100_tco 
pcspkr xhci_pci vxlan ip6_udp_tunnel fam15h_power sky2 udp_tunnel xhci_hcd 
soundcore dm_mod k10temp i2c_piix4 shpchp wmi acpi_cpufreq asus_atk0110 button 
processor ata_generic firewire_ohci firewire_core ohci_pci crc_itu_t radeon 
i2c_algo_bit drm_kms_helper pata_jmicron syscopyarea sysfillrect sysimgblt 
fb_sys_fops ttm drm sg [last unloaded: kvm]
[  192.399986] CPU: 4 PID: 2037 Comm: qemu-system-x86 Not tainted 4.3.0-rc4+ #4
[  192.408260] Hardware name: System manufacturer System Product Name/Crosshair 
IV Formula, BIOS 302710/28/2011
[  192.419746] task: 880223e24040 ti: 8800cae5c000 task.ti: 
8800cae5c000
[  192.428506] RIP: 0010:[]  [] 
0x880224582608
[  192.437376] RSP: 0018:8800cae5fe58  EFLAGS: 00010286
[  192.443940] RAX: 8800cb3c8800 RBX: 8800cba55800 RCX: 0004
[  192.452370] RDX: 0004 RSI: 8802233e7887 RDI: 0001
[  192.460796] RBP: 8800cae5fe98 R08: 0ff8 R09: 0008
[  192.469145] R10: 0001d300 R11:  R12: 8800cba55800
[  192.477584] R13: 8802233e7880 R14: 8800cba55830 R15: 7fff43b30b50
[  192.486025] FS:  7f94375b2c00() GS:88022ed0() 
knlGS:
[  192.495445] CS:  0010 DS:  ES:  CR0: 80050033
[  192.502481] CR2: 880224582608 CR3: cb9d9000 CR4: 000406e0
[  192.510850] Stack:
[  192.514094]  a03f9733 0001 0001 
880223c74600
[  192.522876]  8800ca4f6d88 7fff43b30b50 3b6a 
7fff43b30b50
[  192.531582]  8800cae5ff08 811efc7d 8800cae5fec8 
880223c74600
[  192.540439] Call Trace:
[  192.544145]  [] ? vfio_group_fops_unl_ioctl+0x253/0x410 
[vfio]
[  192.552898]  [] do_vfs_ioctl+0x2cd/0x4c0
[  192.559713]  [] ? __fget+0x77/0xb0
[  192.565998]  [] SyS_ioctl+0x79/0x90
[  192.572373]  [] ? syscall_return_slowpath+0x50/0x130
[  192.580258]  [] entry_SYSCALL_64_fastpath+0x16/0x75
[  192.588049] Code: 88 ff ff d8 25 58 24 02 88 ff ff e8 25 58 24 02 88 ff ff 
e8 25 58 24 02 88 ff ff 58 a2 70 21 02 88 ff ff c0 65 39 cb 00 88 ff ff <08> 2d 
58 24 02 88 ff ff 08 88 3c cb 00 88 ff ff d8 58 c1 24 02 
[  192.610309] RIP  [] 0x880224582608
[  192.616940]  RSP 
[  192.621805] CR2: 880224582608
[  192.632826] ---[ end trace ce135ef0c9b1869f ]---

I am not sure whether this is an IOMMU or VFIO bug, have you seen
something like this before?


Joerg



[PATCH] arm/arm64: KVM : Enable vhost device selection under KVM config menu

2015-10-09 Thread Wei Huang
vhost drivers provide guest VMs with better I/O performance and lower
CPU utilization. This patch allows users to select vhost devices under
the KVM configuration menu on ARM, bringing vhost support on arm/arm64
on par with other architectures (e.g. x86, ppc).

Signed-off-by: Wei Huang 
---
 arch/arm/kvm/Kconfig   | 2 ++
 arch/arm64/kvm/Kconfig | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 210ecca..68045e7 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -45,4 +45,6 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+source drivers/vhost/Kconfig
+
 endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5c7e920..38102f5 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -41,4 +41,6 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+source drivers/vhost/Kconfig
+
 endif # VIRTUALIZATION
-- 
2.4.3



Re: BUG: unable to handle kernel paging request with v4.3-rc4

2015-10-09 Thread Alex Williamson
On Fri, 2015-10-09 at 16:58 +0200, Joerg Roedel wrote:
> Hi Alex,
> 
> while playing around with attaching a 32bit PCI device to a guest via
> VFIO I triggered this oops:
> 
> [  192.289917] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 0)
> [  192.298245] BUG: unable to handle kernel paging request at 880224582608
> [  192.306195] IP: [] 0x880224582608
> [  192.312302] PGD 2026067 PUD 2029067 PMD 8002244001e3 
> [  192.318589] Oops: 0011 [#1] PREEMPT SMP 
> [  192.323363] Modules linked in: kvm_amd kvm vfio_pci vfio_iommu_type1 
> vfio_virqfd vfio bnep bluetooth rfkill iscsi_ibft iscsi_boot_sysfs af_packet 
> snd_hda_codec_via snd_hda_codec_generic snd_hda_codec_hdmi raid1 
> snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec crc32c_intel 
> ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm snd_timer aesni_intel 
> aes_x86_64 md_mod glue_helper lrw gf128mul ablk_helper be2net snd serio_raw 
> cryptd sp5100_tco pcspkr xhci_pci vxlan ip6_udp_tunnel fam15h_power sky2 
> udp_tunnel xhci_hcd soundcore dm_mod k10temp i2c_piix4 shpchp wmi 
> acpi_cpufreq asus_atk0110 button processor ata_generic firewire_ohci 
> firewire_core ohci_pci crc_itu_t radeon i2c_algo_bit drm_kms_helper 
> pata_jmicron syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sg [last 
> unloaded: kvm]
> [  192.399986] CPU: 4 PID: 2037 Comm: qemu-system-x86 Not tainted 4.3.0-rc4+ 
> #4
> [  192.408260] Hardware name: System manufacturer System Product 
> Name/Crosshair IV Formula, BIOS 302710/28/2011
> [  192.419746] task: 880223e24040 ti: 8800cae5c000 task.ti: 
> 8800cae5c000
> [  192.428506] RIP: 0010:[]  [] 
> 0x880224582608
> [  192.437376] RSP: 0018:8800cae5fe58  EFLAGS: 00010286
> [  192.443940] RAX: 8800cb3c8800 RBX: 8800cba55800 RCX: 
> 0004
> [  192.452370] RDX: 0004 RSI: 8802233e7887 RDI: 
> 0001
> [  192.460796] RBP: 8800cae5fe98 R08: 0ff8 R09: 
> 0008
> [  192.469145] R10: 0001d300 R11:  R12: 
> 8800cba55800
> [  192.477584] R13: 8802233e7880 R14: 8800cba55830 R15: 
> 7fff43b30b50
> [  192.486025] FS:  7f94375b2c00() GS:88022ed0() 
> knlGS:
> [  192.495445] CS:  0010 DS:  ES:  CR0: 80050033
> [  192.502481] CR2: 880224582608 CR3: cb9d9000 CR4: 
> 000406e0
> [  192.510850] Stack:
> [  192.514094]  a03f9733 0001 0001 
> 880223c74600
> [  192.522876]  8800ca4f6d88 7fff43b30b50 3b6a 
> 7fff43b30b50
> [  192.531582]  8800cae5ff08 811efc7d 8800cae5fec8 
> 880223c74600
> [  192.540439] Call Trace:
> [  192.544145]  [] ? vfio_group_fops_unl_ioctl+0x253/0x410 
> [vfio]
> [  192.552898]  [] do_vfs_ioctl+0x2cd/0x4c0
> [  192.559713]  [] ? __fget+0x77/0xb0
> [  192.565998]  [] SyS_ioctl+0x79/0x90
> [  192.572373]  [] ? syscall_return_slowpath+0x50/0x130
> [  192.580258]  [] entry_SYSCALL_64_fastpath+0x16/0x75
> [  192.588049] Code: 88 ff ff d8 25 58 24 02 88 ff ff e8 25 58 24 02 88 ff ff 
> e8 25 58 24 02 88 ff ff 58 a2 70 21 02 88 ff ff c0 65 39 cb 00 88 ff ff <08> 
> 2d 58 24 02 88 ff ff 08 88 3c cb 00 88 ff ff d8 58 c1 24 02 
> [  192.610309] RIP  [] 0x880224582608
> [  192.616940]  RSP 
> [  192.621805] CR2: 880224582608
> [  192.632826] ---[ end trace ce135ef0c9b1869f ]---
> 
> I am not sure whether this is an IOMMU or VFIO bug, have you seen
> something like this before?

Hey Joerg,

I have not seen this one yet.  There literally have been no changes for
vfio in 4.3, so if this is new, it may be collateral from changes
elsewhere.  32bit devices really shouldn't make any difference to vfio,
I'll see if I can reproduce it myself though.  Thanks,

Alex



Re: BUG: unable to handle kernel paging request with v4.3-rc4

2015-10-09 Thread Joerg Roedel
On Fri, Oct 09, 2015 at 09:30:40AM -0600, Alex Williamson wrote:
> I have not seen this one yet.  There literally have been no changes for
> vfio in 4.3, so if this is new, it may be collateral from changes
> elsewhere.  32bit devices really shouldn't make any difference to vfio,
> I'll see if I can reproduce it myself though.  Thanks,

Great, thanks. It looks like some sort of memory corruption to me. What
I did was try to assign the 32-bit device together with its PCI bridge.
Probing the bridge in VFIO fails due to the first check in the probe
function, and then I get the oops.

I also tried assigning other bridges, or only the bridge to the 32-bit
bus, but this did not trigger the oops (it fails too, as it should).

I'll also try to debug this more.


Joerg
