Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/07/2015 12:27 AM, Sasha Levin wrote:

On 02/06/2015 09:49 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.
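
To make the hazard concrete, here is an editor's illustration of the window Linus describes (the interleaving and the freed-object name are illustrative, not taken from the thread; the field names are the ones used above):

/*
 * Once head has been bumped, the lock is free, and the memory holding it
 * may be reused or freed by whoever takes it next.
 *
 *   CPU0 (unlocker)                        CPU1 (next locker)
 *   ------------------------------------   ------------------------------------
 *   prev = *lock;
 *   add_smp(&lock->tickets.head,
 *           TICKET_LOCK_INC);              arch_spin_lock(lock);     fast path
 *                                          ... use and tear down the object ...
 *                                          arch_spin_unlock(lock);
 *                                          kfree(object_holding_lock);
 *   if (unlikely(lock->tickets.tail &      reads freed memory here ...
 *                TICKET_SLOWPATH_FLAG))
 *           __ticket_unlock_slowpath(lock, prev);    ... and may write to it
 */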

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However, it brings an additional case to be handled: the slowpath flag could
still be set when somebody does arch_trylock. Handle that too by ignoring the
slowpath flag during the lock availability check.

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
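
For reference, an editor's sketch of what ignoring the slowpath flag during the lock availability check can look like. This is a hedged reconstruction, not the posted hunk (which is not quoted in full anywhere in this thread), and whether the actual patch touches arch_spin_trylock() or only the arch_spin_lock() fast path is not visible in the quoted diffs:

static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
	arch_spinlock_t old, new;

	old.tickets = READ_ONCE(lock->tickets);

	/* treat the lock as available even if a stale SLOWPATH flag is set */
	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
		return 0;

	/* take a ticket, and drop the stale flag while we are at it */
	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
	new.head_tail &= ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);

	/* cmpxchg is a full barrier, so nothing can move before it */
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
}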


With this patch, my VMs lock up quickly after boot with:


Tried to reproduce the hang myself, and there still seems to be a barrier
(or some logic) that I am missing.

Looking closely at the output below, an unlock_kick got missed even though
we can see that SLOWPATH_FLAG is still set:

/me goes back to look closely

(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52504) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc0edb0 in ?? ()
#5  0x in ?? ()

(gdb) p *(arch_spinlock_t *)0x88023ffe8240
$1 = {{head_tail = 3441806612, tickets = {head = 52500, tail = 52517}}}
(gdb) t 2
[Switching to thread 2 (Thread 2)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52502) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x0246 in irq_stack_union ()
#5  0x00080750 in ?? ()
#6  0x0002 in ?? ()
#7  0x0004 in irq_stack_union ()
#8  0xcd16 in nmi_print_seq ()
Cannot access memory at address 0xbfc0
(gdb) t 3
[Switching to thread 3 (Thread 3)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52512) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc8edb0 in ?? ()
#5  0x in ?? ()

[...] //other threads with similar output

(gdb) t 8
[Switching to thread 8 (Thread 8)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52500) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fdcedb0 in ?? ()
#5  0x in ?? ()



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/06/2015 09:55 PM, Linus Torvalds wrote:

On Fri, Feb 6, 2015 at 6:49 AM, Raghavendra K T
raghavendra...@linux.vnet.ibm.com wrote:

Paravirt spinlock clears slowpath flag after doing unlock.

[ fix edited out ]

So I'm not going to be applying this for 3.19, because it's much too
late and the patch is too scary. Plus the bug probably effectively
never shows up in real life (it is probably easy to trigger the
speculative *read* but probably never the actual speculative write
after dropping the lock last).



Understood and agreed.


This will need a lot of testing by the paravirt people - both
performance and correctness. So *maybe* for 3.20, but maybe for even
later, and then marked for stable, of course.

Are there any good paravirt stress-tests that people could run for
extended times?



I have been running several benchmarks (kern, sys, hack, ebizzy, etc.) in 1x
and 2x scenarios. I run them for performance testing as well.
(With the current patch I did not get a KVM hang in a normal run, but
overcommit reproduced it.)



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Oleg Nesterov
On 02/06, Sasha Levin wrote:

 Can we modify it slightly to avoid potentially accessing invalid memory:
 
 diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
 index 5315887..cd22d73 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -144,13 +144,13 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock
  	if (TICKET_SLOWPATH_FLAG &&
  	    static_key_false(&paravirt_ticketlocks_enabled)) {
  		__ticket_t prev_head;
 -
 +		bool needs_kick = lock->tickets.tail & TICKET_SLOWPATH_FLAG;
  		prev_head = lock->tickets.head;
  		add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
  		/* add_smp() is a full mb() */
 
 -		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) {
 +		if (unlikely(needs_kick)) {

This doesn't look right either...

We need to guarantee that either unlock() sees TICKET_SLOWPATH_FLAG, or
the caller of __ticket_enter_slowpath() sees the result of add_smp().

Suppose that kvm_lock_spinning() is called right before add_smp() and it
sets SLOWPATH. It will then block because .head != want, and it needs
__ticket_unlock_kick().

Oleg.
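
An editor's illustration of the interleaving Oleg describes above (the timeline itself is not from the thread; the names are the ones used in the quoted patches):

/*
 * With the flag sampled before add_smp(), the following can happen
 * (want is the waiter's ticket):
 *
 *   unlocker (arch_spin_unlock)            waiter (kvm_lock_spinning)
 *   ------------------------------------   ------------------------------------
 *   needs_kick = tail & SLOWPATH;   == 0
 *                                          __ticket_enter_slowpath(lock);
 *                                          re-check: lock->tickets.head != want
 *                                          halt();                blocks
 *   add_smp(&lock->tickets.head,
 *           TICKET_LOCK_INC);              head == want now, but ...
 *   if (unlikely(needs_kick))              ... false, so no kick is ever sent
 *           __ticket_unlock_kick(lock, prev_head);
 *
 * Checking the flag after add_smp() (a full mb()) is what guarantees that
 * either the unlocker sees SLOWPATH or the waiter sees the new head.
 */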



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Jeremy Fitzhardinge
On 02/06/2015 06:49 AM, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);


 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.

Yeah, that's an embarrassingly obvious bug in retrospect.

 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.

Yep, that seems like a sound approach.

 However, it brings an additional case to be handled: the slowpath flag could
 still be set when somebody does arch_trylock. Handle that too by ignoring the
 slowpath flag during the lock availability check.

 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/spinlock.h | 70 ++++++++++++++++++++++++++++++++++------------------------------------
  1 file changed, 34 insertions(+), 36 deletions(-)

 diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
 index 625660f..0829f86 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
  	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
  }
  
 +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
 +{
 + arch_spinlock_t old, new;
 + __ticket_t diff;
 +
 +	old.tickets = READ_ONCE(lock->tickets);

Couldn't the caller pass in the lock state that it read rather than
re-reading it?
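
An editor's sketch of one way the caller-supplied state could be used (the extra parameter and the constructed expected value are assumptions for illustration, not code from this thread):

static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
						      __ticket_t ticket)
{
	arch_spinlock_t old, new;

	/*
	 * Expected state: we hold the lock (head == our ticket), nobody else
	 * is queued, and a stale slowpath flag is still set on tail.
	 */
	old.tickets.head = ticket;
	old.tickets.tail = (ticket + TICKET_LOCK_INC) | TICKET_SLOWPATH_FLAG;

	/* Desired state: the same, with the flag cleared. */
	new.tickets.head = ticket;
	new.tickets.tail = ticket + TICKET_LOCK_INC;

	/* Fails harmlessly if the lock is contended or the flag is not set. */
	cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
}

The caller would pass the ticket it already owns (e.g. the head value it obtained when taking the lock), so no re-read of lock->tickets is needed.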

 +	diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
 +
 +	/* try to clear slowpath flag when there are no contenders */
 +	if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
 +	    (diff == TICKET_LOCK_INC)) {
 +		new = old;
 +		new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
 +		cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
 +	}
 +}
 +
  #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
  static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
   __ticket_t ticket)
 @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
  {
  }
  
 +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
 +{
 +}
 +
  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
  
  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
  	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
  	inc = xadd(&lock->tickets, inc);
 -	if (likely(inc.head == inc.tail))
 +	if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))

The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.

   goto out;
  
  	inc.tail &= ~TICKET_SLOWPATH_FLAG;
 @@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
   } while (--count);
   __ticket_lock_spinning(lock, inc.tail);
   }
 -out: barrier();  /* make sure nothing creeps before the lock is taken */
 +out:
 + __ticket_check_and_clear_slowpath(lock);
 +
 + barrier();  /* make sure nothing creeps before the lock is taken */

Which means that if the goto out path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.
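
An editor's sketch of that restructuring (an assumption about the shape, not the posted patch; the spin-loop details are filled in from the context lines quoted above):

static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

	inc = xadd(&lock->tickets, inc);
	if (likely(inc.head == inc.tail))
		goto out;	/* uncontended fastpath, flag known clear */

	inc.tail &= ~TICKET_SLOWPATH_FLAG;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			inc.head = READ_ONCE(lock->tickets.head);
			if (inc.head == inc.tail)
				goto clear_slowpath;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}
clear_slowpath:
	/* only slowpath lockers pay for the check-and-clear */
	__ticket_check_and_clear_slowpath(lock);
out:
	barrier();	/* make sure nothing creeps before the lock is taken */
}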

  }
  
  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 @@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
  	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
  }
  
 -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
 - arch_spinlock_t old)
 -{
 - arch_spinlock_t new;
 -
 - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
 -
 -	/* Perform the unlock on the "before" copy */
 - old.tickets.head += TICKET_LOCK_INC;

NB (see below)

 -
 - /* Clear the slowpath flag */
 -	new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
 -
 - /*
 -  * If the lock is uncontended, clear the flag - use cmpxchg in
 -  * case it changes behind our back 

Re: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled.

2015-02-08 Thread Rusty Russell
Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp writes:

From 11fd997d724f520ca628615e7ffbfd7901c40b62 Mon Sep 17 00:00:00 2001
 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 Date: Fri, 6 Feb 2015 13:28:38 +0900
 Subject: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled.

 The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is
 called if vq->num_added == (1 << 16) - 1 before calling END_USE().
 The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE()
 upon entry, and will call panic() if DEBUG is enabled.
 Move this virtqueue_kick() call to after the END_USE() call.
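
An editor's sketch of the DEBUG-only accounting involved (an assumption: this paraphrases the START_USE()/END_USE() debug macros in virtio_ring.c, which mark the vq as in use and panic on re-entry):

/*
 * With the kick issued inside virtqueue_add()'s own START_USE()/END_USE()
 * window, the call chain nests the in-use marker:
 *
 *   virtqueue_add(_vq, ...)
 *       START_USE(vq);                  marks vq as in use
 *       ...
 *       virtqueue_kick(_vq)
 *           virtqueue_kick_prepare(_vq)
 *               START_USE(vq);          already in use -> panic() under DEBUG
 *       ...
 *       END_USE(vq);
 *
 * Moving the virtqueue_kick() call to after END_USE() keeps the two
 * windows disjoint, which is what the patch below does.
 */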

Thanks, applied.

Cheers,
Rusty.


 Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 ---
  drivers/virtio/virtio_ring.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 00ec6b3..596735b 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -245,14 +245,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
 					       virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) + 1);
 	vq->num_added++;
 
 +	pr_debug("Added buffer head %i to %p\n", head, vq);
 +	END_USE(vq);
 +
   /* This is very unlikely, but theoretically possible.  Kick
* just in case. */
 	if (unlikely(vq->num_added == (1 << 16) - 1))
   virtqueue_kick(_vq);
  
 -	pr_debug("Added buffer head %i to %p\n", head, vq);
 - END_USE(vq);
 -
   return 0;
  }
  
 -- 
 1.8.3.1