[tip: locking/urgent] locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

2021-04-17 Thread tip-bot2 for Ali Saidi
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896
Gitweb:        https://git.kernel.org/tip/84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896
Author:        Ali Saidi
AuthorDate:    Thu, 15 Apr 2021 17:27:11
Committer:     Peter Zijlstra
CommitterDate: Sat, 17 Apr 2021 13:40:50 +02:00

locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock.  The writer side loops
checking the value with the atomic_cond_read_acquire(), but it only
truly acquires the lock when the compare-and-exchange completes
successfully, and that cmpxchg isn't ordered. This exposes the window
between the acquire and the cmpxchg to an A-B-A problem, which allows
reads following the lock acquisition to observe values speculatively
before the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.

  Writer                                 | Reader
  -------------------------------------------------------------------------
  ep_scan_ready_list()                   |
  |- write_lock_irq()                    |
     |- queued_write_lock_slowpath()     |
        |- atomic_cond_read_acquire()    |
                                         | read_lock_irqsave(&ep->lock, flags);
      --> (observes value before unlock) |  chain_epi_lockless()
      |                                  |    epi->next = xchg(&ep->ovflist, epi);
      |                                  | read_unlock_irqrestore(&ep->lock, flags);
      |                                  |
      | atomic_cmpxchg_relaxed()         |
      |-- READ_ONCE(ep->ovflist);        |

A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.
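
For illustration, here is the pre-fix wait loop again with the race
annotated (a simplified sketch; the function and constant names are the
ones used in kernel/locking/qrwlock.c):

	/* Pre-fix writer wait loop, annotated (simplified sketch): */
	do {
		/* (1) Writer observes cnts == _QW_WAITING (acquire). */
		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);

		/*
		 * (2) Window: an interrupt-context reader takes and drops the
		 *     read lock, so cnts goes _QW_WAITING ->
		 *     (_QW_WAITING | _QR_BIAS) -> _QW_WAITING.  Loads that
		 *     follow the write-lock acquisition (e.g. the
		 *     READ_ONCE(ep->ovflist) above) may be speculated here.
		 */

		/* (3) The relaxed cmpxchg still succeeds on the A-B-A'd
		 *     value and does not order those later loads after it. */
	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
					_QW_LOCKED) != _QW_WAITING);

Making the cmpxchg an acquire closes this window, because a successful
acquire cmpxchg orders the later loads after the final read of the lock
word.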

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi 
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Steve Capper 
Acked-by: Will Deacon 
Acked-by: Waiman Long 
Tested-by: Steve Capper 
---
 kernel/locking/qrwlock.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd2..b94f383 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
+   int cnts;
+
/* Put the writer into the wait queue */
	arch_spin_lock(&lock->wait_lock);
 
@@ -73,9 +75,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 
/* When no more readers or writers, set the locked flag */
do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
-					_QW_LOCKED) != _QW_WAITING);
+		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
 unlock:
	arch_spin_unlock(&lock->wait_lock);
 }


[PATCH v2] locking/qrwlock: Fix ordering in queued_write_lock_slowpath

2021-04-15 Thread Ali Saidi
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock.  The writer side loops
checking the value with the atomic_cond_read_acquire(), but it only
truly acquires the lock when the compare-and-exchange completes
successfully, and that cmpxchg isn't ordered. This exposes the window
between the acquire and the cmpxchg to an A-B-A problem, which allows
reads following the lock acquisition to observe values speculatively
before the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from under it.

Writer                                 | Reader 2
-------------------------------------------------------------------------
ep_scan_ready_list()                   |
|- write_lock_irq()                    |
  |- queued_write_lock_slowpath()      |
    |- atomic_cond_read_acquire()      |
                                       | read_lock_irqsave(&ep->lock, flags);
    --> (observes value before unlock) |  chain_epi_lockless()
    |                                  |    epi->next = xchg(&ep->ovflist, epi);
    |                                  | read_unlock_irqrestore(&ep->lock, flags);
    |                                  |
    | atomic_cmpxchg_relaxed()         |
    |-- READ_ONCE(ep->ovflist);        |

A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics
addresses this issue at which point the atomic_cond_read can be switched
to use relaxed semantics.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi 
Cc: sta...@vger.kernel.org
Acked-by: Will Deacon 
Tested-by: Steve Capper 
Reviewed-by: Steve Capper 

---
 kernel/locking/qrwlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd271b45..10770f6ac4d9 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 
/* When no more readers or writers, set the locked flag */
do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
 					_QW_LOCKED) != _QW_WAITING);
 unlock:
	arch_spin_unlock(&lock->wait_lock);
-- 
2.24.4.AMZN



Re: [PATCH] locking/qrwlock: Fix ordering in queued_write_lock_slowpath

2021-04-15 Thread Ali Saidi


On Thu, 15 Apr 2021 16:02:29 +0100, Will Deacon wrote:
> On Thu, Apr 15, 2021 at 02:25:52PM +0000, Ali Saidi wrote:
> > While this code is executed with the wait_lock held, a reader can
> > acquire the lock without holding wait_lock.  The writer side loops
> > checking the value with the atomic_cond_read_acquire(), but only truly
> > acquires the lock when the compare-and-exchange is completed
> > successfully which isn’t ordered. The other atomic operations from this
> > point are release-ordered and thus reads after the lock acquisition can
> > be completed before the lock is truly acquired which violates the
> > guarantees the lock should be making.
> 
> I think it would be worth spelling this out with an example. The issue
> appears to be a concurrent reader in interrupt context taking and releasing
> the lock in the window where the writer has returned from the
> atomic_cond_read_acquire() but has not yet performed the cmpxchg(). Loads
> can be speculated during this time, but the A-B-A of the lock word
> from _QW_WAITING to (_QW_WAITING | _QR_BIAS) and back to _QW_WAITING allows
> the atomic_cmpxchg_relaxed() to succeed. Is that right?

You're right. What we're seeing is an A-B-A problem: the
atomic_cond_read_acquire() can succeed, and before the cmpxchg succeeds a
reader performs an A-B-A on the lock word, which allows the core to observe
a read that follows the cmpxchg ahead of the cmpxchg succeeding.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from under it. 

Writer                                 | Reader 2
-------------------------------------------------------------------------
ep_scan_ready_list()                   |
|- write_lock_irq()                    |
  |- queued_write_lock_slowpath()      |
    |- atomic_cond_read_acquire()      |
                                       | read_lock_irqsave(&ep->lock, flags);
                                       | chain_epi_lockless()
                                       |    epi->next = xchg(&ep->ovflist, epi);
                                       | read_unlock_irqrestore(&ep->lock, flags);
                                       |
    atomic_cmpxchg_relaxed()           |
    READ_ONCE(ep->ovflist);            |

> 
> With that in mind, it would probably be a good idea to eyeball the qspinlock
> slowpath as well, as that uses both atomic_cond_read_acquire() and
> atomic_try_cmpxchg_relaxed().

It seems plausible that the same thing could occur here in qspinlock:
	if ((val & _Q_TAIL_MASK) == tail) {
		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
			goto release; /* No contention */
	}

> 
> > Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwloc")

Ack, will fix. 

> Typo in the quoted subject ('qrwloc').
> 
> > Signed-off-by: Ali Saidi 
> > Cc: sta...@vger.kernel.org
> > ---
> >  kernel/locking/qrwlock.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index 4786dd271b45..10770f6ac4d9 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
> >  
> > /* When no more readers or writers, set the locked flag */
> > do {
> > -		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
> > -	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
> > +		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
> > +	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
> > 					_QW_LOCKED) != _QW_WAITING);
> 
> Patch looks good, so with an updated message:
> 
> Acked-by: Will Deacon 
> 
> Will


[PATCH] locking/qrwlock: Fix ordering in queued_write_lock_slowpath

2021-04-15 Thread Ali Saidi
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock.  The writer side loops
checking the value with the atomic_cond_read_acquire(), but it only
truly acquires the lock when the compare-and-exchange completes
successfully, and that cmpxchg isn't ordered. The other atomic
operations from this point are release-ordered, and thus reads after
the lock acquisition can be completed before the lock is truly
acquired, which violates the guarantees the lock should be making.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwloc")
Signed-off-by: Ali Saidi 
Cc: sta...@vger.kernel.org
---
 kernel/locking/qrwlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd271b45..10770f6ac4d9 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 
/* When no more readers or writers, set the locked flag */
do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
 					_QW_LOCKED) != _QW_WAITING);
 unlock:
	arch_spin_unlock(&lock->wait_lock);
-- 
2.24.4.AMZN



[PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq

2020-05-28 Thread Ali Saidi
If an interrupt is disabled, the ITS driver has sent a discard removing
the DeviceID and EventID from the ITT. After this occurs, the interrupt
can't be moved to another collection with a MOVI, and a command error
occurs if this is attempted. Before issuing the MOVI command, make sure
that the IRQ isn't disabled, and change the activate code to try to use
the previous affinity.

Signed-off-by: Ali Saidi 
---
 drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 124251b0ccba..1235dd9a2fb2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
/* don't set the affinity when the target cpu is same as current one */
if (cpu != its_dev->event_map.col_map[id]) {
-		target_col = &its_dev->its->collections[cpu];
-   its_send_movi(its_dev, target_col, id);
+
+   /* If the IRQ is disabled a discard was sent so don't move */
+   if (!irqd_irq_disabled(d))
+   its_send_movi(its_dev, target_col, id);
+
its_dev->event_map.col_map[id] = cpu;
irq_data_update_effective_affinity(d, cpumask_of(cpu));
}
@@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct irq_domain *domain,
if (its_dev->its->numa_node >= 0)
cpu_mask = cpumask_of_node(its_dev->its->numa_node);
 
-   /* Bind the LPI to the first possible CPU */
-   cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+	/* If the cpu is set to a different CPU that is still online, use it */
+   cpu = its_dev->event_map.col_map[event];
+
+   cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
+
+   if (!cpumask_test_cpu(cpu, cpu_mask)) {
+   /* Bind the LPI to the first possible CPU */
+   cpu = cpumask_first(cpu_mask);
+   }
+
if (cpu >= nr_cpu_ids) {
if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
return -EINVAL;
-- 
2.24.1.AMZN



[PATCH 1/2] arm64/mmap: handle worst-case heap randomization in mmap_base

2019-03-12 Thread Ali Saidi
Increase mmap_base by the worst-case brk randomization so that
the stack and heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf: use
ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly, and this issue is limited to the cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH or to test new versions of the loader). In
those rare cases, the loader doesn't take into account the amount of brk
randomization that will be applied by arch_randomize_brk(). This can
lead to the stack and heap being arbitrarily close to each other.
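
For context, the SZ_32M and SZ_1G pads below correspond to the worst
case of arm64's arch_randomize_brk() at the time; a rough sketch of that
function follows (see arch/arm64/kernel/process.c for the exact
definition):

	/* Rough sketch of arm64 arch_randomize_brk(): the brk base is
	 * randomized within 32M for compat (32-bit) tasks and within 1G
	 * for native 64-bit tasks, which is the worst case the added
	 * mmap_base pad must absorb. */
	unsigned long arch_randomize_brk(struct mm_struct *mm)
	{
		if (is_compat_task())
			return randomize_page(mm->brk, SZ_32M);
		else
			return randomize_page(mm->brk, SZ_1G);
	}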

Signed-off-by: Ali Saidi 

---
 arch/arm64/mm/mmap.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 842c8a5fcd53..0778f7ba8306 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -67,6 +67,14 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
unsigned long gap = rlim_stack->rlim_cur;
unsigned long pad = (STACK_RND_MASK << PAGE_SHIFT) + stack_guard_gap;
 
+   /* Provide space for randomization when randomize_va_space == 2 and
+* ld-linux.so is called directly. Values from arch_randomize_brk()
+*/
+   if (test_thread_flag(TIF_32BIT))
+   pad += SZ_32M;
+   else
+   pad += SZ_1G;
+
/* Values close to RLIM_INFINITY can overflow. */
if (gap + pad > gap)
gap += pad;
-- 
2.15.3.AMZN



[PATCH 0/2] handle worst-case heap randomization in mmap_base

2019-03-12 Thread Ali Saidi
Increase mmap_base by the worst-case brk randomization so that the stack and
heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf: use
ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly, and this issue is limited to the cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH or to test new versions of the loader). In
those rare cases, the loader doesn't take into account the amount of brk
randomization that will be applied by arch_randomize_brk(). This can
lead to the stack and heap being arbitrarily close to each other.

Ali Saidi (2):
  arm64/mmap: handle worst-case heap randomization in mmap_base
  x86/mmap: handle worst-case heap randomization in mmap_base

 arch/arm64/mm/mmap.c | 8 ++++++++
 arch/x86/mm/mmap.c   | 4 ++++
 2 files changed, 12 insertions(+)

-- 
2.15.3.AMZN



[PATCH 2/2] x86/mmap: handle worst-case heap randomization in mmap_base

2019-03-12 Thread Ali Saidi
Increase mmap_base by the worst-case brk randomization so that
the stack and heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf: use
ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly, and this issue is limited to the cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH or to test new versions of the loader). In
those rare cases, the loader doesn't take into account the amount of brk
randomization that will be applied by arch_randomize_brk(). This can
lead to the stack and heap being arbitrarily close to each other.
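
For context, x86's arch_randomize_brk() at the time randomized brk
within a 32M window, which is what the SZ_32M pad below accounts for; a
rough sketch (see arch/x86/kernel/process.c for the exact definition):

	/* Rough sketch of x86 arch_randomize_brk(): brk is randomized
	 * within a 0x02000000 (32M) window, hence the SZ_32M worst-case
	 * pad added to mmap_base below. */
	unsigned long arch_randomize_brk(struct mm_struct *mm)
	{
		return randomize_page(mm->brk, 0x02000000);
	}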

Signed-off-by: Ali Saidi 
---
 arch/x86/mm/mmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index db3165714521..98a2875c37e3 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "physaddr.h"
@@ -97,6 +98,9 @@ static unsigned long mmap_base(unsigned long rnd, unsigned long task_size,
unsigned long pad = stack_maxrandom_size(task_size) + stack_guard_gap;
unsigned long gap_min, gap_max;
 
+   /* Provide space for brk randomization */
+   pad += SZ_32M;
+
/* Values close to RLIM_INFINITY can overflow. */
if (gap + pad > gap)
gap += pad;
-- 
2.15.3.AMZN