[tip: locking/urgent] locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896
Gitweb:        https://git.kernel.org/tip/84a24bf8c52e66b7ac89ada5e3cfbe72d65c1896
Author:        Ali Saidi
AuthorDate:    Thu, 15 Apr 2021 17:27:11
Committer:     Peter Zijlstra
CommitterDate: Sat, 17 Apr 2021 13:40:50 +02:00

locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn't ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.

Writer                                 | Reader
--------------------------------------------------------------------------
ep_scan_ready_list()                   |
|- write_lock_irq()                    |
   |- queued_write_lock_slowpath()     |
      |- atomic_cond_read_acquire()    |
                                       | read_lock_irqsave(&ep->lock, flags);
    --> (observes value before unlock) |   chain_epi_lockless()
    |                                  |     epi->next = xchg(&ep->ovflist, epi);
    |                                  | read_unlock_irqrestore(&ep->lock, flags);
    |                                  |
    |     atomic_cmpxchg_relaxed()     |
    |-- READ_ONCE(ep->ovflist);        |

A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Steve Capper
Acked-by: Will Deacon
Acked-by: Waiman Long
Tested-by: Steve Capper
---
 kernel/locking/qrwlock.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd2..b94f383 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
+	int cnts;
+
 	/* Put the writer into the wait queue */
 	arch_spin_lock(&lock->wait_lock);

@@ -73,9 +75,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)

 	/* When no more readers or writers, set the locked flag */
 	do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
-					_QW_LOCKED) != _QW_WAITING);
+		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
 }
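For reference, here is a rough userspace C11 analogue of the writer loop as it reads after this patch. It is a sketch only: the _QW_* values and the stand-alone spin are assumptions restated for illustration, not the kernel's qrwlock implementation.

	/* Illustrative only: a C11 stand-in for the patched writer loop above.
	 * atomic_cond_read_relaxed() becomes a relaxed spin, and the lock is
	 * actually taken by a compare-exchange with acquire ordering, so loads
	 * issued after "lock acquired" cannot be satisfied before it succeeds. */
	#include <stdatomic.h>

	#define _QW_WAITING 0x100	/* value assumed for illustration */
	#define _QW_LOCKED  0x0ff	/* value assumed for illustration */

	static _Atomic unsigned int cnts;

	static void write_lock_slowpath_tail(void)
	{
		unsigned int val;

		do {
			/* spin until only the waiting bit is set; relaxed is enough
			 * because ordering is provided by the CAS below */
			do {
				val = atomic_load_explicit(&cnts, memory_order_relaxed);
			} while (val != _QW_WAITING);
			/* acquire on success is the point of the fix: it closes the
			 * window in which a reader's A-B-A could let later loads be
			 * observed before the lock is really held */
		} while (!atomic_compare_exchange_weak_explicit(&cnts, &val, _QW_LOCKED,
								memory_order_acquire,
								memory_order_relaxed));
	}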
[PATCH v2] locking/qrwlock: Fix ordering in queued_write_lock_slowpath
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn't ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.

We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.

Writer                                 | Reader 2
--------------------------------------------------------------------------
ep_scan_ready_list()                   |
|- write_lock_irq()                    |
   |- queued_write_lock_slowpath()     |
      |- atomic_cond_read_acquire()    |
                                       | read_lock_irqsave(&ep->lock, flags);
    --> (observes value before unlock) |   chain_epi_lockless()
    |                                  |     epi->next = xchg(&ep->ovflist, epi);
    |                                  | read_unlock_irqrestore(&ep->lock, flags);
    |                                  |
    |     atomic_cmpxchg_relaxed()     |
    |-- READ_ONCE(ep->ovflist);        |

A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi
Cc: sta...@vger.kernel.org
Acked-by: Will Deacon
Tested-by: Steve Capper
Reviewed-by: Steve Capper
---
 kernel/locking/qrwlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd271b45..10770f6ac4d9 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)

 	/* When no more readers or writers, set the locked flag */
 	do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
 					_QW_LOCKED) != _QW_WAITING);
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
--
2.24.4.AMZN
Re: [PATCH] locking/qrwlock: Fix ordering in queued_write_lock_slowpath
On Thu, 15 Apr 2021 16:02:29 +0100, Will Deacon wrote:
> On Thu, Apr 15, 2021 at 02:25:52PM +0000, Ali Saidi wrote:
> > While this code is executed with the wait_lock held, a reader can
> > acquire the lock without holding wait_lock. The writer side loops
> > checking the value with the atomic_cond_read_acquire(), but only truly
> > acquires the lock when the compare-and-exchange is completed
> > successfully which isn't ordered. The other atomic operations from this
> > point are release-ordered and thus reads after the lock acquisition can
> > be completed before the lock is truly acquired which violates the
> > guarantees the lock should be making.
>
> I think it would be worth spelling this out with an example. The issue
> appears to be a concurrent reader in interrupt context taking and releasing
> the lock in the window where the writer has returned from the
> atomic_cond_read_acquire() but has not yet performed the cmpxchg(). Loads
> can be speculated during this time, but the A-B-A of the lock word
> from _QW_WAITING to (_QW_WAITING | _QR_BIAS) and back to _QW_WAITING allows
> the atomic_cmpxchg_relaxed() to succeed. Is that right?

You're right. What we're seeing is an A-B-A problem that can allow
atomic_cond_read_acquire() to succeed and before the cmpxchg succeeds a
reader performs an A-B-A on the lock which allows the core to observe a
read that follows the cmpxchg ahead of the cmpxchg succeeding.

We've seen a problem in epoll where the reader does a xchg while holding
the read lock, but the writer can see a value change out from under it.

Writer                                 | Reader 2
--------------------------------------------------------------------------
ep_scan_ready_list()                   |
|- write_lock_irq()                    |
   |- queued_write_lock_slowpath()     |
      |- atomic_cond_read_acquire()    |
                                       | read_lock_irqsave(&ep->lock, flags);
                                       |   chain_epi_lockless()
                                       |     epi->next = xchg(&ep->ovflist, epi);
                                       | read_unlock_irqrestore(&ep->lock, flags);
      atomic_cmpxchg_relaxed()         |
   READ_ONCE(ep->ovflist);             |

> With that in mind, it would probably be a good idea to eyeball the qspinlock
> slowpath as well, as that uses both atomic_cond_read_acquire() and
> atomic_try_cmpxchg_relaxed().

It seems plausible that the same thing could occur here in qspinlock:

	if ((val & _Q_TAIL_MASK) == tail) {
		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
			goto release; /* No contention */
	}

> > Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when
> > spinning in qrwloc")
>
> Typo in the quoted subject ('qrwloc').

Ack, will fix.

> > Signed-off-by: Ali Saidi
> > Cc: sta...@vger.kernel.org
> > ---
> >  kernel/locking/qrwlock.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index 4786dd271b45..10770f6ac4d9 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
> >
> >  	/* When no more readers or writers, set the locked flag */
> >  	do {
> > -		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
> > -	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
> > +		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
> > +	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
> >  					_QW_LOCKED) != _QW_WAITING);
>
> Patch looks good, so with an updated message:
>
> Acked-by: Will Deacon
>
> Will
[PATCH] locking/qrwlock: Fix ordering in queued_write_lock_slowpath
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn't ordered. The other atomic operations from this
point are release-ordered and thus reads after the lock acquisition can
be completed before the lock is truly acquired which violates the
guarantees the lock should be making.

Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwloc")
Signed-off-by: Ali Saidi
Cc: sta...@vger.kernel.org
---
 kernel/locking/qrwlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd271b45..10770f6ac4d9 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)

 	/* When no more readers or writers, set the locked flag */
 	do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+		atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+	} while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
 					_QW_LOCKED) != _QW_WAITING);
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
--
2.24.4.AMZN
[PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
If an interrupt is disabled the ITS driver has sent a discard removing
the DeviceID and EventID from the ITT. After this occurs it can't be
moved to another collection with a MOVI and a command error occurs if
attempted. Before issuing the MOVI command make sure that the IRQ isn't
disabled and change the activate code to try and use the previous
affinity.

Signed-off-by: Ali Saidi
---
 drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 124251b0ccba..1235dd9a2fb2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	/* don't set the affinity when the target cpu is same as current one */
 	if (cpu != its_dev->event_map.col_map[id]) {
 		target_col = &its_dev->its->collections[cpu];
-		its_send_movi(its_dev, target_col, id);
+
+		/* If the IRQ is disabled a discard was sent so don't move */
+		if (!irqd_irq_disabled(d))
+			its_send_movi(its_dev, target_col, id);
+
 		its_dev->event_map.col_map[id] = cpu;
 		irq_data_update_effective_affinity(d, cpumask_of(cpu));
 	}
@@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct irq_domain *domain,
 	if (its_dev->its->numa_node >= 0)
 		cpu_mask = cpumask_of_node(its_dev->its->numa_node);

-	/* Bind the LPI to the first possible CPU */
-	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+	/* If the cpu set to a different CPU that is still online use it */
+	cpu = its_dev->event_map.col_map[event];
+
+	cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
+
+	if (!cpumask_test_cpu(cpu, cpu_mask)) {
+		/* Bind the LPI to the first possible CPU */
+		cpu = cpumask_first(cpu_mask);
+	}
+
 	if (cpu >= nr_cpu_ids) {
 		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
 			return -EINVAL;
--
2.24.1.AMZN
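The activation hunk above effectively implements a small CPU-selection policy. The following hypothetical helper (invented name, not part of the driver) restates that policy in isolation, assuming the usual kernel cpumask API:

	/* Hypothetical helper, not in the driver: restates the selection logic
	 * from the hunk above. Prefer the CPU already recorded for this event's
	 * collection if it is online and allowed by the node mask, otherwise
	 * fall back to the first allowed online CPU. */
	static int its_pick_activation_cpu(int prev_cpu, const struct cpumask *node_mask)
	{
		cpumask_t allowed;

		cpumask_and(&allowed, node_mask, cpu_online_mask);

		if (cpumask_test_cpu(prev_cpu, &allowed))
			return prev_cpu;

		/* may return >= nr_cpu_ids if no CPU qualifies, as in the patch */
		return cpumask_first(&allowed);
	}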
[PATCH 1/2] arm64/mmap: handle worst-case heap randomization in mmap_base
Increase mmap_base by the worst-case brk randomization so that the
stack and heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf:
use ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly and this issue is limited to cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH, testing new versions of the loader).
In those rare cases, the loader doesn't take into account the amount of
brk randomization that will be applied by arch_randomize_brk(). This
can lead to the stack and heap being arbitrarily close to each other.

Signed-off-by: Ali Saidi
---
 arch/arm64/mm/mmap.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 842c8a5fcd53..0778f7ba8306 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -67,6 +67,14 @@ static unsigned long mmap_base(unsigned long rnd, struct rlimit *rlim_stack)
 	unsigned long gap = rlim_stack->rlim_cur;
 	unsigned long pad = (STACK_RND_MASK << PAGE_SHIFT) + stack_guard_gap;

+	/* Provide space for randomization when randomize_va_space == 2 and
+	 * ld-linux.so is called directly. Values from arch_randomize_brk()
+	 */
+	if (test_thread_flag(TIF_32BIT))
+		pad += SZ_32M;
+	else
+		pad += SZ_1G;
+
 	/* Values close to RLIM_INFINITY can overflow. */
 	if (gap + pad > gap)
 		gap += pad;
--
2.15.3.AMZN
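For a sense of scale, a minimal self-contained calculation of the resulting pad for a native 64-bit task follows. The STACK_RND_MASK, page size and stack_guard_gap values are assumptions stated in the comments, not taken from this patch; only the SZ_1G term comes from the change above.

	/* Rough worst-case pad for a native (64-bit) arm64 task under the
	 * assumptions stated above: stack ASLR span + stack guard gap + the
	 * SZ_1G of brk randomization this patch accounts for. */
	#include <stdio.h>

	int main(void)
	{
		unsigned long page_shift = 12;				/* 4K pages (assumed) */
		unsigned long stack_rnd_mask = 0x3ffff;			/* arm64 64-bit (assumed) */
		unsigned long stack_guard_gap = 256UL << page_shift;	/* default gap (assumed) */
		unsigned long brk_rnd = 1UL << 30;			/* SZ_1G, from the patch */

		unsigned long pad = (stack_rnd_mask << page_shift)
				  + stack_guard_gap
				  + brk_rnd;

		printf("pad = %lu MiB\n", pad >> 20);			/* about 2 GiB total */
		return 0;
	}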
[PATCH 0/2] handle worst-case heap randomization in mmap_base
Increase mmap_base by the worst-case brk randomization so that the
stack and heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf:
use ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly and this issue is limited to cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH, testing new versions of the loader).
In those rare cases, the loader doesn't take into account the amount of
brk randomization that will be applied by arch_randomize_brk(). This
can lead to the stack and heap being arbitrarily close to each other.

Ali Saidi (2):
  arm64/mmap: handle worst-case heap randomization in mmap_base
  x86/mmap: handle worst-case heap randomization in mmap_base

 arch/arm64/mm/mmap.c | 8 ++++++++
 arch/x86/mm/mmap.c   | 4 ++++
 2 files changed, 12 insertions(+)

--
2.15.3.AMZN
[PATCH 2/2] x86/mmap: handle worst-case heap randomization in mmap_base
Increase mmap_base by the worst-case brk randomization so that the
stack and heap remain apart.

In Linux 4.13 a change was committed that special cased the kernel ELF
loader when the loader is invoked directly (eab09532d400; binfmt_elf:
use ELF_ET_DYN_BASE only for PIE). Generally, the loader isn't invoked
directly and this issue is limited to cases where it is (e.g. to set a
non-inheritable LD_LIBRARY_PATH, testing new versions of the loader).
In those rare cases, the loader doesn't take into account the amount of
brk randomization that will be applied by arch_randomize_brk(). This
can lead to the stack and heap being arbitrarily close to each other.

Signed-off-by: Ali Saidi
---
 arch/x86/mm/mmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index db3165714521..98a2875c37e3 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "physaddr.h"
@@ -97,6 +98,9 @@ static unsigned long mmap_base(unsigned long rnd, unsigned long task_size,
 	unsigned long pad = stack_maxrandom_size(task_size) + stack_guard_gap;
 	unsigned long gap_min, gap_max;

+	/* Provide space for brk randomization */
+	pad += SZ_32M;
+
 	/* Values close to RLIM_INFINITY can overflow. */
 	if (gap + pad > gap)
 		gap += pad;
--
2.15.3.AMZN