Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra wrote:
>
> On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
> > index bc4bd19b7fc2..45492fb5bf22 100644
> > --- a/arch/powerpc/include/asm/local.h
> > +++ b/arch/powerpc/include/asm/local.h
> > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> >  	return t;
> >  }
> >
> > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > +{
> > +	long o = *po, r;
> > +
> > +	r = local_cmpxchg(l, o, n);
> > +	if (unlikely(r != o))
> > +		*po = r;
> > +
> > +	return likely(r == o);
> > +}
> > +
>
> Why is the ppc one different from the rest? Why can't it use the
> try_cmpxchg_local() fallback and needs to have it open-coded?

Please note that ppc directly defines local_cmpxchg, bypassing the cmpxchg_local/arch_cmpxchg_local machinery. The patch takes the same approach for local_try_cmpxchg, because the fallbacks use arch_cmpxchg_local definitions. PPC should be converted to use arch_cmpxchg_local (which would also enable instrumentation), but that is outside the scope of the proposed patchset.

Uros.
Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
On Thu, Apr 6, 2023 at 10:26 AM David Laight wrote:
>
> From: Dave Hansen
> > Sent: 05 April 2023 17:37
> >
> > On 4/5/23 07:17, Uros Bizjak wrote:
> > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > and wire up support for all targets that use local_t infrastructure.
> >
> > I feel like I'm missing some context.
> >
> > What are the actual end user visible effects of this series? Is there a
> > measurable decrease in perf overhead? Why go to all this trouble for
> > perf? Who else will use local_try_cmpxchg()?
>
> I'm assuming the local_xxx operations only have to be safe wrt interrupts?
> On x86 it is possible that an alternate instruction sequence
> that doesn't use a locked instruction may actually be faster!

Please note that the "local" functions do not use the lock prefix. Only the atomic properties of the cmpxchg instruction are exploited, since it only needs to be safe wrt interrupts.

Uros.

> Although, maybe, any kind of locked cmpxchg just needs to ensure
> the cache line isn't 'stolen', so apart from possible slight
> delays on another cpu that gets a cache miss for the line
> it all makes little difference.
> The cache line miss costs a lot anyway, line bouncing more
> and is best avoided.
> So is there actually much of a benefit at all?
>
> Clearly the try_cmpxchg helps - but that is a different issue.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
On Wed, Apr 5, 2023 at 6:37 PM Dave Hansen wrote:
>
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series? Is there a
> measurable decrease in perf overhead? Why go to all this trouble for
> perf? Who else will use local_try_cmpxchg()?

This functionality was requested by the perf people [1], so perhaps Steven can give us some concrete examples. In general, apart from removing an unneeded compare instruction on x86, use of try_cmpxchg also results in slightly better code on non-x86 targets [2], since the code now correctly identifies the fast path through the cmpxchg loop.

Also important is that try_cmpxchg code reuses the result of the cmpxchg instruction in the loop, so a read from memory inside the loop is eliminated. When reviewing the cmpxchg usage sites, I found numerous places where an unnecessary read from memory was present in the loop; two examples can be seen in the last patch of this series.

Also, using try_cmpxchg prevents inconsistencies in the cmpxchg loop, where the result of the cmpxchg is compared with the wrong "old" value - one such bug is still lurking in x86 APIC code, please see [3].

Please note that apart from the perf subsystem, the events subsystem can also be improved by using local_try_cmpxchg. This is the reason the last patch includes a change in events/core.c.

> I'm all for improving things, and perf is an important user. But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

[1] https://lore.kernel.org/lkml/20230301131831.6c8d4...@gandalf.local.home/
[2] https://lore.kernel.org/lkml/yo91omfdzttgx...@fvff77s0q05n.cambridge.arm.com/
[3] https://lore.kernel.org/lkml/20230227160917.107820-1-ubiz...@gmail.com/

Uros.
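[Editorial note for this archive: the fast-path difference described above can be sketched in userspace C. This is not kernel code - it models local_cmpxchg/local_try_cmpxchg with GCC's __atomic/__sync builtins, and the function names are hypothetical - but it shows how the try_cmpxchg style removes both the per-iteration reload and the explicit comparison.]

```c
#include <assert.h>
#include <stdbool.h>

static long head; /* models rb->head */

/* Old style: the loop re-reads memory and compares the cmpxchg
 * result against the expected value on every iteration. */
static long update_cmpxchg_style(long delta)
{
	long offset, new;
	do {
		offset = __atomic_load_n(&head, __ATOMIC_RELAXED); /* reload each pass */
		new = offset + delta;
	} while (__sync_val_compare_and_swap(&head, offset, new) != offset);
	return new;
}

/* try_cmpxchg style: a failed compare-exchange already stores the
 * current memory value back into 'offset', so the loop needs no
 * extra load and no separate compare - only one read before the loop. */
static long update_try_cmpxchg_style(long delta)
{
	long offset = __atomic_load_n(&head, __ATOMIC_RELAXED); /* single read */
	long new;
	do {
		new = offset + delta;
	} while (!__atomic_compare_exchange_n(&head, &offset, new, false,
					      __ATOMIC_RELAXED, __ATOMIC_RELAXED));
	return new;
}
```

Both functions implement the same update; the second is the shape the series converts callers to.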
[PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg
This patch illustrates the transition to local{,64}_try_cmpxchg. It is not intended to be merged as-is.

Signed-off-by: Uros Bizjak
---
 arch/x86/events/core.c      | 9 ++++-----
 kernel/events/ring_buffer.c | 5 +++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 	 * exchange a new raw count - then add that new-prev delta
 	 * count to the generic event atomically:
 	 */
-again:
 	prev_raw_count = local64_read(&hwc->prev_count);
-	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
-
-	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-			    new_raw_count) != prev_raw_count)
-		goto again;
+	do {
+		rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+	} while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
+				      new_raw_count));
 
 	/*
 	 * Now we have the new raw value and have updated the prev

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..111ab85ee97d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,
 
 	perf_output_get_handle(handle);
 
+	offset = local_read(&rb->head);
 	do {
 		tail = READ_ONCE(rb->user_page->data_tail);
-		offset = head = local_read(&rb->head);
+		head = offset;
 		if (!rb->overwrite) {
 			if (unlikely(!ring_buffer_has_space(head, tail,
 							    perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
 			head += size;
 		else
 			head -= size;
-	} while (local_cmpxchg(&rb->head, offset, head) != offset);
+	} while (!local_try_cmpxchg(&rb->head, &offset, head));
 
 	if (backward) {
 		offset = head;
-- 
2.39.2
[PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local
Define target specific arch_try_cmpxchg_local. This definition overrides the generic arch_try_cmpxchg_local fallback definition and enables a target-specific implementation of try_cmpxchg_local.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Signed-off-by: Uros Bizjak
---
 arch/x86/include/asm/cmpxchg.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..540573f515b7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)			\
 	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
 
+#define __try_cmpxchg_local(ptr, pold, new, size)		\
+	__raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
 #define arch_try_cmpxchg(ptr, pold, new)			\
 	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
 
+#define arch_try_cmpxchg_local(ptr, pold, new)			\
+	__try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
-- 
2.39.2
[PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg and local_cmpxchg using typed C wrappers that call their _local counterpart and provide additional checking of their input arguments.

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Huacai Chen
Cc: WANG Xuerui
Cc: Jiaxun Yang
Cc: Jun Yi
Cc: Thomas Bogendoerfer
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Signed-off-by: Uros Bizjak
---
 arch/alpha/include/asm/local.h     | 12 ++++++++++--
 arch/loongarch/include/asm/local.h | 13 +++++++++++--
 arch/mips/include/asm/local.h      | 13 +++++++++++--
 arch/powerpc/include/asm/local.h   | 11 +++++++++++
 arch/x86/include/asm/local.h       | 13 +++++++++++--
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..0fcaad642cc3 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
 #define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))
 
 /**

diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..83e995b30e47 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**

diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..5daf6fe8e3e9 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
 	return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
 	return t;
 }
 
+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+	long o = *po, r;
+
+	r = local_cmpxchg(l, o, n);
+	if (unlikely(r != o))
+		*po = r;
+
+	return likely(r == o);
+}
+
 static __inline__ long local_xchg(local_t *l, long n)
 {
 	long t;

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..56d4ef604b91 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
 #define local_inc_return(l)  (local_add_return(1, l))
 #define local_dec_return(l)  (local_sub_return(1, l))
 
-#define local_cmpxchg(l, o, n) \
-	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+	return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+	typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+	return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 /* Always has a lock prefix */
 #define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
-- 
2.39.2
[PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
Add generic and target specific support for local{,64}_try_cmpxchg and wire up support for all targets that use local_t infrastructure.

The patch series enables x86 targets to emit a special instruction for local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer to use the new locking primitive and improves code from

 4b3:	48 8b 82 e8 00 00 00	mov    0xe8(%rdx),%rax
 4ba:	48 8b b8 08 04 00 00	mov    0x408(%rax),%rdi
 4c1:	8b 42 1c            	mov    0x1c(%rdx),%eax
 4c4:	48 8b 4a 28         	mov    0x28(%rdx),%rcx
 4c8:	85 c0               	test   %eax,%eax
 ...
 4ef:	48 89 c8            	mov    %rcx,%rax
 4f2:	48 0f b1 7a 28      	cmpxchg %rdi,0x28(%rdx)
 4f7:	48 39 c1            	cmp    %rax,%rcx
 4fa:	75 b7               	jne    4b3 <...>

to

 4b2:	48 8b 4a 28         	mov    0x28(%rdx),%rcx
 4b6:	48 8b 82 e8 00 00 00	mov    0xe8(%rdx),%rax
 4bd:	48 8b b0 08 04 00 00	mov    0x408(%rax),%rsi
 4c4:	8b 42 1c            	mov    0x1c(%rdx),%eax
 4c7:	85 c0               	test   %eax,%eax
 ...
 4d4:	48 89 c8            	mov    %rcx,%rax
 4d7:	48 0f b1 72 28      	cmpxchg %rsi,0x28(%rdx)
 4dc:	0f 85 d0 00 00 00   	jne    5b2 <...>
 ...
 5b2:	48 89 c1            	mov    %rax,%rcx
 5b5:	e9 fc fe ff ff      	jmp    4b6 <...>

Please note that in addition to the removed compare, the load from 0x28(%rdx) is moved out of the loop and the code is rearranged according to the likely/unlikely annotations in the source.

---
v2: Implement target specific support for local_try_cmpxchg and local_cmpxchg using typed C wrappers that call their _local counterpart and provide additional checking of their input arguments.

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Huacai Chen
Cc: WANG Xuerui
Cc: Thomas Bogendoerfer
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Arnd Bergmann
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Ian Rogers
Cc: Will Deacon
Cc: Boqun Feng
Cc: Jiaxun Yang
Cc: Jun Yi

Uros Bizjak (5):
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/arch: Wire up local_try_cmpxchg
  locking/x86: Define arch_try_cmpxchg_local
  events: Illustrate the transition to local{,64}_try_cmpxchg

 arch/alpha/include/asm/local.h              | 12 ++++++++--
 arch/loongarch/include/asm/local.h          | 13 +++++++++--
 arch/mips/include/asm/local.h               | 13 +++++++++--
 arch/powerpc/include/asm/local.h            | 11 ++++++++++
 arch/x86/events/core.c                      |  9 ++++-----
 arch/x86/include/asm/cmpxchg.h              |  6 ++++++
 arch/x86/include/asm/local.h                | 13 +++++++++--
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               | 12 +++++++++-
 include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
 kernel/events/ring_buffer.c                 |  5 +++--
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 14 files changed, 126 insertions(+), 19 deletions(-)

-- 
2.39.2
[PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg
Implement generic support for local{,64}_try_cmpxchg. Redirect to the atomic_ family of functions when the target does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and local64_cmpxchg using typed C wrappers that call the local_ family of functions and provide additional checking of their input arguments.

Cc: Arnd Bergmann
Signed-off-by: Uros Bizjak
---
 include/asm-generic/local.h   |  1 +
 include/asm-generic/local64.h | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
 #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
 
 #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
 #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
 #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
 #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)

diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..14963a7a6253 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
 #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
 #define local64_inc_return(l)	local_inc_return(&(l)->a)
 
-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+	return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+	return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
 #define local64_xchg(l, n)	local_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
 #define local64_inc_return(l)	atomic64_inc_return(&(l)->a)
 
 #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n)	atomic64_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
-- 
2.39.2
[PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support
Add generic support for try_cmpxchg{,64}_local and their fallbacks. This provides the generic try_cmpxchg_local family of functions from the arch_ prefixed versions, also adding explicit instrumentation.

Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Mark Rutland
Signed-off-by: Uros Bizjak
---
 include/linux/atomic/atomic-arch-fallback.h | 24 +++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 +++++++++++++++-
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..36c92851cdee 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@
 
 #endif /* arch_try_cmpxchg64_relaxed */
 
+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1f49bd4895a4b7a5383906649027205c52ec80ab

diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
 #define cmpxchg_double(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731

diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..6e853f0dad8d 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
 	gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done
 
+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+	gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
 grep '^[a-z]' "$1" | while read name meta args; do
 	gen_proto "${meta}" "${name}" "atomic" "int" ${args}
 done

diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
 	done
 done
 
-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local"; do
 	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
-- 
2.39.2
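[Editorial note for this archive: the semantics of the arch_try_cmpxchg_local fallback above - return a bool, and on failure refresh the caller's "old" copy with the value actually found in memory - can be modeled in plain userspace C. The model_ function names are hypothetical; the real kernel macros are type-generic and operate on real atomics.]

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a plain cmpxchg_local: returns the value found at *ptr,
 * storing new only if that value equals old. */
static long model_cmpxchg_local(long *ptr, long old, long new)
{
	long cur = *ptr;
	if (cur == old)
		*ptr = new;
	return cur;
}

/* Model of the try_cmpxchg_local fallback: on failure, write the
 * observed value back through *old, so a retry loop needs no reload. */
static bool model_try_cmpxchg_local(long *ptr, long *old, long new)
{
	long r = model_cmpxchg_local(ptr, *old, new);
	if (r != *old) {
		*old = r;
		return false;
	}
	return true;
}
```

This failure-path update of *old is exactly what lets callers collapse "reload, compare, retry" into a single while (!try_cmpxchg(...)) loop.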
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Tue, Apr 4, 2023 at 3:19 PM Mark Rutland wrote:
>
> On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> > On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland wrote:
> > >
> > > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote:
> > > > > > > >
> > > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > > > > >
> > > > > > > > Can you give an example of where we are passing an incompatible pointer?
> > > > > > >
> > > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > > this fix when fallback code is used. We have:
> > > > > > >
> > > > > > > -	} while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > > +	} while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > > >
> > > > > > > where rb->head is defined as:
> > > > > > >
> > > > > > > typedef struct {
> > > > > > >	atomic_long_t a;
> > > > > > > } local_t;
> > > > > > >
> > > > > > > while offset is defined as 'unsigned long'.
> > > > > >
> > > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > > >
> > > > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > > > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > > > > that in the callsite:
> > > > > >
> > > > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > > >
> > > > > Sorry, that should be:
> > > > >
> > > > >   while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > > >
> > > > The fallbacks are a bit more complicated than above, and are different
> > > > from atomic_try_cmpxchg.
> > > >
> > > > Please note in patch 2/10, the fallbacks when arch_try_cmpxchg_local
> > > > is not defined call arch_cmpxchg_local. Also in patch 2/10,
> > > > try_cmpxchg_local is introduced, where it calls
> > > > arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> > > >
> > > > #define local_cmpxchg(l, o, n) \
> > > >	(cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > > +#define local_try_cmpxchg(l, po, n) \
> > > > +	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > > >
> > > > which is part of the local_t API. Targets should either define all
> > > > these #defines, or none. There are no partial fallbacks as is the case
> > > > with atomic_t.
> > >
> > > Whether or not there are fallbacks is immaterial.
> > >
> > > In those cases, architectures can just as easily write C wrappers, e.g.
> > >
> > > long local_cmpxchg(local_t *l, long old, long new)
> > > {
> > >	return cmpxchg_local(&l->a.counter, old, new);
> > > }
> > >
> > > long local_try_cmpxchg(local_t *l, long *old, long new)
> > > {
> > >	return try_cmpxchg_local(&l->a.counter, old, new);
> > > }
> >
> > Please find attached the complete prototype patch that implements the
> > above suggestion.
> >
> > The patch includes:
> > - implementation of instrumented try_cmpxchg{,64}_local definitions
> > - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> > - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
> >
> > - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> > - x86 specific arch_try_cmpxchg_local definition
> >
> > - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> > implementation and illustrate the transition
> > - arch/x86/events/core.c change to test local64_try_cmpxchg
> > implementation and illustrate the transition
> >
> > The definition of atomic_long_t is different for 64-bit and 32-bit
> > targets (s64 vs int), so target specific C wrappers have to use
> > different casts to account for this difference.
> >
> > Uros.
>
> Thanks for this!
>
> FWIW, the patch (inline below) looks good to me.

Thanks, I will prepare a patch series for submission later today.

Uros.
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland wrote:
>
> On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland wrote:
> > >
> > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote:
> > > > > >
> > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
> > > > > >
> > > > > > Can you give an example of where we are passing an incompatible pointer?
> > > > >
> > > > > An example is patch 10/10 from the series, which will fail without
> > > > > this fix when fallback code is used. We have:
> > > > >
> > > > > -	} while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > +	} while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > >
> > > > > where rb->head is defined as:
> > > > >
> > > > > typedef struct {
> > > > >	atomic_long_t a;
> > > > > } local_t;
> > > > >
> > > > > while offset is defined as 'unsigned long'.
> > > >
> > > > Ok, but that's because we're doing the wrong thing to start with.
> > > >
> > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> > > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > > > that in the callsite:
> > > >
> > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > >
> > > Sorry, that should be:
> > >
> > >   while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> >
> > The fallbacks are a bit more complicated than above, and are different
> > from atomic_try_cmpxchg.
> >
> > Please note in patch 2/10, the fallbacks when arch_try_cmpxchg_local
> > is not defined call arch_cmpxchg_local. Also in patch 2/10,
> > try_cmpxchg_local is introduced, where it calls
> > arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> >
> > #define local_cmpxchg(l, o, n) \
> >	(cmpxchg_local(&((l)->a.counter), (o), (n)))
> > +#define local_try_cmpxchg(l, po, n) \
> > +	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> >
> > which is part of the local_t API. Targets should either define all
> > these #defines, or none. There are no partial fallbacks as is the case
> > with atomic_t.
>
> Whether or not there are fallbacks is immaterial.
>
> In those cases, architectures can just as easily write C wrappers, e.g.
>
> long local_cmpxchg(local_t *l, long old, long new)
> {
>	return cmpxchg_local(&l->a.counter, old, new);
> }
>
> long local_try_cmpxchg(local_t *l, long *old, long new)
> {
>	return try_cmpxchg_local(&l->a.counter, old, new);
> }

Please find attached the complete prototype patch that implements the above suggestion.

The patch includes:
- implementation of instrumented try_cmpxchg{,64}_local definitions
- corresponding arch_try_cmpxchg{,64}_local fallback definitions
- generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers

- x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
- x86 specific arch_try_cmpxchg_local definition

- kernel/events/ring_buffer.c change to test local_try_cmpxchg implementation and illustrate the transition
- arch/x86/events/core.c change to test local64_try_cmpxchg implementation and illustrate the transition

The definition of atomic_long_t is different for 64-bit and 32-bit targets (s64 vs int), so target specific C wrappers have to use different casts to account for this difference.

Uros.

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 	 * exchange a new raw count - then add that new-prev delta
 	 * count to the generic event atomically:
 	 */
-again:
 	prev_raw_count = local64_read(&hwc->prev_count);
-	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
-
-	if (local64_cmpxchg(&
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland wrote: > > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote: > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote: > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote: > > > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote: > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types > > > > > warning. > > > > > > > > Can you give an example of where we are passing an incompatible pointer? > > > > > > An example is patch 10/10 from the series, which will fail without > > > this fix when fallback code is used. We have: > > > > > > - } while (local_cmpxchg(>head, offset, head) != offset); > > > + } while (!local_try_cmpxchg(>head, , head)); > > > > > > where rb->head is defined as: > > > > > > typedef struct { > > >atomic_long_t a; > > > } local_t; > > > > > > while offset is defined as 'unsigned long'. > > > > Ok, but that's because we're doing the wrong thing to start with. > > > > Since local_t is defined in terms of atomic_long_t, we should define the > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll > > still > > have a mismatch between 'long *' and 'unsigned long *', but then we can fix > > that in the callsite: > > > > while (!local_try_cmpxchg(>head, &(long *)offset, head)) > > Sorry, that should be: > > while (!local_try_cmpxchg(>head, (long *), head)) The fallbacks are a bit more complicated than above, and are different from atomic_try_cmpxchg. Please note in patch 2/10, the falbacks when arch_try_cmpxchg_local are not defined call arch_cmpxchg_local. Also in patch 2/10, try_cmpxchg_local is introduced, where it calls arch_try_cmpxchg_local. Targets (and generic code) simply define (e.g. : #define local_cmpxchg(l, o, n) \ (cmpxchg_local(&((l)->a.counter), (o), (n))) +#define local_try_cmpxchg(l, po, n) \ + (try_cmpxchg_local(&((l)->a.counter), (po), (n))) which is part of the local_t API. 
Targets should either define all of these #defines, or none of them.
There are no partial fallbacks as is the case with atomic_t.

The core of the local_t API is in the local.h header. If the target
doesn't define its own local.h header, then asm-generic/local.h is used,
which does exactly what you propose above regarding the usage of atomic
functions. OTOH, when the target defines its own local.h, then the above
target-dependent #define path applies. The target should define its own
arch_try_cmpxchg_local; otherwise a "generic" target-dependent fallback
that calls the target's arch_cmpxchg_local applies. In the case of x86,
patch 9/10 enables the new instruction by defining
arch_try_cmpxchg_local.

FYI, the patch sequence is carefully chosen so that x86 also exercises
the fallback code between different patches in the series.

Targets are free to define local_t to whatever they like, but for some
reason they all define it to:

typedef struct {
	atomic_long_t a;
} local_t;

so they have to dig the variable out of the struct like:

#define local_cmpxchg(l, o, n) \
	(cmpxchg_local(&((l)->a.counter), (o), (n)))

Regarding the mismatch of 'long *' vs 'unsigned long *': x86
target-specific code does this for try_cmpxchg:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)	\
({									\
	bool success;							\
	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);		\
	__typeof__(*(_ptr)) __old = *_old;				\
	__typeof__(*(_ptr)) __new = (_new);				\

so it *does* cast the "old" pointer to the type of "ptr". The generic
code does *not*. This difference is dangerous, since some code involving
try_cmpxchg will compile OK for x86 but will break for other targets
that use the try_cmpxchg fallback templates (I was the unlucky one that
tripped on this in the past). Please note that this problem is not
specific to the proposed local_try_cmpxchg series, but affects the
existing try_cmpxchg API.

Also, I don't think that "fixing" callsites is the right thing to do.
The generic code should follow x86 and cast the "old" pointer to the
type of "ptr" inside the fallback.

> The fundamental thing I'm trying to say is that the
> atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
> their try_cmpxchg() implementations, the type signature should be:
>
>   ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)

This conversion should also be performed for the cmpxchg family of
functions, if desired at all. A try_cmpxchg fallback is just cmpxchg
with some extra code around it.

Thanks,
Uros.
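For readers following the thread outside the kernel tree, the fallback
semantics under discussion can be modeled in plain, single-threaded C.
This is only an illustrative sketch with made-up names (model_cmpxchg,
model_try_cmpxchg), not kernel code: it shows why try_cmpxchg writes the
observed value back through the "old" pointer on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of cmpxchg(): returns the value found at *ptr,
 * storing 'new' only when it matched 'old'. Illustrative only. */
static long model_cmpxchg(long *ptr, long old, long new)
{
	long cur = *ptr;

	if (cur == old)
		*ptr = new;
	return cur;
}

/* Model of the generic try_cmpxchg() fallback: boolean result, and on
 * failure the value actually found is written back through 'oldp'. */
static bool model_try_cmpxchg(long *ptr, long *oldp, long new)
{
	long o = *oldp;
	long r = model_cmpxchg(ptr, o, new);

	if (r != o)
		*oldp = r;	/* refresh the caller's expected value */
	return r == o;
}
```

The write-back on failure is what lets callers retry without an extra
load, which is the codegen win the series is after.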
Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland wrote:
>
> On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
>
> Can you give an example of where we are passing an incompatible pointer?

An example is patch 10/10 from the series, which will fail without this
fix when fallback code is used. We have:

-	} while (local_cmpxchg(&rb->head, offset, head) != offset);
+	} while (!local_try_cmpxchg(&rb->head, &offset, head));

where rb->head is defined as:

typedef struct {
	atomic_long_t a;
} local_t;

while offset is defined as 'unsigned long'. The assignment in the
existing try_cmpxchg template:

	typeof(*(_ptr)) *___op = (_oldp)

will trigger an initialization from an incompatible pointer type error.

Please note that x86 avoids this issue with a cast in its
target-dependent definition:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)	\
({									\
	bool success;							\
	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);		\
	__typeof__(*(_ptr)) __old = *_old;				\
	__typeof__(*(_ptr)) __new = (_new);				\

so the warning/error will trigger only in the fallback code.

> That sounds indicative of a bug in the caller, but maybe I'm missing some
> reason this is necessary due to some indirection.
>
> > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
>
> I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
> later in this series?

The patch at [1] triggered a build error in posix_acl.c/__get_acl due to
the same problem. The compilation for the x86 target was OK, because x86
defines a target-specific arch_try_cmpxchg, but the compilation broke
for targets that revert to the generic support. Please note that this
specific problem was recently fixed in a different way [2], but the
issue with the fallback remains.

[1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
[2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/

Uros.
[PATCH 10/10] perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin
Use local_try_cmpxchg instead of local_cmpxchg (*ptr, old, new) == old
in __perf_output_begin. The x86 CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after cmpxchg. Also,
local_try_cmpxchg implicitly assigns the old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.

No functional change intended.

Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Ian Rogers
Signed-off-by: Uros Bizjak
---
 kernel/events/ring_buffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..e07c10f4d141 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,

 	perf_output_get_handle(handle);

+	offset = local_read(&rb->head);
 	do {
+		head = offset;
 		tail = READ_ONCE(rb->user_page->data_tail);
-		offset = head = local_read(&rb->head);
 		if (!rb->overwrite) {
 			if (unlikely(!ring_buffer_has_space(head, tail,
 							    perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
 			head += size;
 		else
 			head -= size;
-	} while (local_cmpxchg(&rb->head, offset, head) != offset);
+	} while (!local_try_cmpxchg(&rb->head, &offset, head));

 	if (backward) {
 		offset = head;
--
2.39.2
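The loop transformation this patch performs can be sketched in plain,
single-threaded C. The toy_* functions below are stand-ins for the
local_t atomics and the advance_* helpers are hypothetical, reduced
versions of the __perf_output_begin loop; they only show the shape
change, not the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded stand-ins for local_cmpxchg()/local_try_cmpxchg(). */
static long toy_cmpxchg(long *p, long old, long new)
{
	long cur = *p;

	if (cur == old)
		*p = new;
	return cur;
}

static bool toy_try_cmpxchg(long *p, long *oldp, long new)
{
	long o = *oldp;
	long r = toy_cmpxchg(p, o, new);

	if (r != o)
		*oldp = r;
	return r == o;
}

/* Old shape: a fresh load inside the loop and a compare after cmpxchg. */
static long advance_cmpxchg(long *head, long size)
{
	long offset, new;

	do {
		offset = *head;		/* re-read every iteration */
		new = offset + size;
	} while (toy_cmpxchg(head, offset, new) != offset);
	return new;
}

/* New shape: one load before the loop; on failure the current value
 * comes back through 'offset', so the explicit compare disappears. */
static long advance_try_cmpxchg(long *head, long size)
{
	long offset = *head;	/* load hoisted out of the loop */
	long new;

	do {
		new = offset + size;
	} while (!toy_try_cmpxchg(head, &offset, new));
	return new;
}
```

Both helpers compute the same result; the second form is what lets the
compiler drop the compare and hoist the load, as shown in the cover
letter's disassembly.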
[PATCH 09/10] locking/x86: Enable local{,64}_try_cmpxchg
Enable local_try_cmpxchg and also local64_try_cmpxchg for x86_64.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Signed-off-by: Uros Bizjak
---
 arch/x86/include/asm/cmpxchg.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..e8f939e8432e 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)			\
 	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

+#define __try_cmpxchg_local(ptr, pold, new, size)		\
+	__raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
 #define arch_try_cmpxchg(ptr, pold, new)			\
 	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

+#define arch_try_cmpxchg_local(ptr, pold, new)			\
+	__try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
--
2.39.2
[PATCH 08/10] locking/generic: Wire up local{,64}_try_cmpxchg
Implement generic support for local{,64}_try_cmpxchg.

Cc: Arnd Bergmann
Signed-off-by: Uros Bizjak
---
 include/asm-generic/local.h   | 1 +
 include/asm-generic/local64.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
 #define local_inc_return(l) atomic_long_inc_return(&(l)->a)

 #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
 #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
 #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
 #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..54b91e93ae76 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -43,6 +43,7 @@ typedef struct {
 #define local64_inc_return(l)	local_inc_return(&(l)->a)

 #define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) local_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n)	local_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
@@ -81,6 +82,7 @@ typedef struct {
 #define local64_inc_return(l)	atomic64_inc_return(&(l)->a)

 #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n)	atomic64_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
--
2.39.2
[PATCH 06/10] locking/powerpc: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg.

Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc:
Signed-off-by: Uros Bizjak
---
 arch/powerpc/include/asm/local.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
 	return t;
 }

+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+	long o = *po, r;
+
+	r = local_cmpxchg(l, o, n);
+	if (unlikely(r != o))
+		*po = r;
+
+	return likely(r == o);
+}
+
 static __inline__ long local_xchg(local_t *l, long n)
 {
 	long t;
--
2.39.2
[PATCH 07/10] locking/x86: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Signed-off-by: Uros Bizjak
---
 arch/x86/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..4bcc72906524 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -122,6 +122,8 @@ static inline long local_sub_return(long i, local_t *l)

 #define local_cmpxchg(l, o, n) \
 	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))

 /* Always has a lock prefix */
 #define local_xchg(l, n)	(xchg(&((l)->a.counter), (n)))
--
2.39.2
[PATCH 05/10] locking/mips: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg.

Cc: Thomas Bogendoerfer
Signed-off-by: Uros Bizjak
---
 arch/mips/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..b611361a760b 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -96,6 +96,8 @@ static __inline__ long local_sub_return(long i, local_t * l)

 #define local_cmpxchg(l, o, n) \
 	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

 /**
--
2.39.2
[PATCH 04/10] locking/loongarch: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg.

Cc: Huacai Chen
Cc: WANG Xuerui
Cc: Jiaxun Yang
Cc: Jun Yi
Signed-off-by: Uros Bizjak
---
 arch/loongarch/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/include/asm/local.h b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..dff6bcbe4821 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -58,6 +58,8 @@ static inline long local_sub_return(long i, local_t *l)

 #define local_cmpxchg(l, o, n) \
 	((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))

 /**
--
2.39.2
[PATCH 03/10] locking/alpha: Wire up local_try_cmpxchg
Implement target specific support for local_try_cmpxchg.

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Signed-off-by: Uros Bizjak
---
 arch/alpha/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..7eef027e0dde 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -54,6 +54,8 @@ static __inline__ long local_sub_return(long i, local_t * l)

 #define local_cmpxchg(l, o, n) \
 	(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+	(try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))

 /**
--
2.39.2
[PATCH 02/10] locking/atomic: Add generic try_cmpxchg{,64}_local support
Add generic support for try_cmpxchg{,64}_local and their fallbacks.

Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Mark Rutland
Signed-off-by: Uros Bizjak
---
 include/linux/atomic/atomic-arch-fallback.h | 24 +++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 +++++++++++++++-
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 19debd501ee7..a7d2e5f4e548 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@

 #endif /* arch_try_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif

 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 1b4d4c82ae653389cd1538d5b07170267d9b3837
+// 9bb8cca3d4cbc000e7068eb7cb4481cb3e48c45a
diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })

+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
 #define cmpxchg_double(ptr, ...) \
 ({ \
 	typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })

 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 39f447161108..26829a75b644 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
 	gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done

+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+	gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
 grep '^[a-z]' "$1" | while read name meta args; do
 	gen_proto "${meta}" "${name}" "atomic" "int" ${args}
 done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
 	done
 done

-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
 	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
--
2.39.2
[PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks
Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: Boqun Feng
Cc: Mark Rutland
Signed-off-by: Uros Bizjak
---
 include/linux/atomic/atomic-arch-fallback.h | 18 +++++++++---------
 scripts/atomic/gen-atomic-fallback.sh       |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..19debd501ee7 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -87,7 +87,7 @@
 #ifndef arch_try_cmpxchg
 #define arch_try_cmpxchg(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -98,7 +98,7 @@
 #ifndef arch_try_cmpxchg_acquire
 #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -109,7 +109,7 @@
 #ifndef arch_try_cmpxchg_release
 #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -120,7 +120,7 @@
 #ifndef arch_try_cmpxchg_relaxed
 #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -157,7 +157,7 @@
 #ifndef arch_try_cmpxchg64
 #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -168,7 +168,7 @@
 #ifndef arch_try_cmpxchg64_acquire
 #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -179,7 +179,7 @@
 #ifndef arch_try_cmpxchg64_release
 #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -190,7 +190,7 @@
 #ifndef arch_try_cmpxchg64_relaxed
 #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
 ({ \
-	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
 	___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
 	if (unlikely(___r != ___o)) \
 		*___op = ___r; \
@@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif

 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1b4d4c82ae653389cd1538d5b07170267d9b3837
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..39f447161108 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -171,7 +171,7 @@ cat <
[PATCH 00/10] locking: Introduce local{,64}_try_cmpxchg
Add generic and target specific support for local{,64}_try_cmpxchg and
wire up support for all targets that use local_t infrastructure.

The patch enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer to use
the new locking primitive and improves code from:

     4b3:	48 8b 82 e8 00 00 00	mov    0xe8(%rdx),%rax
     4ba:	48 8b b8 08 04 00 00	mov    0x408(%rax),%rdi
     4c1:	8b 42 1c            	mov    0x1c(%rdx),%eax
     4c4:	48 8b 4a 28         	mov    0x28(%rdx),%rcx
     4c8:	85 c0               	test   %eax,%eax
     ...
     4ef:	48 89 c8            	mov    %rcx,%rax
     4f2:	48 0f b1 7a 28      	cmpxchg %rdi,0x28(%rdx)
     4f7:	48 39 c1            	cmp    %rax,%rcx
     4fa:	75 b7               	jne    4b3 <...>

to:

     4b2:	48 8b 4a 28         	mov    0x28(%rdx),%rcx
     4b6:	48 8b 82 e8 00 00 00	mov    0xe8(%rdx),%rax
     4bd:	48 8b b0 08 04 00 00	mov    0x408(%rax),%rsi
     4c4:	8b 42 1c            	mov    0x1c(%rdx),%eax
     4c7:	85 c0               	test   %eax,%eax
     ...
     4d4:	48 89 c8            	mov    %rcx,%rax
     4d7:	48 0f b1 72 28      	cmpxchg %rsi,0x28(%rdx)
     4dc:	0f 85 d0 00 00 00   	jne    5b2 <...>
     ...
     5b2:	48 89 c1            	mov    %rax,%rcx
     5b5:	e9 fc fe ff ff      	jmp    4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) gets moved out of the loop and the code is rearranged
according to likely/unlikely tags in the source.

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Huacai Chen
Cc: WANG Xuerui
Cc: Thomas Bogendoerfer
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x...@kernel.org
Cc: "H. Peter Anvin"
Cc: Arnd Bergmann
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Ian Rogers
Cc: Will Deacon
Cc: Boqun Feng
Cc: Jiaxun Yang
Cc: Jun Yi

Uros Bizjak (10):
  locking/atomic: Add missing cast to try_cmpxchg() fallbacks
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/alpha: Wire up local_try_cmpxchg
  locking/loongarch: Wire up local_try_cmpxchg
  locking/mips: Wire up local_try_cmpxchg
  locking/powerpc: Wire up local_try_cmpxchg
  locking/x86: Wire up local_try_cmpxchg
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/x86: Enable local{,64}_try_cmpxchg
  perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

 arch/alpha/include/asm/local.h              |  2 ++
 arch/loongarch/include/asm/local.h          |  2 ++
 arch/mips/include/asm/local.h               |  2 ++
 arch/powerpc/include/asm/local.h            | 11 +++++++
 arch/x86/include/asm/cmpxchg.h              |  6 ++++
 arch/x86/include/asm/local.h                |  2 ++
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               |  2 ++
 include/linux/atomic/atomic-arch-fallback.h | 40 ++++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++-
 kernel/events/ring_buffer.c                 |  5 +--
 scripts/atomic/gen-atomic-fallback.sh       |  6 +++-
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 13 files changed, 87 insertions(+), 14 deletions(-)

--
2.39.2
Re: [PATCH] compiler-gcc.h: Remove ancient workaround for gcc PR 58670
On Thu, Aug 25, 2022 at 12:30 PM Naveen N. Rao wrote:
>
> Uros Bizjak wrote:
> > The workaround for 'asm goto' miscompilation introduces a compiler
> > barrier quirk that inhibits many useful compiler optimizations. For
> > example, __try_cmpxchg_user compiles to:
> >
> >    11375:	41 8b 4d 00         	mov    0x0(%r13),%ecx
> >    11379:	41 8b 02            	mov    (%r10),%eax
> >    1137c:	f0 0f b1 0a         	lock cmpxchg %ecx,(%rdx)
> >    11380:	0f 94 c2            	sete   %dl
> >    11383:	84 d2               	test   %dl,%dl
> >    11385:	75 c4               	jne    1134b <...>
> >    11387:	41 89 02            	mov    %eax,(%r10)
> >
> > where the barrier inhibits flags propagation from asm when
> > compiled with gcc-12.
> >
> > When the mentioned quirk is removed, the following code is generated:
> >
> >    11553:	41 8b 4d 00         	mov    0x0(%r13),%ecx
> >    11557:	41 8b 02            	mov    (%r10),%eax
> >    1155a:	f0 0f b1 0a         	lock cmpxchg %ecx,(%rdx)
> >    1155e:	74 c9               	je     11529 <...>
> >    11560:	41 89 02            	mov    %eax,(%r10)
> >
> > The referred compiler bug:
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> >
> > was fixed for gcc-4.8.2.
> >
> > The current minimum required version of GCC is version 5.1, which has
> > the above 'asm goto' miscompilation fixed, so remove the workaround.
> >
> > Signed-off-by: Uros Bizjak
> > Cc: Andrew Morton
> > ---
> >  include/linux/compiler-gcc.h | 11 -----------
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index a0c55eeaeaf1..9b157b71036f 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -66,17 +66,6 @@
> >  		__builtin_unreachable();	\
> >  } while (0)
> >
> > -/*
> > - * GCC 'asm goto' miscompiles certain code sequences:
> > - *
> > - *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> > - *
> > - * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
> > - *
> > - * (asm goto is automatically volatile - the naming reflects this.)
> > - */
> > -#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
> > -
> >  #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
> >  #define __HAVE_BUILTIN_BSWAP32__
> >  #define __HAVE_BUILTIN_BSWAP64__
>
> This is causing a build issue on ppc64le with a new patch replacing use
> of unreachable() with __builtin_unreachable() in __WARN_FLAGS():
> https://lore.kernel.org/linuxppc-dev/20220808114908.240813-2...@linux.ibm.com/
>
> during RTL pass: combine
> In file included from /linux/kernel/locking/rtmutex_api.c:9:
> /linux/kernel/locking/rtmutex.c: In function '__rt_mutex_slowlock.constprop':
> /linux/kernel/locking/rtmutex.c:1612:1: internal compiler error: in
> purge_dead_edges, at cfgrtl.c:3369
>  1612 | }
>       | ^
> 0x142817c internal_error(char const*, ...)
>         ???:0
> 0x5c8a1b fancy_abort(char const*, int, char const*)
>         ???:0
> 0x72017f purge_all_dead_edges()
>         ???:0
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See for instructions.
>
> So, it looks like gcc still has issues with certain uses of asm goto.

This looks like a powerpc-specific bug, and has nothing to do with
PR58670 (the asm goto miscompilation). If this is indeed a
target-specific bug, it should be worked around in
arch/powerpc/include/asm/compiler.h, but please also report it to the
GCC bugzilla.

Uros.