Re: [PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

2023-04-12 Thread Uros Bizjak
On Wed, Apr 12, 2023 at 1:33 PM Peter Zijlstra  wrote:
>
> On Wed, Apr 05, 2023 at 04:17:08PM +0200, Uros Bizjak wrote:
> > diff --git a/arch/powerpc/include/asm/local.h 
> > b/arch/powerpc/include/asm/local.h
> > index bc4bd19b7fc2..45492fb5bf22 100644
> > --- a/arch/powerpc/include/asm/local.h
> > +++ b/arch/powerpc/include/asm/local.h
> > @@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
> >   return t;
> >  }
> >
> > +static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
> > +{
> > + long o = *po, r;
> > +
> > + r = local_cmpxchg(l, o, n);
> > + if (unlikely(r != o))
> > + *po = r;
> > +
> > + return likely(r == o);
> > +}
> > +
>
> Why is the ppc one different from the rest? Why can't it use the
> try_cmpxchg_local() fallback and needs to have it open-coded?

Please note that ppc directly defines local_cmpxchg, bypassing the
cmpxchg_local/arch_cmpxchg_local machinery. The patch takes the same
approach for local_try_cmpxchg, because the fallbacks are built on top
of the arch_cmpxchg_local definitions.

PPC should be converted to use arch_cmpxchg_local (to also enable
instrumentation), but that is outside the scope of the proposed patchset.

Uros.


Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

2023-04-06 Thread Uros Bizjak
On Thu, Apr 6, 2023 at 10:26 AM David Laight  wrote:
>
> From: Dave Hansen
> > Sent: 05 April 2023 17:37
> >
> > On 4/5/23 07:17, Uros Bizjak wrote:
> > > Add generic and target specific support for local{,64}_try_cmpxchg
> > > and wire up support for all targets that use local_t infrastructure.
> >
> > I feel like I'm missing some context.
> >
> > What are the actual end user visible effects of this series?  Is there a
> > measurable decrease in perf overhead?  Why go to all this trouble for
> > perf?  Who else will use local_try_cmpxchg()?
>
> I'm assuming the local_xxx operations only have to be safe wrt interrupts?
> On x86 it is possible that an alternate instruction sequence
> that doesn't use a locked instruction may actually be faster!

Please note that "local" functions do not use the lock prefix. Only the
atomic property of the cmpxchg instruction is exploited, since the
operation only needs to be safe wrt interrupts.
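
As a rough illustration (not part of this series; the function and
variable names below are made up), the classic local_t use case is a
per-CPU counter that is only ever modified by its owning CPU, possibly
from interrupt context, so a non-locked instruction is sufficient:

#include <linux/percpu.h>
#include <asm/local.h>

static DEFINE_PER_CPU(local_t, sample_count);

/* May run in task or interrupt context on the local CPU. */
static void count_event(void)
{
	/* get_cpu_var() disables preemption, so only this CPU modifies it */
	local_inc(&get_cpu_var(sample_count));
	put_cpu_var(sample_count);
}

/* Cross-CPU readers only read; no ordering is implied. */
static unsigned long read_all_counts(void)
{
	unsigned long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += local_read(per_cpu_ptr(&sample_count, cpu));

	return sum;
}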

Uros.

> Although, maybe, any kind of locked cmpxchg just needs to ensure
> the cache line isn't 'stolen', so apart from possible slight
> delays on another cpu that gets a cache miss for the line it
> all makes little difference.
> The cache line miss costs a lot anyway, line bouncing costs more,
> and both are best avoided.
> So is there actually much of a benefit at all?
>
> Clearly the try_cmpxchg helps - but that is a different issue.
>
> David
>


Re: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

2023-04-05 Thread Uros Bizjak
On Wed, Apr 5, 2023 at 6:37 PM Dave Hansen  wrote:
>
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series?  Is there a
> measurable decrease in perf overhead?  Why go to all this trouble for
> perf?  Who else will use local_try_cmpxchg()?

This functionality was requested by perf people [1], so perhaps Steven
can give us some concrete examples. In general, apart from removing an
unneeded compare instruction on x86, the use of try_cmpxchg also
results in slightly better code on non-x86 targets [2], since the code
now correctly identifies the fast path through the cmpxchg loop.

Also important is that try_cmpxchg reuses the result of the cmpxchg
instruction in the loop, so a read from memory inside the loop is
eliminated. When reviewing the cmpxchg usage sites, I found numerous
places where an unnecessary read from memory was present in the loop;
two examples can be seen in the last patch of this series.

Also, using try_cmpxchg prevents inconsistencies in the cmpxchg loop,
where the result of the cmpxchg is compared with the wrong "old" value
- one such bug is still lurking in the x86 APIC code, please see [3].
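
As a schematic comparison (hypothetical helper, not taken from the
series), the two loop styles look like this; the second one reads memory
once, reuses the value returned by a failed cmpxchg, and cannot
accidentally compare the result against a stale "old":

#include <asm/local.h>

/* Classic style: re-reads the counter on every retry, compares by hand. */
static void add_capped_cmpxchg(local_t *v, long inc, long cap)
{
	long old, new;

	do {
		old = local_read(v);		/* extra read on every iteration */
		new = old + inc;
		if (new > cap)
			new = cap;
	} while (local_cmpxchg(v, old, new) != old);
}

/* try_cmpxchg style: one read up front; on failure "old" is updated for us. */
static void add_capped_try_cmpxchg(local_t *v, long inc, long cap)
{
	long old = local_read(v);
	long new;

	do {
		new = old + inc;
		if (new > cap)
			new = cap;
	} while (!local_try_cmpxchg(v, &old, new));
}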

Please note that apart from the perf subsystem, the events subsystem can
also be improved by using local_try_cmpxchg. This is the reason the
last patch includes a change in events/core.c.

> I'm all for improving things, and perf is an important user.  But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

[1] https://lore.kernel.org/lkml/20230301131831.6c8d4...@gandalf.local.home/
[2] https://lore.kernel.org/lkml/yo91omfdzttgx...@fvff77s0q05n.cambridge.arm.com/
[3] https://lore.kernel.org/lkml/20230227160917.107820-1-ubiz...@gmail.com/

Uros.


[PATCH v2 5/5] events: Illustrate the transition to local{,64}_try_cmpxchg

2023-04-05 Thread Uros Bizjak
This patch illustrates the transition to local{,64}_try_cmpxchg.
It is not intended to be merged as-is.

Signed-off-by: Uros Bizjak 
---
 arch/x86/events/core.c  | 9 ++++-----
 kernel/events/ring_buffer.c | 5 +++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 	 * exchange a new raw count - then add that new-prev delta
 	 * count to the generic event atomically:
 	 */
-again:
 	prev_raw_count = local64_read(&hwc->prev_count);
-	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
 
-	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-			    new_raw_count) != prev_raw_count)
-		goto again;
+	do {
+		rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+	} while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
+				      new_raw_count));
 
 	/*
 	 * Now we have the new raw value and have updated the prev
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..111ab85ee97d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,
 
 	perf_output_get_handle(handle);
 
+	offset = local_read(&rb->head);
 	do {
 		tail = READ_ONCE(rb->user_page->data_tail);
-		offset = head = local_read(&rb->head);
+		head = offset;
 		if (!rb->overwrite) {
 			if (unlikely(!ring_buffer_has_space(head, tail,
 							    perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
 			head += size;
 		else
 			head -= size;
-	} while (local_cmpxchg(&rb->head, offset, head) != offset);
+	} while (!local_try_cmpxchg(&rb->head, &offset, head));
 
 	if (backward) {
 		offset = head;
-- 
2.39.2



[PATCH v2 4/5] locking/x86: Define arch_try_cmpxchg_local

2023-04-05 Thread Uros Bizjak
Define the target-specific arch_try_cmpxchg_local. This
definition overrides the generic arch_try_cmpxchg_local
fallback definition and enables a target-specific
implementation of try_cmpxchg_local.
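
For reference, the override works roughly as sketched below; both
fragments are condensed from this patch and from patch 1/5, nothing new
is introduced. Because the x86 header provides arch_try_cmpxchg_local,
the #ifndef-guarded generic fallback is compiled out on x86:

/* arch/x86/include/asm/cmpxchg.h (this patch): */
#define arch_try_cmpxchg_local(ptr, pold, new)				\
	__try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))

/* include/linux/atomic/atomic-arch-fallback.h (patch 1/5); skipped
 * whenever the architecture already provides the macro above: */
#ifndef arch_try_cmpxchg_local
#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
({ \
	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
	if (unlikely(___r != ___o)) \
		*___op = ___r; \
	likely(___r == ___o); \
})
#endif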

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/x86/include/asm/cmpxchg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..540573f515b7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)\
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
 
+#define __try_cmpxchg_local(ptr, pold, new, size)  \
+   __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
 #define arch_try_cmpxchg(ptr, pold, new)   \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
 
+#define arch_try_cmpxchg_local(ptr, pold, new) \
+   __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
-- 
2.39.2



[PATCH v2 3/5] locking/arch: Wire up local_try_cmpxchg

2023-04-05 Thread Uros Bizjak
Implement target-specific support for local_try_cmpxchg
and local_cmpxchg using typed C wrappers that call their
_local counterparts and provide additional checking of
their input arguments.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Huacai Chen 
Cc: WANG Xuerui 
Cc: Jiaxun Yang 
Cc: Jun Yi 
Cc: Thomas Bogendoerfer 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/alpha/include/asm/local.h | 12 ++--
 arch/loongarch/include/asm/local.h | 13 +++--
 arch/mips/include/asm/local.h  | 13 +++--
 arch/powerpc/include/asm/local.h   | 11 +++
 arch/x86/include/asm/local.h   | 13 +++--
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..0fcaad642cc3 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -52,8 +52,16 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-   (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+   return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+   return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
+}
+
 #define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))
 
 /**
diff --git a/arch/loongarch/include/asm/local.h 
b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..83e995b30e47 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -56,8 +56,17 @@ static inline long local_sub_return(long i, local_t *l)
return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-   ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+   return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+   typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+   return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..5daf6fe8e3e9 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -94,8 +94,17 @@ static __inline__ long local_sub_return(long i, local_t * l)
return result;
 }
 
-#define local_cmpxchg(l, o, n) \
-   ((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+static __inline__ long local_cmpxchg(local_t *l, long old, long new)
+{
+   return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static __inline__ bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+   typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+   return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
 }
 
+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+   long o = *po, r;
+
+   r = local_cmpxchg(l, o, n);
+   if (unlikely(r != o))
+   *po = r;
+
+   return likely(r == o);
+}
+
 static __inline__ long local_xchg(local_t *l, long n)
 {
long t;
diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..56d4ef604b91 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -120,8 +120,17 @@ static inline long local_sub_return(long i, local_t *l)
 #define local_inc_return(l)  (local_add_return(1, l))
 #define local_dec_return(l)  (local_sub_return(1, l))
 
-#define local_cmpxchg(l, o, n) \
-   (cmpxchg_local(&((l)->a.counter), (o), (n)))
+static inline long local_cmpxchg(local_t *l, long old, long new)
+{
+   return cmpxchg_local(&l->a.counter, old, new);
+}
+
+static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
+{
+   typeof(l->a.counter) *__old = (typeof(l->a.counter) *) old;
+   return try_cmpxchg_local(&l->a.counter, __old, new);
+}
+
 /* Always has a lock prefix */
 #define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
 
-- 
2.39.2



[PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg

2023-04-05 Thread Uros Bizjak
Add generic and target specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use local_t infrastructure.

The patch series enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer
to use the new locking primitive and improves the generated code from

 4b3:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
 4ba:   48 8b b8 08 04 00 00    mov    0x408(%rax),%rdi
 4c1:   8b 42 1c                mov    0x1c(%rdx),%eax
 4c4:   48 8b 4a 28             mov    0x28(%rdx),%rcx
 4c8:   85 c0                   test   %eax,%eax
 ...
 4ef:   48 89 c8                mov    %rcx,%rax
 4f2:   48 0f b1 7a 28          cmpxchg %rdi,0x28(%rdx)
 4f7:   48 39 c1                cmp    %rax,%rcx
 4fa:   75 b7                   jne    4b3 <...>

to

 4b2:   48 8b 4a 28             mov    0x28(%rdx),%rcx
 4b6:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
 4bd:   48 8b b0 08 04 00 00    mov    0x408(%rax),%rsi
 4c4:   8b 42 1c                mov    0x1c(%rdx),%eax
 4c7:   85 c0                   test   %eax,%eax
 ...
 4d4:   48 89 c8                mov    %rcx,%rax
 4d7:   48 0f b1 72 28          cmpxchg %rsi,0x28(%rdx)
 4dc:   0f 85 d0 00 00 00       jne    5b2 <...>
 ...
 5b2:   48 89 c1                mov    %rax,%rcx
 5b5:   e9 fc fe ff ff          jmp    4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) gets moved out of the loop and the code is rearranged
according to the likely/unlikely hints in the source.
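
For illustration only (simplified, with a hypothetical function name;
the real change is in the last patch), the converted loop has roughly
the following shape, and it is the unlikely() hint inside the
try_cmpxchg expansion that pushes the retry path out of line (the 5b2:
block above):

#include <asm/local.h>

static long advance_head(local_t *head_ctr, long size)
{
	long offset = local_read(head_ctr);	/* load hoisted out of the loop */
	long head;

	do {
		head = offset + size;		/* fast path only recomputes */
	} while (!local_try_cmpxchg(head_ctr, &offset, head));
						/* unlikely failure path:    */
						/* offset was refreshed here */
	return offset;				/* previous head value       */
}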
---
v2:

Implement target-specific support for local_try_cmpxchg and
local_cmpxchg using typed C wrappers that call their _local
counterparts and provide additional checking of their input
arguments.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Huacai Chen 
Cc: WANG Xuerui 
Cc: Thomas Bogendoerfer 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Arnd Bergmann 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Mark Rutland 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Ian Rogers 
Cc: Will Deacon 
Cc: Boqun Feng 
Cc: Jiaxun Yang 
Cc: Jun Yi 

Uros Bizjak (5):
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/arch: Wire up local_try_cmpxchg
  locking/x86: Define arch_try_cmpxchg_local
  events: Illustrate the transition to local{,64}_try_cmpxchg

 arch/alpha/include/asm/local.h  | 12 +--
 arch/loongarch/include/asm/local.h  | 13 +--
 arch/mips/include/asm/local.h   | 13 +--
 arch/powerpc/include/asm/local.h| 11 ++
 arch/x86/events/core.c  |  9 
 arch/x86/include/asm/cmpxchg.h  |  6 ++
 arch/x86/include/asm/local.h| 13 +--
 include/asm-generic/local.h |  1 +
 include/asm-generic/local64.h   | 12 ++-
 include/linux/atomic/atomic-arch-fallback.h | 24 -
 include/linux/atomic/atomic-instrumented.h  | 20 -
 kernel/events/ring_buffer.c |  5 +++--
 scripts/atomic/gen-atomic-fallback.sh   |  4 
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 14 files changed, 126 insertions(+), 19 deletions(-)

-- 
2.39.2



[PATCH v2 2/5] locking/generic: Wire up local{,64}_try_cmpxchg

2023-04-05 Thread Uros Bizjak
Implement generic support for local{,64}_try_cmpxchg.

Redirect to the atomic_ family of functions when the target
does not provide its own local.h definitions.

For 64-bit targets, implement local64_try_cmpxchg and
local64_cmpxchg using typed C wrappers that call the local_
family of functions and provide additional checking
of their input arguments.

Cc: Arnd Bergmann 
Signed-off-by: Uros Bizjak 
---
 include/asm-generic/local.h   |  1 +
 include/asm-generic/local64.h | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
 #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
 
 #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
 #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
 #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
 #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..14963a7a6253 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -42,7 +42,16 @@ typedef struct {
 #define local64_sub_return(i, l) local_sub_return((i), (&(l)->a))
 #define local64_inc_return(l)  local_inc_return(&(l)->a)
 
-#define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+static inline s64 local64_cmpxchg(local64_t *l, s64 old, s64 new)
+{
+   return local_cmpxchg(&l->a, old, new);
+}
+
+static inline bool local64_try_cmpxchg(local64_t *l, s64 *old, s64 new)
+{
+   return local_try_cmpxchg(&l->a, (long *)old, new);
+}
+
 #define local64_xchg(l, n) local_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
@@ -81,6 +90,7 @@ typedef struct {
 #define local64_inc_return(l)  atomic64_inc_return(&(l)->a)
 
 #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
-- 
2.39.2



[PATCH v2 1/5] locking/atomic: Add generic try_cmpxchg{,64}_local support

2023-04-05 Thread Uros Bizjak
Add generic support for try_cmpxchg{,64}_local and their fallbacks.

These provide the generic try_cmpxchg_local family of functions,
built from the arch_ prefixed versions, and also add explicit
instrumentation.

Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Mark Rutland 
Signed-off-by: Uros Bizjak 
---
 include/linux/atomic/atomic-arch-fallback.h | 24 -
 include/linux/atomic/atomic-instrumented.h  | 20 -
 scripts/atomic/gen-atomic-fallback.sh   |  4 
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h 
b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..36c92851cdee 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@
 
 #endif /* arch_try_cmpxchg64_relaxed */
 
+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+   if (unlikely(___r != ___o)) \
+   *___op = ___r; \
+   likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+   if (unlikely(___r != ___o)) \
+   *___op = ___r; \
+   likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1f49bd4895a4b7a5383906649027205c52ec80ab
diff --git a/include/linux/atomic/atomic-instrumented.h 
b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+   typeof(ptr) __ai_ptr = (ptr); \
+   typeof(oldp) __ai_oldp = (oldp); \
+   instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+   instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+   arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+   typeof(ptr) __ai_ptr = (ptr); \
+   typeof(oldp) __ai_oldp = (oldp); \
+   instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+   instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+   arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
 #define cmpxchg_double(ptr, ...) \
 ({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh 
b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..6e853f0dad8d 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done
 
+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+   gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
 grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
 done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh 
b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
 	done
 done
 
-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
 	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
-- 
2.39.2



Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-04 Thread Uros Bizjak
On Tue, Apr 4, 2023 at 3:19 PM Mark Rutland  wrote:
>
> On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> > On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland  wrote:
> > >
> > > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  
> > > > wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 04:14:22PM +0000, Mark Rutland wrote:
> > > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > > Cast _oldp to the type of _ptr to avoid 
> > > > > > > > > incompatible-pointer-types warning.
> > > > > > > >
> > > > > > > > Can you give an example of where we are passing an incompatible 
> > > > > > > > pointer?
> > > > > > >
> > > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > > this fix when fallback code is used. We have:
> > > > > > >
> > > > > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > > >
> > > > > > > where rb->head is defined as:
> > > > > > >
> > > > > > > typedef struct {
> > > > > > >atomic_long_t a;
> > > > > > > } local_t;
> > > > > > >
> > > > > > > while offset is defined as 'unsigned long'.
> > > > > >
> > > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > > >
> > > > > > Since local_t is defined in terms of atomic_long_t, we should 
> > > > > > define the
> > > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). 
> > > > > > We'll still
> > > > > > have a mismatch between 'long *' and 'unsigned long *', but then we 
> > > > > > can fix
> > > > > > that in the callsite:
> > > > > >
> > > > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > > >
> > > > > Sorry, that should be:
> > > > >
> > > > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > > >
> > > > The fallbacks are a bit more complicated than above, and are different
> > > > from atomic_try_cmpxchg.
> > > >
> > > > Please note in patch 2/10, the fallbacks when arch_try_cmpxchg_local
> > > > are not defined call arch_cmpxchg_local. Also in patch 2/10,
> > > > try_cmpxchg_local is introduced, where it calls
> > > > arch_try_cmpxchg_local. Targets (and generic code) simply define (e.g.
> > > > :
> > > >
> > > > #define local_cmpxchg(l, o, n) \
> > > >(cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > > +#define local_try_cmpxchg(l, po, n) \
> > > > +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > > >
> > > > which is part of the local_t API. Targets should either define all
> > > > these #defines, or none. There are no partial fallbacks as is the case
> > > > with atomic_t.
> > >
> > > Whether or not there are fallbacks is immaterial.
> > >
> > > In those cases, architectures can just as easily write C wrappers, e.g.
> > >
> > > long local_cmpxchg(local_t *l, long old, long new)
> > > {
> > > return cmpxchg_local(&l->a.counter, old, new);
> > > }
> > >
> > > long local_try_cmpxchg(local_t *l, long *old, long new)
> > > {
> > > return try_cmpxchg_local(&l->a.counter, old, new);
> > > }
> >
> > Please find attached the complete prototype patch that implements the
> > above suggestion.
> >
> > The patch includes:
> > - implementation of instrumented try_cmpxchg{,64}_local definitions
> > - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> > - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
> >
> > - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> > - x86 specific arch_try_cmpxchg_local definition
> >
> > - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> > implementation and illustrate the transition
> > - arch/x86/events/core.c change to test local64_try_cmpxchg
> > implementation and illustrate the transition
> >
> > The definition of atomic_long_t is different for 64-bit and 32-bit
> > targets (s64 vs int), so target specific C wrappers have to use
> > different casts to account for this difference.
> >
> > Uros.
>
> Thanks for this!
>
> FWIW, the patch (inline below) looks good to me.

Thanks, I will prepare a patch series for submission later today.

Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-04 Thread Uros Bizjak
On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland  wrote:
>
> On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
> > >
> > > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  
> > > > > wrote:
> > > > > >
> > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > Cast _oldp to the type of _ptr to avoid 
> > > > > > > incompatible-pointer-types warning.
> > > > > >
> > > > > > Can you give an example of where we are passing an incompatible 
> > > > > > pointer?
> > > > >
> > > > > An example is patch 10/10 from the series, which will fail without
> > > > > this fix when fallback code is used. We have:
> > > > >
> > > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > >
> > > > > where rb->head is defined as:
> > > > >
> > > > > typedef struct {
> > > > >atomic_long_t a;
> > > > > } local_t;
> > > > >
> > > > > while offset is defined as 'unsigned long'.
> > > >
> > > > Ok, but that's because we're doing the wrong thing to start with.
> > > >
> > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). 
> > > > We'll still
> > > > have a mismatch between 'long *' and 'unsigned long *', but then we can 
> > > > fix
> > > > that in the callsite:
> > > >
> > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > >
> > > Sorry, that should be:
> > >
> > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> >
> > The fallbacks are a bit more complicated than above, and are different
> > from atomic_try_cmpxchg.
> >
> > Please note in patch 2/10, the fallbacks when arch_try_cmpxchg_local
> > are not defined call arch_cmpxchg_local. Also in patch 2/10,
> > try_cmpxchg_local is introduced, where it calls
> > arch_try_cmpxchg_local. Targets (and generic code) simply define (e.g.
> > :
> >
> > #define local_cmpxchg(l, o, n) \
> >(cmpxchg_local(&((l)->a.counter), (o), (n)))
> > +#define local_try_cmpxchg(l, po, n) \
> > +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> >
> > which is part of the local_t API. Targets should either define all
> > these #defines, or none. There are no partial fallbacks as is the case
> > with atomic_t.
>
> Whether or not there are fallbacks is immaterial.
>
> In those cases, architectures can just as easily write C wrappers, e.g.
>
> long local_cmpxchg(local_t *l, long old, long new)
> {
> return cmpxchg_local(&l->a.counter, old, new);
> }
>
> long local_try_cmpxchg(local_t *l, long *old, long new)
> {
> return try_cmpxchg_local(&l->a.counter, old, new);
> }

Please find attached the complete prototype patch that implements the
above suggestion.

The patch includes:
- implementation of instrumented try_cmpxchg{,64}_local definitions
- corresponding arch_try_cmpxchg{,64}_local fallback definitions
- generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers

- x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
- x86 specific arch_try_cmpxchg_local definition

- kernel/events/ring_buffer.c change to test local_try_cmpxchg
implementation and illustrate the transition
- arch/x86/events/core.c change to test local64_try_cmpxchg
implementation and illustrate the transition

The definition of atomic_long_t is different for 64-bit and 32-bit
targets (s64 vs int), so target-specific C wrappers have to use
different casts to account for this difference.
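
To illustrate (simplified, and the _64/_any suffixes are made up; the
real wrappers are in the attached patch): on a 64-bit target the
counter inside atomic_long_t is an s64, so a fixed cast is enough,
while a cast written against the counter's own type works for either
width:

/* 64-bit only: atomic_long_t's counter is s64, and long is 64-bit too. */
static inline bool local_try_cmpxchg_64(local_t *l, long *old, long new)
{
	return try_cmpxchg_local(&l->a.counter, (s64 *)old, new);
}

/* Width-agnostic: cast through the counter's own type (s64 or int). */
static inline bool local_try_cmpxchg_any(local_t *l, long *old, long new)
{
	typeof(l->a.counter) *__old = (typeof(l->a.counter) *)old;

	return try_cmpxchg_local(&l->a.counter, __old, new);
}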

Uros.
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 * exchange a new raw count - then add that new-prev delta
 * count to the generic event atomically:
 */
-again:
 	prev_raw_count = local64_read(&hwc->prev_count);
-   rdpmcl(hwc->event_base_rdpmc, new_raw_count);
 
-   if (local64_cmpxchg(&

Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-26 Thread Uros Bizjak
On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
>
> On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> > > >
> > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > > > warning.
> > > >
> > > > Can you give an example of where we are passing an incompatible pointer?
> > >
> > > An example is patch 10/10 from the series, which will fail without
> > > this fix when fallback code is used. We have:
> > >
> > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > >
> > > where rb->head is defined as:
> > >
> > > typedef struct {
> > >atomic_long_t a;
> > > } local_t;
> > >
> > > while offset is defined as 'unsigned long'.
> >
> > Ok, but that's because we're doing the wrong thing to start with.
> >
> > Since local_t is defined in terms of atomic_long_t, we should define the
> > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll 
> > still
> > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > that in the callsite:
> >
> >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
>
> Sorry, that should be:
>
> while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))

The fallbacks are a bit more complicated than above, and are different
from atomic_try_cmpxchg.

Please note that in patch 2/10 the fallbacks, when arch_try_cmpxchg_local
is not defined, call arch_cmpxchg_local. Also in patch 2/10,
try_cmpxchg_local is introduced, where it calls
arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:

#define local_cmpxchg(l, o, n) \
   (cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))

which is part of the local_t API. Targets should either define all of
these #defines, or none of them. Unlike with atomic_t, there are no
partial fallbacks.

The core of the local_t API is in the local.h header. If the target
doesn't define its own local.h header, then asm-generic/local.h is
used, which does exactly what you propose above regarding the usage of
atomic functions.

OTOH, when the target defines its own local.h, then the above
target-dependent #define path applies. The target should define its
own arch_try_cmpxchg_local, otherwise a "generic" target-dependent
fallback that calls the target's arch_cmpxchg_local applies. In the
case of x86, patch 9/10 enables the new instruction by defining
arch_try_cmpxchg_local.

FYI, the patch sequence is carefully chosen so that x86 also exercises
fallback code between different patches in the series.

Targets are free to define local_t to whatever they like, but for some
reason they all define it to:

typedef struct {
atomic_long_t a;
} local_t;

so they have to dig the variable out of the struct like:

#define local_cmpxchg(l, o, n) \
 (cmpxchg_local(&((l)->a.counter), (o), (n)))

Regarding the mismatch of 'long *' vs 'unsigned long *': x86
target-specific code does for try_cmpxchg:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
({ \
bool success; \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \

so, it *does* cast the "old" pointer to the type of "ptr". The generic
code does *not*. This difference is dangerous, since some code involving
try_cmpxchg will compile OK for x86 but will break for other targets
that use the try_cmpxchg fallback templates (I was the unlucky one who
tripped over this in the past). Please note that this problem is not
specific to the proposed local_try_cmpxchg series, but affects the
existing try_cmpxchg API.

Also, I don't think that "fixing" callsites is the right thing to do.
The generic code should follow x86 and cast the "old" pointer to the
type of "ptr" inside the fallback.
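
A minimal stand-alone illustration of the difference (the macro and
function names are hypothetical and only meant to show the diagnostic):

/* Mimics the generic fallback: no cast on the "old" pointer. */
#define assign_old_generic(_ptr, _oldp) \
({ \
	typeof(*(_ptr)) *___op = (_oldp); \
	*___op; \
})

/* Mimics the x86 definition: cast the "old" pointer to the type of "ptr". */
#define assign_old_x86(_ptr, _oldp) \
({ \
	__typeof__(_ptr) ___op = (__typeof__(_ptr))(_oldp); \
	*___op; \
})

long example(long *counter, unsigned long *old)
{
	/* warning: initialization of 'long *' from incompatible pointer type */
	long a = assign_old_generic(counter, old);
	/* compiles cleanly thanks to the cast */
	long b = assign_old_x86(counter, old);

	return a + b;
}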

> The fundamental thing I'm trying to say is that the
> atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
> their try_cmpxchg() implementations, the type signature should be:
>
> ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, 
> ${inttype} new)

This conversion should also be performed for the cmpxchg family of
functions, if it is desired at all. The try_cmpxchg fallback is just
cmpxchg with some extra code around it.

Thanks,
Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Uros Bizjak
On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
>
> On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
>
> Can you give an example of where we are passing an incompatible pointer?

An example is patch 10/10 from the series, which will fail without
this fix when fallback code is used. We have:

-   } while (local_cmpxchg(&rb->head, offset, head) != offset);
+   } while (!local_try_cmpxchg(&rb->head, &offset, head));

where rb->head is defined as:

typedef struct {
   atomic_long_t a;
} local_t;

while offset is defined as 'unsigned long'.

The assignment in the existing try_cmpxchg template:

typeof(*(_ptr)) *___op = (_oldp)

will trigger an "initialization from an incompatible pointer type" error.

Please note that x86 avoids this issue by a cast in its
target-dependent definition:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\
({  \
   bool success;   \
   __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
   __typeof__(*(_ptr)) __old = *_old;  \
   __typeof__(*(_ptr)) __new = (_new); \

so, the warning/error will trigger only in the fallback code.

> That sounds indicative of a bug in the caller, but maybe I'm missing some
> reason this is necessary due to some indirection.
>
> > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
>
> I'm not sure that this needs a fixes tag. Does anything go wrong today, or 
> only
> later in this series?

The patch at [1] triggered a build error in posix_acl.c/__get_acl due
to the same problem. The compilation for the x86 target was OK, because
x86 defines a target-specific arch_try_cmpxchg, but the compilation
broke for targets that fall back to the generic support. Please note
that this specific problem was recently fixed in a different way [2],
but the issue with the fallback remains.

[1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
[2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/

Uros.


[PATCH 10/10] perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

2023-03-05 Thread Uros Bizjak
Use local_try_cmpxchg instead of local_cmpxchg (*ptr, old, new) == old in
__perf_output_begin.  The x86 CMPXCHG instruction returns success in the
ZF flag, so this change saves a compare after cmpxchg.

Also, local_try_cmpxchg implicitly assigns the old *ptr value to "old"
when cmpxchg fails. There is no need to re-read the value in the loop.

No functional change intended.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Mark Rutland 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Ian Rogers 
Signed-off-by: Uros Bizjak 
---
 kernel/events/ring_buffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a..e07c10f4d141 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -191,9 +191,10 @@ __perf_output_begin(struct perf_output_handle *handle,
 
perf_output_get_handle(handle);
 
+   offset = local_read(&rb->head);
do {
+   head = offset;
tail = READ_ONCE(rb->user_page->data_tail);
-   offset = head = local_read(&rb->head);
if (!rb->overwrite) {
if (unlikely(!ring_buffer_has_space(head, tail,
perf_data_size(rb),
@@ -217,7 +218,7 @@ __perf_output_begin(struct perf_output_handle *handle,
head += size;
else
head -= size;
-   } while (local_cmpxchg(&rb->head, offset, head) != offset);
+   } while (!local_try_cmpxchg(&rb->head, &offset, head));
 
if (backward) {
offset = head;
-- 
2.39.2



[PATCH 09/10] locking/x86: Enable local{,64}_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Enable local_try_cmpxchg and also local64_try_cmpxchg for x86_64.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/x86/include/asm/cmpxchg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..e8f939e8432e 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)\
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
 
+#define __try_cmpxchg_local(ptr, pold, new, size)  \
+   __raw_try_cmpxchg((ptr), (pold), (new), (size), "")
+
 #define arch_try_cmpxchg(ptr, pold, new)   \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
 
+#define arch_try_cmpxchg_local(ptr, pold, new)			\
+   __try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))
+
 /*
  * xadd() adds "inc" to "*ptr" and atomically returns the previous
  * value of "*ptr".
-- 
2.39.2



[PATCH 08/10] locking/generic: Wire up local{,64}_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement generic support for local{,64}_try_cmpxchg.

Cc: Arnd Bergmann 
Signed-off-by: Uros Bizjak 
---
 include/asm-generic/local.h   | 1 +
 include/asm-generic/local64.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/asm-generic/local.h b/include/asm-generic/local.h
index fca7f1d84818..7f97018df66f 100644
--- a/include/asm-generic/local.h
+++ b/include/asm-generic/local.h
@@ -42,6 +42,7 @@ typedef struct
 #define local_inc_return(l) atomic_long_inc_return(&(l)->a)
 
 #define local_cmpxchg(l, o, n) atomic_long_cmpxchg((&(l)->a), (o), (n))
+#define local_try_cmpxchg(l, po, n) atomic_long_try_cmpxchg((&(l)->a), (po), (n))
 #define local_xchg(l, n) atomic_long_xchg((&(l)->a), (n))
 #define local_add_unless(l, _a, u) atomic_long_add_unless((&(l)->a), (_a), (u))
 #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
diff --git a/include/asm-generic/local64.h b/include/asm-generic/local64.h
index 765be0b7d883..54b91e93ae76 100644
--- a/include/asm-generic/local64.h
+++ b/include/asm-generic/local64.h
@@ -43,6 +43,7 @@ typedef struct {
 #define local64_inc_return(l)  local_inc_return(&(l)->a)
 
 #define local64_cmpxchg(l, o, n) local_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) local_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n) local_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) local_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	local_inc_not_zero(&(l)->a)
@@ -81,6 +82,7 @@ typedef struct {
 #define local64_inc_return(l)  atomic64_inc_return(&(l)->a)
 
 #define local64_cmpxchg(l, o, n) atomic64_cmpxchg((&(l)->a), (o), (n))
+#define local64_try_cmpxchg(l, po, n) atomic64_try_cmpxchg((&(l)->a), (po), (n))
 #define local64_xchg(l, n) atomic64_xchg((&(l)->a), (n))
 #define local64_add_unless(l, _a, u) atomic64_add_unless((&(l)->a), (_a), (u))
 #define local64_inc_not_zero(l)	atomic64_inc_not_zero(&(l)->a)
-- 
2.39.2



[PATCH 06/10] locking/powerpc: Wire up local_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement target specific support for local_try_cmpxchg.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Signed-off-by: Uros Bizjak 
Signed-off-by: Uros Bizjak 
---
 arch/powerpc/include/asm/local.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index bc4bd19b7fc2..45492fb5bf22 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -90,6 +90,17 @@ static __inline__ long local_cmpxchg(local_t *l, long o, long n)
return t;
 }
 
+static __inline__ bool local_try_cmpxchg(local_t *l, long *po, long n)
+{
+   long o = *po, r;
+
+   r = local_cmpxchg(l, o, n);
+   if (unlikely(r != o))
+   *po = r;
+
+   return likely(r == o);
+}
+
 static __inline__ long local_xchg(local_t *l, long n)
 {
long t;
-- 
2.39.2



[PATCH 07/10] locking/x86: Wire up local_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement target specific support for local_try_cmpxchg.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/x86/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 349a47acaa4a..4bcc72906524 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -122,6 +122,8 @@ static inline long local_sub_return(long i, local_t *l)
 
 #define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 /* Always has a lock prefix */
 #define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
 
-- 
2.39.2



[PATCH 05/10] locking/mips: Wire up local_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement target specific support for local_try_cmpxchg.

Cc: Thomas Bogendoerfer 
Signed-off-by: Uros Bizjak 
---
 arch/mips/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index 08366b1fd273..b611361a760b 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -96,6 +96,8 @@ static __inline__ long local_sub_return(long i, local_t * l)
 
 #define local_cmpxchg(l, o, n) \
((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
-- 
2.39.2



[PATCH 04/10] locking/loongarch: Wire up local_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement target specific support for local_try_cmpxchg.

Cc: Huacai Chen 
Cc: WANG Xuerui 
Cc: Jiaxun Yang 
Cc: Jun Yi 
Signed-off-by: Uros Bizjak 
---
 arch/loongarch/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/loongarch/include/asm/local.h 
b/arch/loongarch/include/asm/local.h
index 65fbbae9fc4d..dff6bcbe4821 100644
--- a/arch/loongarch/include/asm/local.h
+++ b/arch/loongarch/include/asm/local.h
@@ -58,6 +58,8 @@ static inline long local_sub_return(long i, local_t *l)
 
 #define local_cmpxchg(l, o, n) \
((long)cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (atomic_long_xchg((&(l)->a), (n)))
 
 /**
-- 
2.39.2



[PATCH 03/10] locking/alpha: Wire up local_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Implement target specific support for local_try_cmpxchg.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Signed-off-by: Uros Bizjak 
---
 arch/alpha/include/asm/local.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/alpha/include/asm/local.h b/arch/alpha/include/asm/local.h
index fab26a1c93d5..7eef027e0dde 100644
--- a/arch/alpha/include/asm/local.h
+++ b/arch/alpha/include/asm/local.h
@@ -54,6 +54,8 @@ static __inline__ long local_sub_return(long i, local_t * l)
 
 #define local_cmpxchg(l, o, n) \
(cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
 #define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))
 
 /**
-- 
2.39.2



[PATCH 02/10] locking/atomic: Add generic try_cmpxchg{,64}_local support

2023-03-05 Thread Uros Bizjak
Add generic support for try_cmpxchg{,64}_local and their fallbacks.

Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Mark Rutland 
Signed-off-by: Uros Bizjak 
---
 include/linux/atomic/atomic-arch-fallback.h | 24 -
 include/linux/atomic/atomic-instrumented.h  | 20 -
 scripts/atomic/gen-atomic-fallback.sh   |  4 
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h 
b/include/linux/atomic/atomic-arch-fallback.h
index 19debd501ee7..a7d2e5f4e548 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -217,6 +217,28 @@
 
 #endif /* arch_try_cmpxchg64_relaxed */
 
+#ifndef arch_try_cmpxchg_local
+#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
+({ \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+   ___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
+   if (unlikely(___r != ___o)) \
+   *___op = ___r; \
+   likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_local */
+
+#ifndef arch_try_cmpxchg64_local
+#define arch_try_cmpxchg64_local(_ptr, _oldp, _new) \
+({ \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
+   ___r = arch_cmpxchg64_local((_ptr), ___o, (_new)); \
+   if (unlikely(___r != ___o)) \
+   *___op = ___r; \
+   likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg64_local */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2456,4 +2478,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 1b4d4c82ae653389cd1538d5b07170267d9b3837
+// 9bb8cca3d4cbc000e7068eb7cb4481cb3e48c45a
diff --git a/include/linux/atomic/atomic-instrumented.h 
b/include/linux/atomic/atomic-instrumented.h
index 7a139ec030b0..14a9212cc987 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -2066,6 +2066,24 @@ atomic_long_dec_if_positive(atomic_long_t *v)
arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
+#define try_cmpxchg_local(ptr, oldp, ...) \
+({ \
+   typeof(ptr) __ai_ptr = (ptr); \
+   typeof(oldp) __ai_oldp = (oldp); \
+   instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+   instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+   arch_try_cmpxchg_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
+#define try_cmpxchg64_local(ptr, oldp, ...) \
+({ \
+   typeof(ptr) __ai_ptr = (ptr); \
+   typeof(oldp) __ai_oldp = (oldp); \
+   instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+   instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+   arch_try_cmpxchg64_local(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+
 #define cmpxchg_double(ptr, ...) \
 ({ \
typeof(ptr) __ai_ptr = (ptr); \
@@ -2083,4 +2101,4 @@ atomic_long_dec_if_positive(atomic_long_t *v)
 })
 
 #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
-// 764f741eb77a7ad565dc8d99ce2837d5542e8aee
+// 456e206c7e4e681126c482e4edcc6f46921ac731
diff --git a/scripts/atomic/gen-atomic-fallback.sh 
b/scripts/atomic/gen-atomic-fallback.sh
index 39f447161108..26829a75b644 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -225,6 +225,10 @@ for cmpxchg in "cmpxchg" "cmpxchg64"; do
gen_try_cmpxchg_fallbacks "${cmpxchg}"
 done
 
+for cmpxchg in "cmpxchg_local" "cmpxchg64_local"; do
+   gen_try_cmpxchg_fallback "${cmpxchg}" ""
+done
+
 grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic" "int" ${args}
 done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh 
b/scripts/atomic/gen-atomic-instrumented.sh
index 77c06526a574..c8165e9431bf 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -173,7 +173,7 @@ for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg" "try_cmpxchg64"; do
 	done
 done
 
-for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg"; do
+for xchg in "cmpxchg_local" "cmpxchg64_local" "sync_cmpxchg" "try_cmpxchg_local" "try_cmpxchg64_local" ; do
 	gen_xchg "${xchg}" "" ""
 	printf "\n"
 done
-- 
2.39.2



[PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-05 Thread Uros Bizjak
Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Mark Rutland 
Signed-off-by: Uros Bizjak 
---
 include/linux/atomic/atomic-arch-fallback.h | 18 +-
 scripts/atomic/gen-atomic-fallback.sh   |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h 
b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..19debd501ee7 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -87,7 +87,7 @@
 #ifndef arch_try_cmpxchg
 #define arch_try_cmpxchg(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -98,7 +98,7 @@
 #ifndef arch_try_cmpxchg_acquire
 #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -109,7 +109,7 @@
 #ifndef arch_try_cmpxchg_release
 #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -120,7 +120,7 @@
 #ifndef arch_try_cmpxchg_relaxed
 #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -157,7 +157,7 @@
 #ifndef arch_try_cmpxchg64
 #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -168,7 +168,7 @@
 #ifndef arch_try_cmpxchg64_acquire
 #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -179,7 +179,7 @@
 #ifndef arch_try_cmpxchg64_release
 #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -190,7 +190,7 @@
 #ifndef arch_try_cmpxchg64_relaxed
 #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1b4d4c82ae653389cd1538d5b07170267d9b3837
diff --git a/scripts/atomic/gen-atomic-fallback.sh 
b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..39f447161108 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -171,7 +171,7 @@ cat <<EOF

[PATCH 00/10] locking: Introduce local{,64}_try_cmpxchg

2023-03-05 Thread Uros Bizjak
Add generic and target specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use local_t infrastructure.

The patch series enables x86 targets to emit a special instruction for
local_try_cmpxchg, and also for local64_try_cmpxchg on x86_64.

The last patch changes __perf_output_begin in events/ring_buffer
to use the new locking primitive and improves the generated code from

 4b3:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
 4ba:   48 8b b8 08 04 00 00    mov    0x408(%rax),%rdi
 4c1:   8b 42 1c                mov    0x1c(%rdx),%eax
 4c4:   48 8b 4a 28             mov    0x28(%rdx),%rcx
 4c8:   85 c0                   test   %eax,%eax
 ...
 4ef:   48 89 c8                mov    %rcx,%rax
 4f2:   48 0f b1 7a 28          cmpxchg %rdi,0x28(%rdx)
 4f7:   48 39 c1                cmp    %rax,%rcx
 4fa:   75 b7                   jne    4b3 <...>

to

 4b2:   48 8b 4a 28             mov    0x28(%rdx),%rcx
 4b6:   48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
 4bd:   48 8b b0 08 04 00 00    mov    0x408(%rax),%rsi
 4c4:   8b 42 1c                mov    0x1c(%rdx),%eax
 4c7:   85 c0                   test   %eax,%eax
 ...
 4d4:   48 89 c8                mov    %rcx,%rax
 4d7:   48 0f b1 72 28          cmpxchg %rsi,0x28(%rdx)
 4dc:   0f 85 d0 00 00 00       jne    5b2 <...>
 ...
 5b2:   48 89 c1                mov    %rax,%rcx
 5b5:   e9 fc fe ff ff          jmp    4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) gets moved out of the loop and the code is rearranged
according to the likely/unlikely hints in the source.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Huacai Chen 
Cc: WANG Xuerui 
Cc: Thomas Bogendoerfer 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Arnd Bergmann 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Mark Rutland 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Ian Rogers 
Cc: Will Deacon 
Cc: Boqun Feng 
Cc: Jiaxun Yang 
Cc: Jun Yi 

Uros Bizjak (10):
  locking/atomic: Add missing cast to try_cmpxchg() fallbacks
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/alpha: Wire up local_try_cmpxchg
  locking/loongarch: Wire up local_try_cmpxchg
  locking/mips: Wire up local_try_cmpxchg
  locking/powerpc: Wire up local_try_cmpxchg
  locking/x86: Wire up local_try_cmpxchg
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/x86: Enable local{,64}_try_cmpxchg
  perf/ring_buffer: use local_try_cmpxchg in __perf_output_begin

 arch/alpha/include/asm/local.h  |  2 ++
 arch/loongarch/include/asm/local.h  |  2 ++
 arch/mips/include/asm/local.h   |  2 ++
 arch/powerpc/include/asm/local.h| 11 ++
 arch/x86/include/asm/cmpxchg.h  |  6 
 arch/x86/include/asm/local.h|  2 ++
 include/asm-generic/local.h |  1 +
 include/asm-generic/local64.h   |  2 ++
 include/linux/atomic/atomic-arch-fallback.h | 40 -
 include/linux/atomic/atomic-instrumented.h  | 20 ++-
 kernel/events/ring_buffer.c |  5 +--
 scripts/atomic/gen-atomic-fallback.sh   |  6 +++-
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 13 files changed, 87 insertions(+), 14 deletions(-)

-- 
2.39.2



Re: [PATCH] compiler-gcc.h: Remove ancient workaround for gcc PR 58670

2022-08-25 Thread Uros Bizjak
On Thu, Aug 25, 2022 at 12:30 PM Naveen N. Rao
 wrote:
>
> Uros Bizjak wrote:
> > The workaround for 'asm goto' miscompilation introduces a compiler
> > barrier quirk that inhibits many useful compiler optimizations. For
> > example, __try_cmpxchg_user compiles to:
> >
> > 11375: 41 8b 4d 00             mov    0x0(%r13),%ecx
> > 11379: 41 8b 02                mov    (%r10),%eax
> > 1137c: f0 0f b1 0a             lock cmpxchg %ecx,(%rdx)
> > 11380: 0f 94 c2                sete   %dl
> > 11383: 84 d2                   test   %dl,%dl
> > 11385: 75 c4                   jne    1134b <...>
> > 11387: 41 89 02                mov    %eax,(%r10)
> >
> > where the barrier inhibits flags propagation from asm when
> > compiled with gcc-12.
> >
> > When the mentioned quirk is removed, the following code is generated:
> >
> >11553: 41 8b 4d 00 mov0x0(%r13),%ecx
> >11557: 41 8b 02mov(%r10),%eax
> >1155a: f0 0f b1 0a lock cmpxchg %ecx,(%rdx)
> >1155e: 74 c9   je 11529 <...>
> >11560: 41 89 02mov%eax,(%r10)
> >
> > The refered compiler bug:
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> >
> > was fixed for gcc-4.8.2.
> >
> > Current minimum required version of GCC is version 5.1 which has
> > the above 'asm goto' miscompilation fixed, so remove the workaround.
> >
> > Signed-off-by: Uros Bizjak 
> > Cc: Andrew Morton 
> > ---
> >  include/linux/compiler-gcc.h | 11 ---
> >  1 file changed, 11 deletions(-)
> >
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index a0c55eeaeaf1..9b157b71036f 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -66,17 +66,6 @@
> >   __builtin_unreachable();\
> >   } while (0)
> >
> > -/*
> > - * GCC 'asm goto' miscompiles certain code sequences:
> > - *
> > - *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> > - *
> > - * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
> > - *
> > - * (asm goto is automatically volatile - the naming reflects this.)
> > - */
> > -#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
> > -
> >  #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
> >  #define __HAVE_BUILTIN_BSWAP32__
> >  #define __HAVE_BUILTIN_BSWAP64__
>
> This is causing a build issue on ppc64le with a new patch replacing use
> of unreachable() with __builtin_unreachable() in __WARN_FLAGS():
> https://lore.kernel.org/linuxppc-dev/20220808114908.240813-2...@linux.ibm.com/
>
> during RTL pass: combine
> In file included from /linux/kernel/locking/rtmutex_api.c:9:
> /linux/kernel/locking/rtmutex.c: In function '__rt_mutex_slowlock.constprop':
> /linux/kernel/locking/rtmutex.c:1612:1: internal compiler error: in 
> purge_dead_edges, at cfgrtl.c:3369
>  1612 | }
>   | ^
> 0x142817c internal_error(char const*, ...)
> ???:0
> 0x5c8a1b fancy_abort(char const*, int, char const*)
> ???:0
> 0x72017f purge_all_dead_edges()
> ???:0
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
>
> So, it looks like gcc still has issues with certain uses of asm goto.

This looks like a powerpc-specific bug, and has nothing to do with
PR58670 (the asm goto miscompilation). If this is indeed a target-specific
bug, it should be worked around in
arch/powerpc/include/asm/compiler.h, but please also report it to the
GCC bugzilla.
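
Something along these lines could serve as the target-specific
workaround, assuming powerpc selects CONFIG_HAVE_ARCH_COMPILER_H so
that <asm/compiler.h> is pulled in after the generic compiler headers
(untested sketch; whether it actually papers over the ICE would have to
be verified):

/* arch/powerpc/include/asm/compiler.h (hypothetical) */
#ifndef _ASM_POWERPC_COMPILER_H
#define _ASM_POWERPC_COMPILER_H

#ifdef CONFIG_CC_IS_GCC
/* Reinstate the barrier quirk removed from compiler-gcc.h, but only for
 * powerpc, until the ICE in purge_dead_edges is resolved. */
#undef asm_volatile_goto
#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
#endif

#endif /* _ASM_POWERPC_COMPILER_H */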

Uros.