Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-04 Thread Uros Bizjak
On Tue, Apr 4, 2023 at 3:19 PM Mark Rutland  wrote:
>
> On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> > On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland  wrote:
> > >
> > > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  
> > > > wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > > Cast _oldp to the type of _ptr to avoid 
> > > > > > > > > incompatible-pointer-types warning.
> > > > > > > >
> > > > > > > > Can you give an example of where we are passing an incompatible 
> > > > > > > > pointer?
> > > > > > >
> > > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > > this fix when fallback code is used. We have:
> > > > > > >
> > > > > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > > >
> > > > > > > where rb->head is defined as:
> > > > > > >
> > > > > > > typedef struct {
> > > > > > >    atomic_long_t a;
> > > > > > > } local_t;
> > > > > > >
> > > > > > > while offset is defined as 'unsigned long'.
> > > > > >
> > > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > > >
> > > > > > Since local_t is defined in terms of atomic_long_t, we should 
> > > > > > define the
> > > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). 
> > > > > > We'll still
> > > > > > have a mismatch between 'long *' and 'unsigned long *', but then we 
> > > > > > can fix
> > > > > > that in the callsite:
> > > > > >
> > > > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > > >
> > > > > Sorry, that should be:
> > > > >
> > > > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > > >
> > > > The fallbacks are a bit more complicated than above, and are different
> > > > from atomic_try_cmpxchg.
> > > >
> > > > Please note that in patch 2/10, the fallbacks that apply when
> > > > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Patch
> > > > 2/10 also introduces try_cmpxchg_local, which calls
> > > > arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> > > >
> > > > #define local_cmpxchg(l, o, n) \
> > > >    (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > > +#define local_try_cmpxchg(l, po, n) \
> > > > +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > > >
> > > > which is part of the local_t API. Targets should either define all
> > > > these #defines, or none. There are no partial fallbacks as is the case
> > > > with atomic_t.
> > >
> > > Whether or not there are fallbacks is immaterial.
> > >
> > > In those cases, architectures can just as easily write C wrappers, e.g.
> > >
> > > long local_cmpxchg(local_t *l, long old, long new)
> > > {
> > > return cmpxchg_local(&l->a.counter, old, new);
> > > }
> > >
> > > long local_try_cmpxchg(local_t *l, long *old, long new)
> > > {
> > > return try_cmpxchg_local(&l->a.counter, old, new);
> > > }
> >
> > Please find attached the complete prototype patch that implements the
> > above suggestion.
> >
> > The patch includes:
> > - implementation of instrumented try_cmpxchg{,64}_local definitions
> > - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> > - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
> >
> > - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> > - x86 specific arch_try_cmpxchg_local definition
> >
> > - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> > implementation and illustrate the transition
> > - arch/x86/events/core.c change to test local64_try_cmpxchg
> > implementation and illustrate the transition
> >
> > The definition of atomic_long_t is different for 64-bit and 32-bit
> > targets (s64 vs int), so target specific C wrappers have to use
> > different casts to account for this difference.
> >
> > Uros.
>
> Thanks for this!
>
> FWIW, the patch (inline below) looks good to me.

Thanks, I will prepare a patch series for submission later today.

Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-04 Thread Mark Rutland
On Tue, Apr 04, 2023 at 02:24:38PM +0200, Uros Bizjak wrote:
> On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland  wrote:
> >
> > On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
> > > >
> > > > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > > Cast _oldp to the type of _ptr to avoid 
> > > > > > > > incompatible-pointer-types warning.
> > > > > > >
> > > > > > > Can you give an example of where we are passing an incompatible 
> > > > > > > pointer?
> > > > > >
> > > > > > An example is patch 10/10 from the series, which will fail without
> > > > > > this fix when fallback code is used. We have:
> > > > > >
> > > > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > > >
> > > > > > where rb->head is defined as:
> > > > > >
> > > > > > typedef struct {
> > > > > >    atomic_long_t a;
> > > > > > } local_t;
> > > > > >
> > > > > > while offset is defined as 'unsigned long'.
> > > > >
> > > > > Ok, but that's because we're doing the wrong thing to start with.
> > > > >
> > > > > Since local_t is defined in terms of atomic_long_t, we should define 
> > > > > the
> > > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). 
> > > > > We'll still
> > > > > have a mismatch between 'long *' and 'unsigned long *', but then we 
> > > > > can fix
> > > > > that in the callsite:
> > > > >
> > > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > > >
> > > > Sorry, that should be:
> > > >
> > > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> > >
> > > The fallbacks are a bit more complicated than above, and are different
> > > from atomic_try_cmpxchg.
> > >
> > > Please note that in patch 2/10, the fallbacks that apply when
> > > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Patch
> > > 2/10 also introduces try_cmpxchg_local, which calls
> > > arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> > >
> > > #define local_cmpxchg(l, o, n) \
> > >    (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > > +#define local_try_cmpxchg(l, po, n) \
> > > +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> > >
> > > which is part of the local_t API. Targets should either define all
> > > these #defines, or none. There are no partial fallbacks as is the case
> > > with atomic_t.
> >
> > Whether or not there are fallbacks is immaterial.
> >
> > In those cases, architectures can just as easily write C wrappers, e.g.
> >
> > long local_cmpxchg(local_t *l, long old, long new)
> > {
> > return cmpxchg_local(&l->a.counter, old, new);
> > }
> >
> > long local_try_cmpxchg(local_t *l, long *old, long new)
> > {
> > return try_cmpxchg_local(&l->a.counter, old, new);
> > }
> 
> Please find attached the complete prototype patch that implements the
> above suggestion.
> 
> The patch includes:
> - implementation of instrumented try_cmpxchg{,64}_local definitions
> - corresponding arch_try_cmpxchg{,64}_local fallback definitions
> - generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers
> 
> - x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
> - x86 specific arch_try_cmpxchg_local definition
> 
> - kernel/events/ring_buffer.c change to test local_try_cmpxchg
> implementation and illustrate the transition
> - arch/x86/events/core.c change to test local64_try_cmpxchg
> implementation and illustrate the transition
> 
> The definition of atomic_long_t is different for 64-bit and 32-bit
> targets (s64 vs int), so target specific C wrappers have to use
> different casts to account for this difference.
> 
> Uros.

Thanks for this!

FWIW, the patch (inline below) looks good to me.

Mark.

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index d096b04bf80e..d9310e9363f1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
>* exchange a new raw count - then add that new-prev delta
>* count to the generic event atomically:
>*/
> -again:
>   prev_raw_count = local64_read(&hwc->prev_count);
> - rdpmcl(hwc->event_base_rdpmc, new_raw_count);
>  
> - if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> - new_raw_count) != prev_raw_count)
> - goto again;
> + do {
> + rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> + } while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
> +   new_raw_count));
>  
>   /*
>* Now we have the new raw value and have updated the prev
> diff --git 

Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-04 Thread Uros Bizjak
On Mon, Apr 3, 2023 at 12:19 PM Mark Rutland  wrote:
>
> On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
> > >
> > > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  
> > > > > wrote:
> > > > > >
> > > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > > Cast _oldp to the type of _ptr to avoid 
> > > > > > > incompatible-pointer-types warning.
> > > > > >
> > > > > > Can you give an example of where we are passing an incompatible 
> > > > > > pointer?
> > > > >
> > > > > An example is patch 10/10 from the series, which will fail without
> > > > > this fix when fallback code is used. We have:
> > > > >
> > > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > > >
> > > > > where rb->head is defined as:
> > > > >
> > > > > typedef struct {
> > > > >    atomic_long_t a;
> > > > > } local_t;
> > > > >
> > > > > while offset is defined as 'unsigned long'.
> > > >
> > > > Ok, but that's because we're doing the wrong thing to start with.
> > > >
> > > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). 
> > > > We'll still
> > > > have a mismatch between 'long *' and 'unsigned long *', but then we can 
> > > > fix
> > > > that in the callsite:
> > > >
> > > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> > >
> > > Sorry, that should be:
> > >
> > > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> >
> > The fallbacks are a bit more complicated than above, and are different
> > from atomic_try_cmpxchg.
> >
> > Please note that in patch 2/10, the fallbacks that apply when
> > arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Patch
> > 2/10 also introduces try_cmpxchg_local, which calls
> > arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> >
> > #define local_cmpxchg(l, o, n) \
> >    (cmpxchg_local(&((l)->a.counter), (o), (n)))
> > +#define local_try_cmpxchg(l, po, n) \
> > +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> >
> > which is part of the local_t API. Targets should either define all
> > these #defines, or none. There are no partial fallbacks as is the case
> > with atomic_t.
>
> Whether or not there are fallbacks is immaterial.
>
> In those cases, architectures can just as easily write C wrappers, e.g.
>
> long local_cmpxchg(local_t *l, long old, long new)
> {
> return cmpxchg_local(&l->a.counter, old, new);
> }
>
> long local_try_cmpxchg(local_t *l, long *old, long new)
> {
> return try_cmpxchg_local(&l->a.counter, old, new);
> }

Please find attached the complete prototype patch that implements the
above suggestion.

The patch includes:
- implementation of instrumented try_cmpxchg{,64}_local definitions
- corresponding arch_try_cmpxchg{,64}_local fallback definitions
- generic local{,64}_try_cmpxchg (and local{,64}_cmpxchg) C wrappers

- x86 specific local_try_cmpxchg (and local_cmpxchg) C wrappers
- x86 specific arch_try_cmpxchg_local definition

- kernel/events/ring_buffer.c change to test local_try_cmpxchg
implementation and illustrate the transition
- arch/x86/events/core.c change to test local64_try_cmpxchg
implementation and illustrate the transition

The definition of atomic_long_t is different for 64-bit and 32-bit
targets (s64 vs int), so target specific C wrappers have to use
different casts to account for this difference.
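
To illustrate, a 64-bit x86 wrapper could look like this (a sketch of
the approach only; a.counter is s64 there, so the long * argument
needs a cast before it is handed to try_cmpxchg_local):

static __always_inline bool
local_try_cmpxchg(local_t *l, long *old, long new)
{
	return try_cmpxchg_local(&l->a.counter,
				 (typeof(l->a.counter) *) old, new);
}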

Uros.
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d096b04bf80e..d9310e9363f1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -129,13 +129,12 @@ u64 x86_perf_event_update(struct perf_event *event)
 * exchange a new raw count - then add that new-prev delta
 * count to the generic event atomically:
 */
-again:
prev_raw_count = local64_read(&hwc->prev_count);
-   rdpmcl(hwc->event_base_rdpmc, new_raw_count);
 
-   if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-   new_raw_count) != prev_raw_count)
-   goto again;
+   do {
+   rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+   } while (!local64_try_cmpxchg(&hwc->prev_count, &prev_raw_count,
+ new_raw_count));
 
/*
 * Now we have the new raw value and have updated the prev
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 94fbe6ae7431..540573f515b7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,9 +221,15 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)\
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-04-03 Thread Mark Rutland
On Sun, Mar 26, 2023 at 09:28:38PM +0200, Uros Bizjak wrote:
> On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
> >
> > On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  
> > > > wrote:
> > > > >
> > > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > > > > warning.
> > > > >
> > > > > Can you give an example of where we are passing an incompatible 
> > > > > pointer?
> > > >
> > > > An example is patch 10/10 from the series, which will fail without
> > > > this fix when fallback code is used. We have:
> > > >
> > > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > > >
> > > > where rb->head is defined as:
> > > >
> > > > typedef struct {
> > > >    atomic_long_t a;
> > > > } local_t;
> > > >
> > > > while offset is defined as 'unsigned long'.
> > >
> > > Ok, but that's because we're doing the wrong thing to start with.
> > >
> > > Since local_t is defined in terms of atomic_long_t, we should define the
> > > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll 
> > > still
> > > have a mismatch between 'long *' and 'unsigned long *', but then we can 
> > > fix
> > > that in the callsite:
> > >
> > >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
> >
> > Sorry, that should be:
> >
> > while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))
> 
> The fallbacks are a bit more complicated than above, and are different
> from atomic_try_cmpxchg.
> 
> Please note that in patch 2/10, the fallbacks that apply when
> arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Patch
> 2/10 also introduces try_cmpxchg_local, which calls
> arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:
> 
> #define local_cmpxchg(l, o, n) \
>    (cmpxchg_local(&((l)->a.counter), (o), (n)))
> +#define local_try_cmpxchg(l, po, n) \
> +   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))
> 
> which is part of the local_t API. Targets should either define all
> these #defines, or none. There are no partial fallbacks as is the case
> with atomic_t.

Whether or not there are fallbacks is immaterial.

In those cases, architectures can just as easily write C wrappers, e.g.

long local_cmpxchg(local_t *l, long old, long new)
{
return cmpxchg_local(&l->a.counter, old, new);
}

long local_try_cmpxchg(local_t *l, long *old, long new)
{
return try_cmpxchg_local(&l->a.counter, old, new);
}

> The core of the local_t API is in the local.h header. If the target
> doesn't define its own local.h header, then asm-generic/local.h is
> used that does exactly what you propose above regarding the usage of
> atomic functions.
> 
> OTOH, when the target defines its own local.h, then the above
> target-dependent #define path applies. The target should define its
> own arch_try_cmpxchg_local, otherwise a "generic" target-dependent
> fallback that calls target arch_cmpxchg_local applies. In the case of
> x86, patch 9/10 enables the new instruction by defining
> arch_try_cmpxchg_local.
> 
> FYI, the patch sequence is carefully chosen so that x86 also exercises
> fallback code between different patches in the series.
> 
> Targets are free to define local_t to whatever they like, but for some
> reason they all define it to:
> 
> typedef struct {
> atomic_long_t a;
> } local_t;

Yes, which is why I used atomic_long() above.

> so they have to dig the variable out of the struct like:
> 
> #define local_cmpxchg(l, o, n) \
>  (cmpxchg_local(&((l)->a.counter), (o), (n)))
> 
> Regarding the mismatch of 'long *' vs 'unsigned long *': x86
> target-specific code does for try_cmpxchg:
> 
> #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
> ({ \
> bool success; \
> __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
> __typeof__(*(_ptr)) __old = *_old; \
> __typeof__(*(_ptr)) __new = (_new); \
> 
> so, it *does* cast the "old" pointer to the type of "ptr". The generic
> code does *not*. This difference is dangerous, since the compilation
> of some code involving try_cmpxchg will compile OK for x86 but will
> break for other targets that use try_cmpxchg fallback templates (I was
> the unlucky one that tripped on this in the past). Please note that
> this problem is not specific to the proposed local_try_cmpxchg series,
> but affects the existing try_cmpxchg API.

I understand the problem of arch code differing from generic code, and that we
want to have *a* consistent behaviour for the API.

What I'm saying is that the behaviour we should aim for is where the 'old'
pointer has a specific type (long), and we always require that, as we do for
the various atomic_*() APIs of which local_*() is a cousin.

> Also, I don't think that "fixing" callsites is the right thing to do.

Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-26 Thread Uros Bizjak
On Fri, Mar 24, 2023 at 5:33 PM Mark Rutland  wrote:
>
> On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> > On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> > > >
> > > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > > > warning.
> > > >
> > > > Can you give an example of where we are passing an incompatible pointer?
> > >
> > > An example is patch 10/10 from the series, which will fail without
> > > this fix when fallback code is used. We have:
> > >
> > > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > >
> > > where rb->head is defined as:
> > >
> > > typedef struct {
> > >    atomic_long_t a;
> > > } local_t;
> > >
> > > while offset is defined as 'unsigned long'.
> >
> > Ok, but that's because we're doing the wrong thing to start with.
> >
> > Since local_t is defined in terms of atomic_long_t, we should define the
> > generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll 
> > still
> > have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> > that in the callsite:
> >
> >   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))
>
> Sorry, that should be:
>
> while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))

The fallbacks are a bit more complicated than above, and are different
from atomic_try_cmpxchg.

Please note that in patch 2/10, the fallbacks that apply when
arch_try_cmpxchg_local is not defined call arch_cmpxchg_local. Patch
2/10 also introduces try_cmpxchg_local, which calls
arch_try_cmpxchg_local. Targets (and generic code) simply define, e.g.:

#define local_cmpxchg(l, o, n) \
   (cmpxchg_local(&((l)->a.counter), (o), (n)))
+#define local_try_cmpxchg(l, po, n) \
+   (try_cmpxchg_local(&((l)->a.counter), (po), (n)))

which is part of the local_t API. Targets should either define all
these #defines, or none. There are no partial fallbacks as is the case
with atomic_t.
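
For reference, the patch 2/10 fallback has the same shape as the
existing arch_try_cmpxchg fallback, just built on arch_cmpxchg_local;
roughly (a sketch, assuming the names used in the series, and with the
cast from patch 1/10 applied):

#ifndef arch_try_cmpxchg_local
#define arch_try_cmpxchg_local(_ptr, _oldp, _new) \
({ \
	typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
	___r = arch_cmpxchg_local((_ptr), ___o, (_new)); \
	if (unlikely(___r != ___o)) \
		*___op = ___r; \
	likely(___r == ___o); \
})
#endif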

The core of the local_t API is in the local.h header. If the target
doesn't define its own local.h header, then asm-generic/local.h is
used that does exactly what you propose above regarding the usage of
atomic functions.
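
That is, the asm-generic path boils down to something like (a sketch
in the style of the existing asm-generic/local.h definitions):

#define local_cmpxchg(l, o, n) \
	atomic_long_cmpxchg((&(l)->a), (o), (n))
#define local_try_cmpxchg(l, po, n) \
	atomic_long_try_cmpxchg((&(l)->a), (po), (n))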

OTOH, when the target defines its own local.h, then the above
target-dependent #define path applies. The target should define its
own arch_try_cmpxchg_local, otherwise a "generic" target-dependent
fallback that calls target arch_cmpxchg_local applies. In the case of
x86, patch 9/10 enables the new instruction by defining
arch_try_cmpxchg_local.
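
For x86 that definition plausibly mirrors the existing __try_cmpxchg
pattern, using the unlocked variant (a hedged sketch only; the helper
names are assumed from arch/x86/include/asm/cmpxchg.h):

#define __try_cmpxchg_local(ptr, pold, new, size) \
	__raw_try_cmpxchg((ptr), (pold), (new), (size), "")

#define arch_try_cmpxchg_local(ptr, pold, new) \
	__try_cmpxchg_local((ptr), (pold), (new), sizeof(*(ptr)))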

FYI, the patch sequence is carefully chosen so that x86 also exercises
fallback code between different patches in the series.

Targets are free to define local_t to whatever they like, but for some
reason they all define it to:

typedef struct {
atomic_long_t a;
} local_t;

so they have to dig the variable out of the struct like:

#define local_cmpxchg(l, o, n) \
 (cmpxchg_local(&((l)->a.counter), (o), (n)))

Regarding the mismatch of 'long *' vs 'unsigned long *': x86
target-specific code does for try_cmpxchg:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock) \
({ \
bool success; \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \

so, it *does* cast the "old" pointer to the type of "ptr". The generic
code does *not*. This difference is dangerous, since the compilation
of some code involving try_cmpxchg will compile OK for x86 but will
break for other targets that use try_cmpxchg fallback templates (I was
the unlucky one that tripped on this in the past). Please note that
this problem is not specific to the proposed local_try_cmpxchg series,
but affects the existing try_cmpxchg API.

Also, I don't think that "fixing" callsites is the right thing to do.
The generic code should follow x86 and cast the "old" pointer to the
type of "ptr" inside the fallback.

> The fundamental thing I'm trying to say is that the
> atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
> their try_cmpxchg() implementations, the type signature should be:
>
> ${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)

This conversion should also be performed for the cmpxchg family of
functions, if it is desired at all. The try_cmpxchg fallback is just
cmpxchg with some extra code around it.
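
In other words, the fallback makes try_cmpxchg(ptr, oldp, new) behave
like the following (an illustrative expansion of the generic template
for atomic_t, not literal kernel code):

bool try_cmpxchg_expanded(atomic_t *ptr, int *oldp, int new)
{
	int old = *oldp;
	int ret = atomic_cmpxchg(ptr, old, new);

	if (ret != old)		/* failed: report the value we observed */
		*oldp = ret;
	return ret == old;
}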

Thanks,
Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Fri, Mar 24, 2023 at 04:14:22PM +, Mark Rutland wrote:
> On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> > On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> > >
> > > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > > warning.
> > >
> > > Can you give an example of where we are passing an incompatible pointer?
> > 
> > An example is patch 10/10 from the series, which will fail without
> > this fix when fallback code is used. We have:
> > 
> > -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> > +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> > 
> > where rb->head is defined as:
> > 
> > typedef struct {
> >    atomic_long_t a;
> > } local_t;
> > 
> > while offset is defined as 'unsigned long'.
> 
> Ok, but that's because we're doing the wrong thing to start with.
> 
> Since local_t is defined in terms of atomic_long_t, we should define the
> generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
> have a mismatch between 'long *' and 'unsigned long *', but then we can fix
> that in the callsite:
> 
>   while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))

Sorry, that should be:

while (!local_try_cmpxchg(&rb->head, (long *)&offset, head))

The fundamental thing I'm trying to say is that the
atomic/atomic64/atomic_long/local/local64 APIs should be type-safe, and for
their try_cmpxchg() implementations, the type signature should be:

${atomictype}_try_cmpxchg(${atomictype} *ptr, ${inttype} *old, ${inttype} new)
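
Concretely, that template instantiates to signatures like (examples of
the intended shape):

bool atomic_try_cmpxchg(atomic_t *ptr, int *old, int new);
bool atomic64_try_cmpxchg(atomic64_t *ptr, s64 *old, s64 new);
bool atomic_long_try_cmpxchg(atomic_long_t *ptr, long *old, long new);
bool local_try_cmpxchg(local_t *l, long *old, long new);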

Thanks,
Mark.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Fri, Mar 24, 2023 at 04:43:32PM +0100, Uros Bizjak wrote:
> On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
> >
> > On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types 
> > > warning.
> >
> > Can you give an example of where we are passing an incompatible pointer?
> 
> An example is patch 10/10 from the series, which will fail without
> this fix when fallback code is used. We have:
> 
> -   } while (local_cmpxchg(&rb->head, offset, head) != offset);
> +   } while (!local_try_cmpxchg(&rb->head, &offset, head));
> 
> where rb->head is defined as:
> 
> typedef struct {
>    atomic_long_t a;
> } local_t;
> 
> while offset is defined as 'unsigned long'.

Ok, but that's because we're doing the wrong thing to start with.

Since local_t is defined in terms of atomic_long_t, we should define the
generic local_try_cmpxchg() in terms of atomic_long_try_cmpxchg(). We'll still
have a mismatch between 'long *' and 'unsigned long *', but then we can fix
that in the callsite:

while (!local_try_cmpxchg(&rb->head, &(long *)offset, head))

... which then won't silently mask issues elsewhere, and will be consistent
with all the other atomic APIs.

Thanks,
Mark.

> 
> The assignment in the existing try_cmpxchg template:
> 
> typeof(*(_ptr)) *___op = (_oldp)
> 
> will trigger an "initialization from an incompatible pointer type" error.
> 
> Please note that x86 avoids this issue by a cast in its
> target-dependent definition:
> 
> #define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\
> ({  \
>    bool success;   \
>    __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
>    __typeof__(*(_ptr)) __old = *_old;  \
>    __typeof__(*(_ptr)) __new = (_new); \
> 
> so, the warning/error will trigger only in the fallback code.
> 
> > That sounds indicative of a bug in the caller, but maybe I'm missing some
> > reason this is necessary due to some indirection.
> >
> > > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
> >
> > I'm not sure that this needs a fixes tag. Does anything go wrong today, or 
> > only
> > later in this series?
> 
> The patch at [1] triggered a build error in posix_acl.c/__get.acl due
> to the same problem. The compilation for x86 target was OK, because
> x86 defines target-specific arch_try_cmpxchg, but the compilation
> broke for targets that revert to generic support. Please note that
> this specific problem was recently fixed in a different way [2], but
> the issue with the fallback remains.
> 
> [1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
> [2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/
> 
> Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Uros Bizjak
On Fri, Mar 24, 2023 at 3:13 PM Mark Rutland  wrote:
>
> On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> > Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.
>
> Can you give an example of where we are passing an incompatible pointer?

An example is patch 10/10 from the series, which will fail without
this fix when fallback code is used. We have:

-   } while (local_cmpxchg(&rb->head, offset, head) != offset);
+   } while (!local_try_cmpxchg(&rb->head, &offset, head));

where rb->head is defined as:

typedef struct {
   atomic_long_t a;
} local_t;

while offset is defined as 'unsigned long'.

The assignment in the existing try_cmpxchg template:

typeof(*(_ptr)) *___op = (_oldp)

will trigger an "initialization from an incompatible pointer type" error.

Please note that x86 avoids this issue by a cast in its
target-dependent definition:

#define __raw_try_cmpxchg(_ptr, _pold, _new, size, lock)\
({  \
   bool success;   \
   __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
   __typeof__(*(_ptr)) __old = *_old;  \
   __typeof__(*(_ptr)) __new = (_new); \

so, the warning/error will trigger only in the fallback code.
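
The mismatch is easy to reproduce outside the kernel (a minimal
userspace sketch of the same initialization the generic template
performs; 'counter' stands in for the atomic payload):

#include <stdio.h>

int main(void)
{
	long counter = 0;
	unsigned long offset = 0;

	/* mirrors: typeof(*(_ptr)) *___op = (_oldp); */
	long *___op = &offset;	/* -Wincompatible-pointer-types fires here */

	printf("%ld %lu %ld\n", counter, offset, *___op);
	return 0;
}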

> That sounds indicative of a bug in the caller, but maybe I'm missing some
> reason this is necessary due to some indirection.
>
> > Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
>
> I'm not sure that this needs a fixes tag. Does anything go wrong today, or 
> only
> later in this series?

The patch at [1] triggered a build error in posix_acl.c/__get.acl due
to the same problem. The compilation for x86 target was OK, because
x86 defines target-specific arch_try_cmpxchg, but the compilation
broke for targets that revert to generic support. Please note that
this specific problem was recently fixed in a different way [2], but
the issue with the fallback remains.

[1] https://lore.kernel.org/lkml/20220714173819.13312-1-ubiz...@gmail.com/
[2] https://lore.kernel.org/lkml/20221201160103.76012-1-ubiz...@gmail.com/

Uros.


Re: [PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-24 Thread Mark Rutland
On Sun, Mar 05, 2023 at 09:56:19PM +0100, Uros Bizjak wrote:
> Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Can you give an example of where we are passing an incompatible pointer?

That sounds indicative of a bug in the caller, but maybe I'm missing some
reason this is necessary due to some indirection.

> Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")

I'm not sure that this needs a fixes tag. Does anything go wrong today, or only
later in this series?

Thanks,
Mark.

> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Boqun Feng 
> Cc: Mark Rutland 
> Signed-off-by: Uros Bizjak 
> ---
>  include/linux/atomic/atomic-arch-fallback.h | 18 +-
>  scripts/atomic/gen-atomic-fallback.sh   |  2 +-
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/atomic/atomic-arch-fallback.h 
> b/include/linux/atomic/atomic-arch-fallback.h
> index 77bc5522e61c..19debd501ee7 100644
> --- a/include/linux/atomic/atomic-arch-fallback.h
> +++ b/include/linux/atomic/atomic-arch-fallback.h
> @@ -87,7 +87,7 @@
>  #ifndef arch_try_cmpxchg
>  #define arch_try_cmpxchg(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -98,7 +98,7 @@
>  #ifndef arch_try_cmpxchg_acquire
>  #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -109,7 +109,7 @@
>  #ifndef arch_try_cmpxchg_release
>  #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -120,7 +120,7 @@
>  #ifndef arch_try_cmpxchg_relaxed
>  #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -157,7 +157,7 @@
>  #ifndef arch_try_cmpxchg64
>  #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -168,7 +168,7 @@
>  #ifndef arch_try_cmpxchg64_acquire
>  #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -179,7 +179,7 @@
>  #ifndef arch_try_cmpxchg64_release
>  #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -190,7 +190,7 @@
>  #ifndef arch_try_cmpxchg64_relaxed
>  #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
>  ({ \
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
>   ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
>   if (unlikely(___r != ___o)) \
>   *___op = ___r; \
> @@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
>  #endif
>  
>  #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// b5e87bdd5ede61470c29f7a7e4de781af3770f09
> +// 1b4d4c82ae653389cd1538d5b07170267d9b3837
> diff --git a/scripts/atomic/gen-atomic-fallback.sh 
> b/scripts/atomic/gen-atomic-fallback.sh
> index 3a07695e3c89..39f447161108 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -171,7 +171,7 @@ cat <<EOF
>  #ifndef arch_try_${cmpxchg}${order}
>  #define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
>  ({ \\
> - typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
> + typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\
>   ___r = arch_${cmpxchg}${order}((_ptr), ___o, (_new)); \\

[PATCH 01/10] locking/atomic: Add missing cast to try_cmpxchg() fallbacks

2023-03-05 Thread Uros Bizjak
Cast _oldp to the type of _ptr to avoid incompatible-pointer-types warning.

Fixes: 29f006fdefe6 ("asm-generic/atomic: Add try_cmpxchg() fallbacks")
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Mark Rutland 
Signed-off-by: Uros Bizjak 
---
 include/linux/atomic/atomic-arch-fallback.h | 18 +-
 scripts/atomic/gen-atomic-fallback.sh   |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic/atomic-arch-fallback.h 
b/include/linux/atomic/atomic-arch-fallback.h
index 77bc5522e61c..19debd501ee7 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -87,7 +87,7 @@
 #ifndef arch_try_cmpxchg
 #define arch_try_cmpxchg(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -98,7 +98,7 @@
 #ifndef arch_try_cmpxchg_acquire
 #define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -109,7 +109,7 @@
 #ifndef arch_try_cmpxchg_release
 #define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -120,7 +120,7 @@
 #ifndef arch_try_cmpxchg_relaxed
 #define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -157,7 +157,7 @@
 #ifndef arch_try_cmpxchg64
 #define arch_try_cmpxchg64(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -168,7 +168,7 @@
 #ifndef arch_try_cmpxchg64_acquire
 #define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -179,7 +179,7 @@
 #ifndef arch_try_cmpxchg64_release
 #define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -190,7 +190,7 @@
 #ifndef arch_try_cmpxchg64_relaxed
 #define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \
 ({ \
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \
___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \
if (unlikely(___r != ___o)) \
*___op = ___r; \
@@ -2456,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// b5e87bdd5ede61470c29f7a7e4de781af3770f09
+// 1b4d4c82ae653389cd1538d5b07170267d9b3837
diff --git a/scripts/atomic/gen-atomic-fallback.sh 
b/scripts/atomic/gen-atomic-fallback.sh
index 3a07695e3c89..39f447161108 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -171,7 +171,7 @@ cat <<EOF
 #ifndef arch_try_${cmpxchg}${order}
 #define arch_try_${cmpxchg}${order}(_ptr, _oldp, _new) \\
 ({ \\
-   typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
+   typeof(*(_ptr)) *___op = (typeof(_ptr))(_oldp), ___o = *___op, ___r; \\
    ___r = arch_${cmpxchg}${order}((_ptr), ___o, (_new)); \\