On Tue, 2024-01-23 at 11:28 +0100, Jan Beulich wrote:
> On 23.01.2024 11:15, Oleksii wrote:
> > On Mon, 2024-01-22 at 17:27 +0100, Jan Beulich wrote:
> > > On 22.12.2023 16:12, Oleksii Kurochko wrote:
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory" ); \
> > > > + break; \
> > > > + case 8: \
> > > > + asm volatile( \
> > > > + " amoswap.d %0, %2, %1\n" \
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory" ); \
> > > > + break; \
> > > > + default: \
> > > > + ASSERT_UNREACHABLE(); \
> > >
> > > If at all possible this wants to trigger a build failure, not a
> > > runtime one.
> > I'll update that with BUILD_BUG_ON(size > 8);
>
> What about size accidentally being e.g. 5? I'm also not sure you'll be
> able to use BUILD_BUG_ON() here: That'll depend on what use sites there
> are. And if all present ones are okay in this regard, you'd still set
> out a trap for someone else to fall into later. We have a common
> approach for this, which currently is under re-work. See
> https://lists.xen.org/archives/html/xen-devel/2024-01/msg01115.html.
Thanks. I'll use that.
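One common pattern for this (just a sketch, and not necessarily what the
referenced series ends up providing; __bad_xchg is a purely hypothetical
name here) is to make the default case reference a function that is
declared but deliberately never defined, so an unexpected size fails at
link time rather than at run time:

    /* Declared somewhere, intentionally never defined anywhere: */
    extern unsigned long __bad_xchg(volatile void *ptr, int size);

    /* ... then in the macro's switch ( size ): */
    default: \
        ret__ = __bad_xchg(ptr__, size); /* unexpected size -> link failure */ \
        break; \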
>
> > > > + } \
> > > > + ret__; \
> > > > +})
> > > > +
> > > > +#define xchg_relaxed(ptr, x) \
> > > > +({ \
> > > > + __typeof__(*(ptr)) x_ = (x); \
> > > > + (__typeof__(*(ptr))) __xchg_relaxed((ptr), x_, sizeof(*(ptr))); \
> > >
> > > Nit: Stray blank after cast. For readability I'd also suggest to
> > > drop parentheses in cases like the first argument passed to
> > > __xchg_relaxed() here.
> > Thanks. I'll take that into account.
> >
> > >
> > > > +})
> > >
> > > For both: What does "relaxed" describe? I'm asking because it's not
> > > really clear whether the memory clobbers are actually needed.
> > >
> > > > +#define __xchg_acquire(ptr, new, size) \
> > > > +({ \
> > > > + __typeof__(ptr) ptr__ = (ptr); \
> > > > + __typeof__(new) new__ = (new); \
> > > > + __typeof__(*(ptr)) ret__; \
> > > > + switch (size) \
> > > > + { \
> > > > + case 4: \
> > > > + asm volatile( \
> > > > + " amoswap.w %0, %2, %1\n" \
> > > > + RISCV_ACQUIRE_BARRIER \
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory" ); \
> > > > + break; \
> > > > + case 8: \
> > > > + asm volatile( \
> > > > + " amoswap.d %0, %2, %1\n" \
> > > > + RISCV_ACQUIRE_BARRIER \
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory" ); \
> > > > + break; \
> > > > + default: \
> > > > + ASSERT_UNREACHABLE(); \
> > > > + } \
> > > > + ret__; \
> > > > +})
> > >
> > > If I'm not mistaken this differs from __xchg_relaxed() only in the
> > > use of RISCV_ACQUIRE_BARRIER, and ...
> > >
> > > > +#define xchg_acquire(ptr, x) \
> > > > +({ \
> > > > + __typeof__(*(ptr)) x_ = (x); \
> > > > + (__typeof__(*(ptr))) __xchg_acquire((ptr), x_, sizeof(*(ptr))); \
> > > > +})
> > > > +
> > > > +#define __xchg_release(ptr, new, size) \
> > > > +({ \
> > > > + __typeof__(ptr) ptr__ = (ptr); \
> > > > + __typeof__(new) new__ = (new); \
> > > > + __typeof__(*(ptr)) ret__; \
> > > > + switch (size) \
> > > > + { \
> > > > + case 4: \
> > > > + asm volatile ( \
> > > > + RISCV_RELEASE_BARRIER \
> > > > + " amoswap.w %0, %2, %1\n" \
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory"); \
> > > > + break; \
> > > > + case 8: \
> > > > + asm volatile ( \
> > > > + RISCV_RELEASE_BARRIER \
> > > > + " amoswap.d %0, %2, %1\n" \
> > > > + : "=r" (ret__), "+A" (*ptr__) \
> > > > + : "r" (new__) \
> > > > + : "memory"); \
> > > > + break; \
> > > > + default: \
> > > > + ASSERT_UNREACHABLE(); \
> > > > + } \
> > > > + ret__; \
> > > > +})
> > >
> > > this only in the use of RISCV_RELEASE_BARRIER. If so they likely want
> > > folding, to limit redundancy and make eventual updating easier. (Same
> > > for the cmpxchg helper further down, as it seems.)
> > Yes, you are right. The only difference is RISCV_RELEASE_BARRIER, and
> > its absence is the reason why we have the "relaxed" versions.
> >
> > I am not sure that I understand what you mean by folding here. Do you
> > mean that there is no point in having two separate macros, and that
> > only one, with RISCV_RELEASE_BARRIER, is needed?
>
> No. You should parameterize the folded common macro for the derived
> macros to simply pass in the barriers needed, with empty macro
> arguments indicating "this barrier not needed".
Now it makes sense to me. Thank you for the explanation.
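To make sure I understood, something along these lines (a sketch only;
empty string literals stand in for "barrier not needed" here, though
truly empty macro arguments would work as well):

#define __xchg_generic(ptr, new, size, pre_barrier, post_barrier) \
({ \
    __typeof__(ptr) ptr__ = (ptr); \
    __typeof__(new) new__ = (new); \
    __typeof__(*(ptr)) ret__; \
    switch ( size ) \
    { \
    case 4: \
        asm volatile ( pre_barrier \
                       " amoswap.w %0, %2, %1\n" \
                       post_barrier \
                       : "=r" (ret__), "+A" (*ptr__) \
                       : "r" (new__) \
                       : "memory" ); \
        break; \
    case 8: \
        asm volatile ( pre_barrier \
                       " amoswap.d %0, %2, %1\n" \
                       post_barrier \
                       : "=r" (ret__), "+A" (*ptr__) \
                       : "r" (new__) \
                       : "memory" ); \
        break; \
    default: \
        ASSERT_UNREACHABLE(); \
    } \
    ret__; \
})

#define __xchg_relaxed(ptr, new, size) \
    __xchg_generic(ptr, new, size, "", "")
#define __xchg_acquire(ptr, new, size) \
    __xchg_generic(ptr, new, size, "", RISCV_ACQUIRE_BARRIER)
#define __xchg_release(ptr, new, size) \
    __xchg_generic(ptr, new, size, RISCV_RELEASE_BARRIER, "")

(The default: case would of course become the build-time failure
discussed above.)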
>
> > > > +#define xchg_release(ptr, x) \
> > > > +({ \
> > > > + __typeof__(*(ptr)) x_ = (x); \
> > > > + (__typeof__(*(ptr))) __xchg_release((ptr), x_, sizeof(*(ptr))); \
> > > > +})
> > > > +
> > > > +static always_inline uint32_t __xchg_case_4(volatile uint32_t *ptr,
> > > > + uint32_t new)
> > > > +{
> > > > + __typeof__(*(ptr)) ret;
> > > > +
> > > > + asm volatile (
> > > > + " amoswap.w.aqrl %0, %2, %1\n"
> > > > + : "=r" (ret), "+A" (*ptr)
> > > > + : "r" (new)
> > > > + : "memory" );
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static always_inline uint64_t __xchg_case_8(volatile uint64_t *ptr,
> > > > + uint64_t new)
> > > > +{
> > > > + __typeof__(*(ptr)) ret;
> > > > +
> > > > + asm volatile( \
> > > > + " amoswap.d.aqrl %0, %2, %1\n" \
> > > > + : "=r" (ret), "+A" (*ptr) \
> > > > + : "r" (new) \
> > > > + : "memory" ); \
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static always_inline unsigned short __cmpxchg_case_2(volatile uint32_t *ptr,
> > > > + uint32_t old,
> > > > + uint32_t new);
> > >
> > > Don't you consistently mean uint16_t here (incl the return type) and
> > > ...
> > >
> > > > +static always_inline unsigned short __cmpxchg_case_1(volatile uint32_t *ptr,
> > > > + uint32_t old,
> > > > + uint32_t new);
> > >
> > > ... uint8_t here?
> > The idea was that we emulate __cmpxchg_case_1 and __cmpxchg_case_2
> > using the 4-byte cmpxchg, and __cmpxchg_case_4 expects uint32_t.
>
> I consider this wrong. The functions would better be type-correct.
>
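If I understand "type-correct" correctly, that would mean prototypes
along these lines (sketch only):

static always_inline uint8_t __cmpxchg_case_1(volatile uint8_t *ptr,
                                              uint8_t old, uint8_t new);
static always_inline uint16_t __cmpxchg_case_2(volatile uint16_t *ptr,
                                               uint16_t old, uint16_t new);

with the widening to the containing 32-bit word then done inside the
emulation helper instead.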
> > > > +static inline unsigned long __xchg(volatile void *ptr, unsigned long x, int size)
> > > > +{
> > > > + switch (size) {
> > > > + case 1:
> > > > + return __cmpxchg_case_1(ptr, (uint32_t)-1, x);
> > > > + case 2:
> > > > + return __cmpxchg_case_2(ptr, (uint32_t)-1, x);
> > >
> > > How are these going to work? You'll compare against ~0, and if the
> > > value in memory isn't ~0, memory won't be updated; you will only
> > > (correctly) return the value found in memory.
> > >
> > > Or wait - looking at __cmpxchg_case_{1,2}() far further down, you
> > > ignore "old" there. Which apparently means they'll work for the use
> > > here, but not for the use in __cmpxchg().
> > Yes, the trick is that old is ignored and is read in
> > __emulate_cmpxchg_case1_2() before __cmpxchg_case_4 is called:
> >     do
> >     {
> >         read_val = read_func(aligned_ptr);
> >         swapped_new = read_val & ~mask;
> >         swapped_new |= masked_new;
> >         ret = cmpxchg_func(aligned_ptr, read_val, swapped_new);
> >     } while ( ret != read_val );
> > read_val here is 'old'.
> >
> > But now I am not 100% sure that it is correct for __cmpxchg...
>
> It just can't be correct - you can't ignore "old" there. I think you
> want simple cmpxchg primitives, which xchg then uses in a loop (while
> cmpxchg uses them plainly).
But xchg doesn't require an 'old' value, so it has to be ignored in
some way by cmpxchg.
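If I read the suggestion correctly, the sub-word xchg would obtain
'old' itself by (re)reading the containing word in a loop around a
plain, type-correct cmpxchg. A rough sketch with hypothetical names
(aligned_ptr/mask/masked_new as computed by the emulation helper):

static always_inline uint32_t emulate_xchg_1_2(volatile uint32_t *aligned_ptr,
                                               uint32_t masked_new,
                                               uint32_t mask)
{
    uint32_t old, new, ret;

    do
    {
        old = *aligned_ptr;                /* current word acts as 'old' */
        new = (old & ~mask) | masked_new;  /* splice in the 1-/2-byte datum */
        ret = __cmpxchg_case_4(aligned_ptr, old, new);
    } while ( ret != old );                /* retry if a writer raced us */

    return old & mask;                     /* previous sub-word value, still shifted */
}

The 1- and 2-byte cmpxchg variants would then call __cmpxchg_case_4
directly (a single attempt), with the caller-supplied 'old' spliced
into the containing word instead of a freshly read value.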
>
> > > > +static always_inline unsigned short __cmpxchg_case_2(volatile uint32_t *ptr,
> > > > + uint32_t old,
> > > > + uint32_t new)
> > > > +{
> > > > + (void) old;
> > > > +
> > > > + if (((unsigned long)ptr & 3) == 3)
> > > > + {
> > > > +#ifdef CONFIG_64BIT
> > > > + return __emulate_cmpxchg_case1_2((uint64_t *)ptr, new,
> > > > + readq, __cmpxchg_case_8, 0xffffU);
> > >
> > > What if ((unsigned long)ptr & 7) == 7 (which is a sub-case of what
> > > the if() above checks for)? Isn't it more reasonable to require
> > > aligned 16-bit quantities here? Or if mis-aligned addresses are okay,
> > > you could as well emulate using __cmpxchg_case_4().
> > Yes, it will be more reasonable. I'll use IS_ALIGNED instead.
>
> Not sure I get your use of "instead" here correctly. There's more
> to change here than just the if() condition.
I meant something like:
    if ( IS_ALIGNED(ptr, 16) )
        __emulate_cmpxchg_case1_2(...);
    else
        assert_failed("ptr isn't aligned\n");
I have to check whether a misaligned address is acceptable for CAS
instructions.
If a misaligned address is okay, then it looks like we can always use
__cmpxchg_case_4 without checking whether ptr is aligned or not. Or
maybe I'm starting to miss something...
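For reference, a rough sketch (hypothetical fragment, not from the
patch; ptr and new assumed in scope) of how a 2-byte quantity could be
folded onto its containing naturally aligned 32-bit word so that
__cmpxchg_case_4 can be used; note that a 16-bit datum starting at byte
offset 3 still straddles two words, which is exactly the case the if()
above is about:

    volatile uint32_t *aligned_ptr =
        (volatile uint32_t *)((unsigned long)ptr & ~3UL);
    unsigned int shift = ((unsigned long)ptr & 3) * 8; /* little-endian */
    uint32_t mask = 0xffffU << shift;                  /* 2-byte case */
    uint32_t masked_new = ((uint32_t)new << shift) & mask;
    /* ... then loop with __cmpxchg_case_4(aligned_ptr, ...) as sketched
       earlier; the straddling (offset 3) case cannot be handled this way. */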
~ Oleksii