Re: [PATCH 3/3] locking/atomic: openrisc: use generic_cmpxchg[64]_local for arch_cmpxchg[64]_local

2023-10-29 Thread wuqiang.matt

On 2023/10/30 10:22, Vineet Gupta wrote:



On 10/28/23 20:26, Masami Hiramatsu (Google) wrote:

On Sun, 29 Oct 2023 00:40:17 +0800
"wuqiang.matt"  wrote:


On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:

Hi Wuqiang,

On Thu, 26 Oct 2023 19:05:51 +0800
"wuqiang.matt"  wrote:


On 2023/10/26 16:46, Arnd Bergmann wrote:

On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:

arch_cmpxchg[64]_local() are not defined for openrisc. So implement
them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.

Closes:
https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
Closes:
https://lore.kernel.org/oe-kbuild-all/202310241310.ir5uukog-...@intel.com

Signed-off-by: wuqiang.matt 

I think on architectures that have actual atomics, you
generally want to define this to be the same as arch_cmpxchg()
rather than the generic version.

It depends on the relative cost of doing one atomic compared
to an irq-disable/enable pair, but everyone else went with
the former if they could. The exceptions are armv4/armv5,
sparc32 and parisc, which don't have a generic cmpxchg()
or similar operation.

Sure, better native than the generic. I'll try to collect more
insights before next move.

So I will temporally remove the last change (use arch_cmpxchg_local
in objpool) until these series are rewritten with arch native code,
so that the next release will not break the kernel build.

Ok, it's fine to me. Thank you.



But this must be fixed because arch_cmpxchg_local() is required
for each arch anyway.

Yes. I'm working on the new update for arc/openrisc/hexagon. It would
be better resolve this issue first, then consider the objpool update
of using arch_cmpxchg_local.


You could do the thing that sparc64 and xtensa do, which
use the native cmpxchg for supported word sizes but the
generic version for 1- and 2-byte swaps, but that has its
own set of problems if you end up doing operations on both
the entire word and a sub-unit of the same thing.

Thank you for pointing out this. I'll do some research on these
implementations.

arc also has the LL-SC instruction but depends on the core feature,
so I think we can use it.

Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:

arch_cmpxchg_relaxed:
...
  switch(sizeof((_p_))) {
  case 4:


arch_cmpxchg:
...
BUILD_BUG_ON(sizeof(_p_) != 4);
...

_p is the address pointer, so I'm thinking it's a typo but I couldn't
yet confirm. There is not much about arc processors in the web :(

Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
can take a 32bit address or 32bit data register. Usually it should
check the size of data, but need to check with ISA manual.

Vineet, can you check this suspicious bug?


ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
So the pointers will be 32-bit anyways. Is the issue that pointer/cmpxchg 
operation could be on a smaller data type ?


For ARCv2 with CONFIG_ARC_HAS_LLSC, better add the data size checking and
only permit 32bit data size. Even for 32-bit system, data should can be
64bit 'long long'.

And In the case that CONFIG_ARC_HAS_LLSC is undefined, in arch_cmpxchg: the
pointer size checking is unnecessary, since it's using spinlock internally:

https://elixir.bootlin.com/linux/v6.6-rc7/source/arch/arc/include/asm/cmpxchg.h#L60: 


BUILD_BUG_ON(sizeof(_p_) != 4); \
\
/*  \
 * spin lock/unlock provide the needed smp_mb() before/after\
 */ \
atomic_ops_lock(__flags);   \
_prev_ = *_p_;  \
if (_prev_ == _o_)  \
*_p_ = _n_; \
atomic_ops_unlock(__flags);

Another question about the naming: arch_cmpxchg_relaxed() implemented if
CONFIG_ARC_HAS_LLSC is configured and arch_cmpxchg() defined for the rest.
Are there any reasons for difference names ?

As I checked, Synopsys has released 64bit ARC processors (HS66/HS68), but
I don't know the status of Linux kernel support.


-Vineet


Regards,
wuqiang



Re: [PATCH 3/3] locking/atomic: openrisc: use generic_cmpxchg[64]_local for arch_cmpxchg[64]_local

2023-10-29 Thread Vineet Gupta




On 10/28/23 20:26, Masami Hiramatsu (Google) wrote:

On Sun, 29 Oct 2023 00:40:17 +0800
"wuqiang.matt"  wrote:


On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:

Hi Wuqiang,

On Thu, 26 Oct 2023 19:05:51 +0800
"wuqiang.matt"  wrote:


On 2023/10/26 16:46, Arnd Bergmann wrote:

On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:

arch_cmpxchg[64]_local() are not defined for openrisc. So implement
them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.

Closes:
https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
Closes:
https://lore.kernel.org/oe-kbuild-all/202310241310.ir5uukog-...@intel.com

Signed-off-by: wuqiang.matt 

I think on architectures that have actual atomics, you
generally want to define this to be the same as arch_cmpxchg()
rather than the generic version.

It depends on the relative cost of doing one atomic compared
to an irq-disable/enable pair, but everyone else went with
the former if they could. The exceptions are armv4/armv5,
sparc32 and parisc, which don't have a generic cmpxchg()
or similar operation.

Sure, better native than the generic. I'll try to collect more
insights before next move.

So I will temporally remove the last change (use arch_cmpxchg_local
in objpool) until these series are rewritten with arch native code,
so that the next release will not break the kernel build.

Ok, it's fine to me. Thank you.



But this must be fixed because arch_cmpxchg_local() is required
for each arch anyway.

Yes. I'm working on the new update for arc/openrisc/hexagon. It would
be better resolve this issue first, then consider the objpool update
of using arch_cmpxchg_local.


You could do the thing that sparc64 and xtensa do, which
use the native cmpxchg for supported word sizes but the
generic version for 1- and 2-byte swaps, but that has its
own set of problems if you end up doing operations on both
the entire word and a sub-unit of the same thing.

Thank you for pointing out this. I'll do some research on these
implementations.

arc also has the LL-SC instruction but depends on the core feature,
so I think we can use it.

Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:

arch_cmpxchg_relaxed:
...
  switch(sizeof((_p_))) {
  case 4:


arch_cmpxchg:
...
BUILD_BUG_ON(sizeof(_p_) != 4); 
...

_p is the address pointer, so I'm thinking it's a typo but I couldn't
yet confirm. There is not much about arc processors in the web :(

Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
can take a 32bit address or 32bit data register. Usually it should
check the size of data, but need to check with ISA manual.

Vineet, can you check this suspicious bug?


ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
So the pointers will be 32-bit anyways. Is the issue that 
pointer/cmpxchg operation could be on a smaller data type ?


-Vineet