RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:59 > > On Sat, 30 Dec 2023 at 12:41, Linus Torvalds > wrote: > > > > UNTESTED patch to just do the "this_cpu_write()" parts attached. > > Again, note how we do end up doing that this_cpu_ptr conversion later > > anyway, but at least it's off the

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:41 > > On Fri, 29 Dec 2023 at 12:57, David Laight wrote: > > > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > the latter can use an 'offset from register' (%gs for x86-64). > > > > Add a 'self' field to 'struct

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Waiman Long > Sent: 31 December 2023 03:04 > The presence of debug_smp_processor_id in your compiled code is likely > due to the setting of CONFIG_DEBUG_PREEMPT in your kernel config. > > #ifdef CONFIG_DEBUG_PREEMPT >   extern unsigned int debug_smp_processor_id(void); > # define

Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread Waiman Long
On 12/30/23 06:35, David Laight wrote: From: Ingo Molnar Sent: 30 December 2023 11:09 * Waiman Long wrote: On 12/29/23 15:57, David Laight wrote: this_cpu_ptr() is rather more expensive than raw_cpu_read() since the latter can use an 'offset from register' (%gs for x86-64). Add a

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread David Laight
From: Ingo Molnar > Sent: 30 December 2023 20:38 > > * David Laight wrote: > > > bool osq_lock(struct optimistic_spin_queue *lock) > > { > > - struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); > > + struct optimistic_spin_node *node = raw_cpu_read(osq_node.self); > > struct

Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread Linus Torvalds
On Sat, 30 Dec 2023 at 12:41, Linus Torvalds wrote: > > UNTESTED patch to just do the "this_cpu_write()" parts attached. > Again, note how we do end up doing that this_cpu_ptr conversion later > anyway, but at least it's off the critical path. Also note that while 'this_cpu_ptr()' doesn't

Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread Linus Torvalds
On Fri, 29 Dec 2023 at 12:57, David Laight wrote: > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > the latter can use an 'offset from register' (%gs for x86-64). > > Add a 'self' field to 'struct optimistic_spin_node' that can be > read with raw_cpu_read(), initialise on

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread David Laight
From: Ingo Molnar > Sent: 30 December 2023 11:09 > > > * Waiman Long wrote: > > > On 12/29/23 15:57, David Laight wrote: > > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > > the latter can use an 'offset from register' (%gs for x86-64). > > > > > > Add a 'self' field

Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread Ingo Molnar
* Waiman Long wrote: > On 12/29/23 15:57, David Laight wrote: > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > the latter can use an 'offset from register' (%gs for x86-64). > > > > Add a 'self' field to 'struct optimistic_spin_node' that can be > > read with

Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-29 Thread Waiman Long
On 12/29/23 15:57, David Laight wrote: this_cpu_ptr() is rather more expensive than raw_cpu_read() since the latter can use an 'offset from register' (%gs for x86-64). Add a 'self' field to 'struct optimistic_spin_node' that can be read with raw_cpu_read(), initialise on first call.

[PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-29 Thread David Laight
this_cpu_ptr() is rather more expensive than raw_cpu_read() since the latter can use an 'offset from register' (%gs for x86-64). Add a 'self' field to 'struct optimistic_spin_node' that can be read with raw_cpu_read(), initialise on first call. Signed-off-by: David Laight ---
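
The 'self' field idea described in the patch can be illustrated with a small userspace sketch. This is not the patch itself: it assumes a C11 `_Thread_local` variable as a stand-in for the kernel's per-CPU `osq_node`, and the names `spin_node`, `demo_node`, `node_by_address()` and `node_by_self()` are hypothetical. It shows only the lazy "store your own address, then read it back" pattern, not the cost difference between this_cpu_ptr() and raw_cpu_read(), which depends on how the kernel addresses per-CPU data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct optimistic_spin_node'. */
struct spin_node {
    struct spin_node *self;  /* cached address of this node; NULL until first use */
    struct spin_node *next;
    int locked;
};

/* One zero-initialised instance per thread (the kernel has one per CPU). */
static _Thread_local struct spin_node demo_node;

/* Analogue of this_cpu_ptr(): compute the node's address every time. */
static struct spin_node *node_by_address(void)
{
    return &demo_node;
}

/*
 * Analogue of raw_cpu_read(osq_node.self): read the cached pointer and
 * fall back to the address computation only on the first call.
 */
static struct spin_node *node_by_self(void)
{
    struct spin_node *node = demo_node.self;

    if (!node) {
        node = node_by_address();
        node->self = node;  /* initialise on first call */
    }
    assert(node == &demo_node);  /* the cached pointer must refer to our own node */
    return node;
}
```

After the first call to `node_by_self()` on a given thread, later calls return the cached pointer without taking the fallback branch.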