[RFC 3/8] mm/damon/core: expose nr_accesses_bp from damos_before_apply tracepoint

2023-09-09 Thread SeongJae Park
The damos_before_apply tracepoint exposes the access rate of DAMON
regions using nr_accesses, which is what DAMOS used in the past.
However, DAMOS has since changed to use nr_accesses_bp instead.  Update
the tracepoint to expose the value that DAMOS is really using.  Note
that it doesn't expose the value as is in basis points, but converts it
to a natural number by dividing it by 10,000.  That's for avoiding
confusion for old users.

Signed-off-by: SeongJae Park 
---
 include/trace/events/damon.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 9e7b39495b05..6f98198c0104 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -34,7 +34,7 @@ TRACE_EVENT(damos_before_apply,
__entry->target_idx = target_idx;
__entry->start = r->ar.start;
__entry->end = r->ar.end;
-   __entry->nr_accesses = r->nr_accesses;
+   __entry->nr_accesses = r->nr_accesses_bp / 10000;
__entry->age = r->age;
__entry->nr_regions = nr_regions;
),
-- 
2.25.1
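
For illustration, a minimal sketch of the conversion the updated
tracepoint performs (the sample value is hypothetical; nr_accesses_bp
holds the access rate in basis points, i.e. per ten thousand):

	unsigned int nr_accesses_bp = 70000;		/* hypothetical: 7 accesses, in bp */
	unsigned int nr_accesses = nr_accesses_bp / 10000;	/* back to 7 */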



Re: [GIT PULL] bcachefs

2023-09-09 Thread Kent Overstreet
On Wed, Sep 06, 2023 at 01:20:59PM -0700, Linus Torvalds wrote:
> On Wed, 6 Sept 2023 at 13:02, Linus Torvalds
>  wrote:
> >
> > And guess what happens when you have (unsigned char)-1? It does *not*
> > cast back to -1.
> 
> Side note: again, this may be one of those "it works in practice",
> because if we have -fshort-enums, I think 'enum
> btree_node_locked_type' in turn ends up being represented as a 'signed
> char', because that's the smallest simple type that can fit all those
> values.
> 
> I don't think gcc ever uses less than that (ie while a six_lock_type
> could fit in two bits, it's still going to be considered at least an
> 8-bit value in practice).
> 
> So we may have 'enum six_lock_type' essentially being 'unsigned char',
> and when the code does
> 
> mark_btree_node_locked(trans, path, 0, BTREE_NODE_UNLOCKED);
> 
> that BTREE_NODE_UNLOCKED value might actually be 255.
> 
> And then when it's cast to 'enum btree_node_locked_type' in the inline
> function, the 255 will be cast to 'signed char', and we'll end up
> compatible with '(enum btree_node_locked_type)-1' again.
> 
> So it's one of those things that are seriously wrong to do, but might
> generate the expected code anyway.
> 
> Unless the compiler adds any other sanity checks, like UBSAN or
> something, that actually uses the exact range of the enums.
> 
> It could happen even without UBSAN, if the compiler ends up going "I
> can see that the original value came from a 'enum six_lock_type', so I
> know the original value can't be signed, so any comparison with
> BTREE_NODE_UNLOCKED can never be true."
> 
> But again, I suspect that in practice this all just happens to work.
> That doesn't make it right.

No, this was just broken - it should have been
mark_btree_node_unlocked(), we never should've been passing that enum
val there.
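
A standalone sketch of the representation issue described above; the
enums are hypothetical stand-ins for six_lock_type and
btree_node_locked_type, and the behavior shown assumes GCC with
-fshort-enums:

#include <stdio.h>

/* Smallest type holding 0..2 is unsigned char under -fshort-enums. */
enum lock_type { LOCK_READ, LOCK_INTENT, LOCK_WRITE };

/* Smallest type holding -1..2 is signed char under -fshort-enums. */
enum locked_type { NODE_UNLOCKED = -1, NODE_READ, NODE_INTENT, NODE_WRITE };

int main(void)
{
	/* Both end up 1 byte with -fshort-enums, 4 bytes without. */
	printf("%zu %zu\n", sizeof(enum lock_type), sizeof(enum locked_type));

	/* -1 squeezed through the unsigned-char-backed enum becomes 255;
	 * converting 255 back to the signed-char-backed enum wraps to -1,
	 * so the comparison "happens to work" -- implementation-defined,
	 * not guaranteed, and exactly what UBSAN may complain about. */
	enum lock_type t = (enum lock_type)NODE_UNLOCKED;
	enum locked_type l = (enum locked_type)t;
	printf("l == NODE_UNLOCKED: %d\n", l == NODE_UNLOCKED);
	return 0;
}

Build with "gcc -fshort-enums" to see the 1-byte layout Linus describes.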


Re: [PATCH] Fix typo in tpmrm class definition

2023-09-09 Thread Ivan Orlov

On 08.09.2023 18:06, Justin M. Forbes wrote:

Commit d2e8071bed0be ("tpm: make all 'class' structures const")
unfortunately had a typo in the name for tpmrm.

Fixes: d2e8071bed0b ("tpm: make all 'class' structures const")
Signed-off-by: Justin M. Forbes 
---
  drivers/char/tpm/tpm-chip.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 23f6f2eda84c..42b1062e33cd 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -33,7 +33,7 @@ const struct class tpm_class = {
.shutdown_pre = tpm_class_shutdown,
  };
  const struct class tpmrm_class = {
-   .name = "tmprm",
+   .name = "tpmrm",
  };
  dev_t tpm_devt;



Hi Justin,

Thank you for fixing this mistake, it's my bad :(

I'm not sure if you know this or not, but the merge window is still
open, so the patch could be accidentally skipped by the maintainers, as
they have a huge load during this period... Perhaps resending it in a
few days might be a good idea.


Also, adding a 'tpm:' prefix to the commit title would probably make it
more noticeable to the maintainers.


Thank you again for doing this.

--
Kind regards,
Ivan Orlov


Re: [PATCH 04/10] rcu/nocb: Remove needless full barrier after callback advancing

2023-09-09 Thread Boqun Feng
On Sat, Sep 09, 2023 at 04:31:25AM +, Joel Fernandes wrote:
> On Fri, Sep 08, 2023 at 10:35:57PM +0200, Frederic Weisbecker wrote:
> > A full barrier is issued from nocb_gp_wait() upon callbacks advancing
> > to order grace period completion with callbacks execution.
> > 
> > However these two events are already ordered by the
> > smp_mb__after_unlock_lock() barrier within the call to
> > raw_spin_lock_rcu_node() that is necessary for callbacks advancing to
> > happen.
> > 
> > The following litmus test shows the kind of guarantee that this barrier
> > provides:
> > 
> > C smp_mb__after_unlock_lock
> > 
> > {}
> > 
> > // rcu_gp_cleanup()
> > P0(spinlock_t *rnp_lock, int *gpnum)
> > {
> > // Grace period cleanup increases the gp sequence number
> > spin_lock(rnp_lock);
> > WRITE_ONCE(*gpnum, 1);
> > spin_unlock(rnp_lock);
> > }
> > 
> > // nocb_gp_wait()
> > P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
> > {
> > int r1;
> > 
> > // Call rcu_advance_cbs() from nocb_gp_wait()
> > spin_lock(nocb_lock);
> > spin_lock(rnp_lock);
> > smp_mb__after_unlock_lock();
> > r1 = READ_ONCE(*gpnum);
> > WRITE_ONCE(*cb_ready, 1);
> > spin_unlock(rnp_lock);
> > spin_unlock(nocb_lock);
> > }
> > 
> > // nocb_cb_wait()
> > P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
> > {
> > int r2;
> > 
> > // rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
> > spin_lock(nocb_lock);
> > r2 = READ_ONCE(*cb_ready);
> > spin_unlock(nocb_lock);
> > 
> > // Actual callback execution
> > WRITE_ONCE(*cb_executed, 1);
> 
> So related to this something in the docs caught my attention under "Callback
> Invocation" [1]
> 
> 
> However, if the callback function communicates to other CPUs, for example,
> doing a wakeup, then it is that function's responsibility to maintain
> ordering. For example, if the callback function wakes up a task that runs on
> some other CPU, proper ordering must be in place in both the callback function
> and the task being awakened. To see why this is important, consider the top
> half of the grace-period cleanup diagram. The callback might be running on a
> CPU corresponding to the leftmost leaf rcu_node structure, and awaken a task
> that is to run on a CPU corresponding to the rightmost leaf rcu_node
> structure, and the grace-period kernel thread might not yet have reached the
> rightmost leaf. In this case, the grace period's memory ordering might not
> yet have reached that CPU, so again the callback function and the awakened
> task must supply proper ordering.
> 
> 
> I believe this text is for non-nocb, but if we apply it to the nocb case,
> let's see what happens.
> 
> In the litmus, the rcu_advance_cbs() happened on P1, however the callback is
> executing on P2. That sounds very similar to the non-nocb world described in
> the text where a callback tries to wake something up on a different CPU and
> needs to take care of all the ordering.
> 
> So unless I'm missing something (quite possible), P2 must see the update to
> gpnum as well. However, per your litmus test, the only thing P2 does is
> acquire the nocb_lock. I don't see how it is guaranteed to see gpnum == 1.

Because P1 writes cb_ready under nocb_lock, and P2 reads cb_ready under
nocb_lock as well, if P2 reads P1's write then we know the serialized
order of locking is P1 first (i.e. the spin_lock(nocb_lock) on P2 reads
from the spin_unlock(nocb_lock) on P1), in other words:

(fact #1)

unlock(nocb_lock) // on P1
->rfe
lock(nocb_lock) // on P2

so if P1 reads P0's write on gpnum

(assumption #1)

W(gpnum)=1 // on P0
->rfe
R(gpnum)=1 // on P1

and we have

(fact #2)

R(gpnum)=1 // on P1
->(po; [UL])
unlock(nocb_lock) // on P1

combine them you get

W(gpnum)=1 // on P0
->rfe   // fact #1
->(po; [UL])// fact #2
->rfe   // assumption #1
lock(nocb_lock) // on P2
->([LKR]; po)
M // any access on P2 after spin_lock(nocb_lock);

so
W(gpnum)=1 // on P0
->rfe ->po-unlock-lock-po
M // on P2

and po-unlock-lock-po is A-cumulative, hence "->rfe ->po-unlock-lock-po" or
"rfe; po-unlock-lock-po" is a cumul-fence, hence it's a ->prop, which
means the write of gpnum on P0 propagates to P2 before any memory
accesses after spin_lock(nocb_lock)?

Of course, I haven't looked into the bigger picture here (whether the
barrier is for something else, etc.) ;-)

Regards,
Boqun

> I am curious what happens in your litmus if you read gpnum into a register and
> check for it.
> 
> So maybe the memory barriers you are deleting need to be kept in place? Idk.
> 
> thanks,
> 
>  - Joel
> 
> 
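
To check Boqun's chain mechanically, the pattern reduces to the
following litmus test (variable names are hypothetical, modeled on the
one quoted above); per the reasoning above, herd7 with the kernel memory
model should report the "exists" clause as never satisfied:

C rfe-po-unlock-lock-po

{}

// rcu_gp_cleanup()
P0(int *gpnum)
{
	WRITE_ONCE(*gpnum, 1);
}

// nocb_gp_wait(), without smp_mb__after_unlock_lock()
P1(spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
{
	int r1;

	spin_lock(nocb_lock);
	r1 = READ_ONCE(*gpnum);
	WRITE_ONCE(*cb_ready, 1);
	spin_unlock(nocb_lock);
}

// nocb_cb_wait() plus the callback itself
P2(spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
{
	int r2;
	int r3;

	spin_lock(nocb_lock);
	r2 = READ_ONCE(*cb_ready);
	spin_unlock(nocb_lock);
	r3 = READ_ONCE(*gpnum);
}

exists (1:r1=1 /\ 2:r2=1 /\ 2:r3=0)

Run it with "herd7 -conf linux-kernel.cfg" from the kernel's
tools/memory-model directory.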

Re: [PATCH V6 5/7] cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically

2023-09-09 Thread Peter Zijlstra
On Fri, Sep 08, 2023 at 03:46:51PM +0800, Meng Li wrote:
> +static void amd_pstate_update_highest_perf(unsigned int cpu)
> +{
> + struct cpufreq_policy *policy;
> + struct amd_cpudata *cpudata;
> + u32 prev_high = 0, cur_high = 0;
> + u64 highest_perf;
> + int ret;
> +
> + if (!prefcore)
> + return;
> +
> + ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
> + if (ret)
> + return;
> +
> + policy = cpufreq_cpu_get(cpu);
> + cpudata = policy->driver_data;
> + cur_high = highest_perf;
> + prev_high = READ_ONCE(cpudata->prefcore_ranking);
> +
> + if (prev_high != cur_high) {
> + WRITE_ONCE(cpudata->prefcore_ranking, cur_high);
> + sched_set_itmt_core_prio(cur_high, cpu);
> + }
> +
> + cpufreq_cpu_put(policy);
> +}

Idem -- I told you to clarify the u32 vs int thing, nothing here.
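
For context, the conversions the hunk performs, as far as this posting
shows (a sketch only; sched_set_itmt_core_prio() declares its priority
parameter as a plain int):

	u64 highest_perf;		/* from amd_pstate_get_highest_perf() */
	u32 cur_high = highest_perf;	/* u64 -> u32: silently truncated */
	sched_set_itmt_core_prio(cur_high, cpu);	/* u32 -> int: can turn negative */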


Re: [PATCH V6 3/7] cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.

2023-09-09 Thread Peter Zijlstra
On Fri, Sep 08, 2023 at 03:46:49PM +0800, Meng Li wrote:
> +static void amd_pstate_init_prefcore(void)
> +{
> + int cpu, ret;
> + u64 highest_perf;
> +
> + if (!prefcore)
> + return;
> +
> + for_each_online_cpu(cpu) {
> + ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
> + if (ret)
> + break;
> +
> + sched_set_itmt_core_prio(highest_perf, cpu);
> +
> + /* check if CPPC preferred core feature is enabled */
> + if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
> + pr_debug("AMD CPPC preferred core is unsupported!\n");
> + hw_prefcore = false;
> + prefcore = false;
> + return;
> + }
> + }
> +
> + /*
> +  * This code can be run during CPU online under the
> +  * CPU hotplug locks, so sched_set_amd_prefcore_support()
> +  * cannot be called from here.  Queue up a work item
> +  * to invoke it.
> +  */
> + schedule_work(&sched_prefcore_work);
> +}

Brilliant, a repost without addressing prior feedback...  :-(


Re: [PATCH v4 4/9] fprobe: rethook: Use ftrace_regs in fprobe exit handler and rethook

2023-09-09 Thread Google
Hi Sven,

On Wed, 06 Sep 2023 08:49:11 +0200
Sven Schnelle  wrote:

> Hi Masami,
> 
> Masami Hiramatsu (Google)  writes:
> 
> > Thus, we need to ensure that the ftrace_regs which is saved in the ftrace
> > *without* the FTRACE_WITH_REGS flag can be used for hooking the function
> > return. I saw:
> >
> > void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, bool mcount)
> > {
> > rh->ret_addr = regs->gprs[14];
> > rh->frame = regs->gprs[15];
> >
> > /* Replace the return addr with trampoline addr */
> > regs->gprs[14] = (unsigned long)_rethook_trampoline;
> > }
> >
> > gprs[15] is a stack pointer, so it is saved in ftrace_regs too, but what 
> > about
> > gprs[14]? (I guess it is a link register)
> > We need to read gprs[14] and ensure that it is restored to gpr14 when
> > ftrace exits, even without the FTRACE_WITH_REGS flag.
> >
> > IOW, it is an ftrace save-regs/restore-regs code issue. I need to check how
> > function_graph implements it.
> 
> gpr2-gpr14 are always saved in ftrace_caller/ftrace_regs_caller(),
> regardless of the FTRACE_WITH_REGS flags. The only difference is that
> without the FTRACE_WITH_REGS flag the program status word (psw) is not
> saved because collecting that is a rather expensive operation.

Thanks for checking that! So s390 will recover those saved registers
even if the FTRACE_WITH_REGS flag is not set? (I wonder what the
requirements on ftrace_regs are when returning from ftrace_call()
without FTRACE_WITH_REGS?)

> 
> I used the following commands to test rethook (is that the correct
> testcase?)
> 
> #!/bin/bash
> cd /sys/kernel/tracing
> 
> echo 'r:icmp_rcv icmp_rcv' >kprobe_events
> echo 1 >events/kprobes/icmp_rcv/enable
> ping -c 1 127.0.0.1
> cat trace

No, the kprobe will pass pt_regs to rethook.
Can you run

echo "f:icmp_rcv%return icmp_rcv" >> dynamic_events

instead of kprobe_events?
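
For completeness, Sven's script with that one change applied -- a sketch
assuming the documented fprobe-event syntax and the default "fprobes"
group that "f:" events are placed in:

#!/bin/bash
cd /sys/kernel/tracing

echo 'f:icmp_rcv icmp_rcv%return' >> dynamic_events
echo 1 > events/fprobes/icmp_rcv/enable
ping -c 1 127.0.0.1
cat trace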

Thank you,

> 
> which gave me:
> 
> ping-686 [001] ..s1. 96.890817: icmp_rcv: (ip_protocol_deliver_rcu+0x42/0x218 <- icmp_rcv)
> 
> I applied the following patch on top of your patches to make it compile,
> and rethook still seems to work:
> 
> commit dab51b0a5b885660630433ac89f8e64a2de0eb86
> Author: Sven Schnelle 
> Date:   Wed Sep 6 08:06:23 2023 +0200
> 
> rethook wip
> 
> Signed-off-by: Sven Schnelle 
> 
> diff --git a/arch/s390/kernel/rethook.c b/arch/s390/kernel/rethook.c
> index af10e6bdd34e..4e86c0a1a064 100644
> --- a/arch/s390/kernel/rethook.c
> +++ b/arch/s390/kernel/rethook.c
> @@ -3,8 +3,9 @@
>  #include 
>  #include "rethook.h"
>  
> -void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, bool mcount)
> +void arch_rethook_prepare(struct rethook_node *rh, struct ftrace_regs *fregs, bool mcount)
>  {
> + struct pt_regs *regs = (struct pt_regs *)fregs;
>   rh->ret_addr = regs->gprs[14];
>   rh->frame = regs->gprs[15];
>  
> @@ -13,10 +14,11 @@ void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, bool mc
>  }
>  NOKPROBE_SYMBOL(arch_rethook_prepare);
>  
> -void arch_rethook_fixup_return(struct pt_regs *regs,
> +void arch_rethook_fixup_return(struct ftrace_regs *fregs,
>  unsigned long correct_ret_addr)
>  {
>   /* Replace fake return address with real one. */
> + struct pt_regs *regs = (struct pt_regs *)fregs;
>   regs->gprs[14] = correct_ret_addr;
>  }
>  NOKPROBE_SYMBOL(arch_rethook_fixup_return);
> @@ -24,9 +26,9 @@ NOKPROBE_SYMBOL(arch_rethook_fixup_return);
>  /*
>   * Called from arch_rethook_trampoline
>   */
> -unsigned long arch_rethook_trampoline_callback(struct pt_regs *regs)
> +unsigned long arch_rethook_trampoline_callback(struct ftrace_regs *fregs)
>  {
> - return rethook_trampoline_handler(regs, regs->gprs[15]);
> + return rethook_trampoline_handler(fregs, fregs->regs.gprs[15]);
>  }
>  NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
>  
> diff --git a/arch/s390/kernel/rethook.h b/arch/s390/kernel/rethook.h
> index 32f069eed3f3..0fe62424fc78 100644
> --- a/arch/s390/kernel/rethook.h
> +++ b/arch/s390/kernel/rethook.h
> @@ -2,6 +2,6 @@
>  #ifndef __S390_RETHOOK_H
>  #define __S390_RETHOOK_H
>  
> -unsigned long arch_rethook_trampoline_callback(struct pt_regs *regs);
> +unsigned long arch_rethook_trampoline_callback(struct ftrace_regs *fregs);
>  
>  #endif
> 


-- 
Masami Hiramatsu (Google) 


Re: [PATCH RFC] selftests/rseq: fix kselftest Clang build warnings

2023-09-09 Thread Mathieu Desnoyers

On 9/8/23 19:03, Justin Stitt wrote:

Hi,

I am experiencing many warnings when trying to build tools/testing/selftests.

Here's one such example from the rseq tree:
|  param_test.c:1234:10: error: address argument to atomic operation must be a pointer to _Atomic type ('intptr_t *' (aka 'long *') invalid)
|   1234 |         while (!atomic_load(&args->percpu_list_ptr)) {}
|        |                 ^           ~~~~~~~~~~~~~~~~~~~~~~~
|  /usr/local/google/home/justinstitt/repos/tc-build/build/llvm/final/lib/clang/18/include/stdatomic.h:140:29: note: expanded from macro 'atomic_load'
|    140 | #define atomic_load(object) __c11_atomic_load(object, __ATOMIC_SEQ_CST)
|        |                             ^                 ~~~~~~

I added the _Atomic type in various locations to silence _all_ (10) of these
warnings. I'm wondering, though: perhaps the absence of these _Atomic
types in the first place is on purpose? Am I on the right track to fix
these warnings without damaging the legitimacy of the tests at hand?

I'd like some feedback about where to go from here and if others are
experiencing the same issues. Thanks!

FWIW here's my specific build incantation on Clang-18 
(49d41de57896e935cd5726719c5208bce22694ae):
$ make LLVM=1 -j128 ARCH=x86_64 mrproper headers defconfig kselftest-merge
$ make LLVM=1 ARCH=x86_64 -C tools/testing/selftests


I should have used the __atomic_load_n() compiler builtin rather than
atomic_load(), mainly because it does not require the _Atomic annotation
on each type it touches.
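
A sketch of that alternative, reusing the struct from the patch below;
__atomic_load_n() is the GCC/Clang builtin, taking an ordinary pointer
plus an explicit memory order, so the fields keep their plain types:

#include <stdint.h>

struct test_membarrier_thread_args {
	int stop;
	intptr_t percpu_list_ptr;	/* plain type, no _Atomic qualifier */
};

static inline intptr_t load_list_ptr(struct test_membarrier_thread_args *args)
{
	return __atomic_load_n(&args->percpu_list_ptr, __ATOMIC_SEQ_CST);
}

The spin loop from the warning then becomes:

	while (!__atomic_load_n(&args->percpu_list_ptr, __ATOMIC_SEQ_CST)) {}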


Thanks,

Mathieu




Link: https://github.com/ClangBuiltLinux/linux/issues/1698
Link: https://github.com/ClangBuiltLinux/continuous-integration2/issues/61
Signed-off-by: Justin Stitt 
---
  tools/testing/selftests/rseq/param_test.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
index bf951a490bb4..94802aeed2c6 100644
--- a/tools/testing/selftests/rseq/param_test.c
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -356,7 +356,7 @@ struct inc_thread_test_data {
  };
  
  struct percpu_list_node {

-   intptr_t data;
+   _Atomic intptr_t data;
struct percpu_list_node *next;
  };
  
@@ -1212,8 +1212,8 @@ static int set_signal_handler(void)

  /* Test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU membarrier command. */
  #ifdef TEST_MEMBARRIER
  struct test_membarrier_thread_args {
-   int stop;
-   intptr_t percpu_list_ptr;
+   _Atomic int stop;
+   _Atomic intptr_t percpu_list_ptr;
  };
  
  /* Worker threads modify data in their "active" percpu lists. */

@@ -1240,7 +1240,7 @@ void *test_membarrier_worker_thread(void *arg)
int cpu = get_current_cpu_id();
  
  			ret = rseq_offset_deref_addv(RSEQ_MO_RELAXED, RSEQ_PERCPU,

-   &args->percpu_list_ptr,
+   (intptr_t*)&args->percpu_list_ptr,
sizeof(struct percpu_list_entry) * cpu, 1, cpu);
} while (rseq_unlikely(ret));
}

---
base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
change-id: 20230908-kselftest-param_test-c-1763b62e762f

Best regards,
--
Justin Stitt 



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com