Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 17 May 2024 10:36:37 -0700 Guenter Roeck wrote:

> Building csky:allmodconfig (and others) ... failed
> --
> Error log:
> In file included from include/trace/trace_events.h:419,
>                  from include/trace/define_trace.h:102,
>                  from drivers/cxl/core/trace.h:737,
>                  from drivers/cxl/core/trace.c:8:
> drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 arguments, but takes just 1
>
> This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
> So far that seems to be the only build failure.
> Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
> cxl_general_media and cxl_dram events"). Guess we'll see more of those
> towards the end of the commit window.

Looks like I made this patch just before this commit was pulled into Linus's tree, which is why I'll apply and rerun the above again, probably on Tuesday of next week, against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on the commit I had applied it to, which was:

  1b294a1f3561 ("Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.

-- Steve
Re: ftrace_direct_func_count ?
On Sat, 4 May 2024 13:35:26 + "Dr. David Alan Gilbert" wrote:

> Hi,
> I've just posted a patch 'ftrace: Remove unused list 'ftrace_direct_funcs''
> that clears out some old code, but while at it I noticed the global
> 'ftrace_direct_func_count'.
>
> As far as I can tell, it's never assigned (or initialised) but it is tested:
>
> kernel/trace/fgraph.c:
>
> #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> 	/*
> 	 * Skip graph tracing if the return location is served by direct trampoline,
> 	 * since call sequence and return addresses are unpredictable anyway.
> 	 * Ex: BPF trampoline may call original function and may skip frame
> 	 * depending on type of BPF programs attached.
> 	 */
> 	if (ftrace_direct_func_count &&
> 	    ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
> 		return -EBUSY;
> #endif
>
> So I wasn't sure whether it was just safe to nuke that section
> or whether it really needed fixing?

Yes, after commit 8788ca164eb4bad ("ftrace: Remove the legacy _ftrace_direct API") that variable is no longer used.

-- Steve
Re: [PATCH] ftrace: Replace ftrace_disabled variable with ftrace_is_dead function
On Sat, 20 Apr 2024 11:50:29 +0800 "Bang Li" wrote:

> Thank you for your explanation, let me understand this. How about
> replacing ftrace_disabled with unlikely(ftrace_disabled)?

Why? They are slow paths. No need to optimize them.

-- Steve
Re: [PATCH net-next v6 0/7] Implement reset reason mechanism to detect
On Fri, 19 Apr 2024 16:00:20 +0800 Jason Xing wrote:

> If other experts see this thread, please help me. I would appreciate
> it. I have strong interests and feel strong responsibility to
> implement something like this patch series. It can be very useful!!

I'm not a networking expert, but as I'm Cc'd and this is about tracing, I'll jump in to see if I can help. Honestly, reading the thread, it appears that you and Eric are talking past each other.

I believe Eric is concerned about losing the value of the enum. Enums are types, and if you typecast them to another type, they lose the previous type, and all the safety that goes with it.

Now, I do not really understand the problem trying to be solved here. I understand how TCP works but I never looked into the implementation of MPTCP.

You added this:

+static inline enum sk_rst_reason convert_mptcp_reason(u32 reason)
+{
+	return reason += RST_REASON_START;
+}

And used it for places like this:

@@ -309,8 +309,13 @@ static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
 		return dst;
 
 	dst_release(dst);
-	if (!req->syncookie)
-		tcp_request_sock_ops.send_reset(sk, skb, SK_RST_REASON_NOT_SPECIFIED);
+	if (!req->syncookie) {
+		struct mptcp_ext *mpext = mptcp_get_ext(skb);
+		enum sk_rst_reason reason;
+
+		reason = convert_mptcp_reason(mpext->reset_reason);
+		tcp_request_sock_ops.send_reset(sk, skb, reason);
+	}
 
 	return NULL;
 }

As I don't know this code or how MPTCP works, I do not understand the above. It used to pass SK_RST_REASON_NOT_SPECIFIED to send_reset(). But now it takes a "reset_reason" and calls "convert_mptcp_reason()" to get back an enum value.

If you are mapping the reset_reason to enum sk_rst_reason, why not do it via a real conversion instead of this fragile arithmetic between the two values?
	static inline enum sk_rst_reason convert_mptcp_reason(u32 reason)
	{
		switch (reason) {
		case 0: return SK_RST_REASON_MPTCP_RST_EUNSPEC;
		case 1: return SK_RST_REASON_MPTCP_RST_EMPTCP;
		case 2: return SK_RST_REASON_MPTCP_RST_ERESOURCE;
		[..]
		default: return SK_RST_REASON_MAX;	// or some other error value
		}
	}

I'm not sure if this is any better, but it's not doing any casting and it's easier to understand. It's a simple mapping between the reason and the enum and there's no inherent dependency between the values. Could possibly create enums for the reason numbers and replace the hard coded values with them. That way that helper function is at least doing a real conversion of one type to another.

But like I said from the beginning, I don't understand the details here and have not spent the time to dig deeper. I just read the thread, and I agree with Eric that the arithmetic conversion of reason to an enum looks fragile at best and buggy at worst.

-- Steve
Re: [PATCH] ftrace: Replace ftrace_disabled variable with ftrace_is_dead function
On Fri, 19 Apr 2024 22:38:44 +0800 "Bang Li" wrote:

> Use the existing function ftrace_is_dead to replace the variable to make
> the code clearer.
>
> Signed-off-by: Bang Li
> ---
>  kernel/trace/ftrace.c | 46 +--
>  1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 50ca4d4f8840..4a08c79db677 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2693,7 +2693,7 @@ void __weak ftrace_replace_code(int mod_flags)
>  	int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
>  	int failed;
>  
> -	if (unlikely(ftrace_disabled))
> +	if (unlikely(ftrace_is_dead()))
>  		return;
>

NACK!

ftrace_is_dead() is only there to make the static variable "ftrace_disabled" available for code outside of ftrace.c. In ftrace.c, it is perfectly fine to use ftrace_disabled.

-- Steve
Re: [PATCH net-next v3 6/6] rstreason: make it work in trace world
On Tue, 9 Apr 2024 18:09:34 +0800 Jason Xing wrote:

>  /*
>   * tcp event with arguments sk and skb
> @@ -74,20 +75,38 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>  	TP_ARGS(sk, skb)
>  );
>  
> +#undef FN1
> +#define FN1(reason)	TRACE_DEFINE_ENUM(SK_RST_REASON_##reason);
> +#undef FN2
> +#define FN2(reason)	TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
> +DEFINE_RST_REASON(FN1, FN1)

Interesting. I've never seen the passing of the internal macros to the main macro before. I see that you are using it for handling both the SK_RST_REASON and the SK_DROP_REASON.

> +
> +#undef FN1
> +#undef FNe1
> +#define FN1(reason)	{ SK_RST_REASON_##reason, #reason },
> +#define FNe1(reason)	{ SK_RST_REASON_##reason, #reason }
> +
> +#undef FN2
> +#undef FNe2
> +#define FN2(reason)	{ SKB_DROP_REASON_##reason, #reason },
> +#define FNe2(reason)	{ SKB_DROP_REASON_##reason, #reason }

Anyway, from a tracing point of view, it looks like it would work (I haven't tested it).

Reviewed-by: Steven Rostedt (Google)

-- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Tue, 2 Apr 2024 22:21:00 -0700 Andrii Nakryiko wrote:

> > I just checked our fleet-wide production data for the last 24 hours.
> > Within the kprobe/kretprobe code path (ftrace_trampoline and
> > everything called from it), rcu_is_watching (both calls, see below)
> > cause 0.484% CPU cycles usage, which isn't nothing. So definitely we'd
> > prefer to be able to avoid that in production use cases.
> >
>
> I just ran a synthetic microbenchmark testing multi-kretprobe
> throughput. We get (in millions of BPF kretprobe-multi program
> invocations per second):
>  - 5.568M/s as baseline;
>  - 5.679M/s with changes in this patch (+2% throughput improvement);
>  - 5.808M/s with disabling rcu_is_watching in rethook_try_get()
>    (+2.3% more vs just one of rcu_is_watching, and +4.3% vs baseline).
>
> It's definitely noticeable.

Ah, thanks for verifying (I should have read the thread before replying to the previous email).

-- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Tue, 2 Apr 2024 21:00:19 -0700 Andrii Nakryiko wrote: > I just noticed another rcu_is_watching() call, in rethook_try_get(), > which seems to be a similar and complementary validation check to the > one we are putting under CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option > in this patch. It feels like both of them should be controlled by the > same settings. WDYT? Can I add CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING > guard around rcu_is_watching() check in rethook_try_get() as well? That is totally up to Masami. It may have even less overhead as I'm not sure how many times that gets called, and there may be more work to do than with function tracing. -- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Wed, 3 Apr 2024 09:40:48 +0900 Masami Hiramatsu (Google) wrote:

> OK, for me, this last sentence is preferred for the help message. That
> explains what this is for.
>
>     All callbacks that attach to the function tracing have some sort
>     of protection against recursion. This option is only to verify that
>     ftrace (and other users of ftrace_test_recursion_trylock()) are not
>     called outside of RCU, as if they are, it can cause a race.
>     But it also has a noticeable overhead when enabled.
>
> BTW, how much overhead does this introduce? and the race case a kernel crash?
> or just messed up the ftrace buffer?

There's a hypothetical race where it can cause a use after free. That is, after you shutdown ftrace, you need to call synchronize_rcu_tasks(), which requires RCU to be watching. There's a theoretical case where that task calls the trampoline and misses the synchronization. Note, these locations are with preemption disabled, as RCU is always watching when preemption is enabled. Thus it would be extremely fast, whereas the synchronize_rcu_tasks() is rather slow.

We also have synchronize_rcu_tasks_rude() which would actually keep the trace from happening, as it would schedule on each CPU forcing all CPUs to have RCU watching.

I have never heard of this race being hit. I guess it could happen on a VM where a vCPU gets preempted at the right moment for a long time and the other CPUs synchronize. But like lockdep, where deadlocks can crash the kernel, we don't enable it for production.

The overhead is another function call within the function tracer. I had numbers before, but I guess I could run tests again and get new numbers.

Thanks,

-- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Mon, 1 Apr 2024 19:29:46 -0700 Andrii Nakryiko wrote:

> On Mon, Apr 1, 2024 at 5:38 PM Masami Hiramatsu wrote:
> >
> > On Mon, 1 Apr 2024 12:09:18 -0400 Steven Rostedt wrote:
> >
> > > On Mon, 1 Apr 2024 20:25:52 +0900 Masami Hiramatsu (Google) wrote:
> > >
> > > > > Masami,
> > > > >
> > > > > Are you OK with just keeping it set to N.
> > > >
> > > > OK, if it is only for the debugging, I'm OK to set N this.
> > > >
> > > > >
> > > > > We could have other options like PROVE_LOCKING enable it.
> > > >
> > > > Agreed (but it should say this is a debug option)
> > >
> > > It does say "Validate" which to me is a debug option. What would you
> > > suggest?
> >
> > I think the help message should have "This is for debugging ftrace."
>
> Sent v2 with adjusted wording, thanks!

You may want to wait till Masami and I agree ;-)

Masami,

But it isn't really for "debugging", it's for validating. That is, it doesn't give us any information to debug ftrace. It only validates if it is executed properly. In other words, I never want to be asked "How can I use this option to debug ftrace?"

For example, we also have:

  "Verify ring buffer time stamp deltas"

that makes sure the time stamps of the ring buffer are not buggy.

And there's:

  "Verify compile time sorting of ftrace functions"

which is also used to make sure things are working properly.

Neither of the above says they are for "debugging", even though they are more useful for debugging than this option. I'm not sure saying this is "debugging ftrace" is accurate. It may help debug ftrace if it is called outside of an RCU location, which has a 1 in 100,000,000,000 chance of causing an actual bug, as the race window is extremely small. Now if it is also called outside of instrumentation, that will likely trigger other warnings even without this code, and this will not be needed to debug that.

ftrace has all sorts of "verifiers" that are used to make sure things are working properly. And yes, you can consider it as "debugging". But I would not consider this an option to enable if ftrace was broken and you are looking into why it is broken.

To me, this option is only to verify that ftrace (and other users of ftrace_test_recursion_trylock()) are not called outside of RCU, as if they are, it can cause a race. But it also has a noticeable overhead when enabled.

-- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Mon, 1 Apr 2024 20:25:52 +0900 Masami Hiramatsu (Google) wrote: > > Masami, > > > > Are you OK with just keeping it set to N. > > OK, if it is only for the debugging, I'm OK to set N this. > > > > > We could have other options like PROVE_LOCKING enable it. > > Agreed (but it should say this is a debug option) It does say "Validate" which to me is a debug option. What would you suggest? -- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Tue, 26 Mar 2024 09:16:33 -0700 Andrii Nakryiko wrote: > > It's no different than lockdep. Test boxes should have it enabled, but > > there's no reason to have this enabled in a production system. > > > > I tend to agree with Steven here (which is why I sent this patch as it > is), but I'm happy to do it as an opt-out, if Masami insists. Please > do let me know if I need to send v2 or this one is actually the one > we'll end up using. Thanks! Masami, Are you OK with just keeping it set to N. We could have other options like PROVE_LOCKING enable it. -- Steve
Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional
On Mon, 25 Mar 2024 11:38:48 +0900 Masami Hiramatsu (Google) wrote:

> On Fri, 22 Mar 2024 09:03:23 -0700 Andrii Nakryiko wrote:
>
> > Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to
> > control whether ftrace low-level code performs additional
> > rcu_is_watching()-based validation logic in an attempt to catch noinstr
> > violations.
> >
> > This check is expected to never be true in practice and would be best
> > controlled with extra config to let users decide if they are willing to
> > pay the price.
>
> Hmm, for me, it sounds like "WARN_ON(something) never be true in practice
> so disable it by default". I think CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> is OK, but that should be set to Y by default. If you have already verified
> that your system never make it true and you want to optimize your ftrace
> path, you can manually set it to N at your own risk.

Really, it's for debugging. I would argue that it should *not* be default y. Peter added this to find all the locations that could be called where RCU is not watching. But the issue I have is that it *does cause overhead* with function tracing.

I believe we found pretty much all locations that were an issue, and we should now just make it an option for developers. It's no different than lockdep. Test boxes should have it enabled, but there's no reason to have this enabled in a production system.

-- Steve

> > Cc: Steven Rostedt
> > Cc: Masami Hiramatsu
> > Cc: Paul E. McKenney
> > Signed-off-by: Andrii Nakryiko
> > ---
> >  include/linux/trace_recursion.h |  2 +-
> >  kernel/trace/Kconfig            | 13 +
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> > index d48cd92d2364..24ea8ac049b4 100644
> > --- a/include/linux/trace_recursion.h
> > +++ b/include/linux/trace_recursion.h
> > @@ -135,7 +135,7 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
> >  # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
> >  #endif
> >
> > -#ifdef CONFIG_ARCH_WANTS_NO_INSTR
> > +#ifdef CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> >  # define trace_warn_on_no_rcu(ip)				\
> >  	({							\
> >  		bool __ret = !rcu_is_watching();	\
>
> BTW, maybe we can add "unlikely" in the next "if" line?
>
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 61c541c36596..19bce4e217d6 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -974,6 +974,19 @@ config FTRACE_RECORD_RECURSION_SIZE
> >  	  This file can be reset, but the limit can not change in
> >  	  size at runtime.
> >
> > +config FTRACE_VALIDATE_RCU_IS_WATCHING
> > +	bool "Validate RCU is on during ftrace recursion check"
> > +	depends on FUNCTION_TRACER
> > +	depends on ARCH_WANTS_NO_INSTR
>
> 	default y
>
> > +	help
> > +	  All callbacks that attach to the function tracing have some sort
> > +	  of protection against recursion. This option performs additional
> > +	  checks to make sure RCU is on when ftrace callbacks recurse.
> > +
> > +	  This will add more overhead to all ftrace-based invocations.
>
> 	... invocations, but keep it safe.
>
> > +
> > +	  If unsure, say N
>
> 	If unsure, say Y
>
> Thank you,
>
> > +
> >  config RING_BUFFER_RECORD_RECURSION
> >  	bool "Record functions that recurse in the ring buffer"
> >  	depends on FTRACE_RECORD_RECURSION
> > --
> > 2.43.0
Re: [PATCH] tracing: probes: Fix to zero initialize a local variable
On Wed, 20 Mar 2024 17:10:38 +0900 "Masami Hiramatsu (Google)" wrote: > From: Masami Hiramatsu (Google) > > Fix to initialize 'val' local variable with zero. > Dan reported that Smatch static code checker reports an error that a local > 'val' variable needs to be initialized. Actually, the 'val' is expected to > be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So > initialize it with zero. BTW, that loop should really have a comment stating that FETCH_OP_ARG is expected to happen before FETCH_OP_ST_EDATA. -- Steve
Re: [PATCH] tracing: probes: Fix to zero initialize a local variable
On Wed, 20 Mar 2024 17:10:38 +0900 "Masami Hiramatsu (Google)" wrote:

> From: Masami Hiramatsu (Google)
>
> Fix to initialize 'val' local variable with zero.
> Dan reported that Smatch static code checker reports an error that a local
> 'val' variable needs to be initialized. Actually, the 'val' is expected to
> be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So
> initialize it with zero.
>
> Reported-by: Dan Carpenter
> Closes: https://lore.kernel.org/all/b010488e-68aa-407c-add0-3e059254aaa0@moroto.mountain/
> Fixes: 25f00e40ce79 ("tracing/probes: Support $argN in return probe (kprobe and fprobe)")
> Signed-off-by: Masami Hiramatsu (Google)
> ---

Reviewed-by: Steven Rostedt (Google)

-- Steve

>  kernel/trace/trace_probe.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 217169de0920..dfe3ee6035ec 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -839,7 +839,7 @@ int traceprobe_get_entry_data_size(struct trace_probe *tp)
>  void store_trace_entry_data(void *edata, struct trace_probe *tp, struct pt_regs *regs)
>  {
>  	struct probe_entry_arg *earg = tp->entry_arg;
> -	unsigned long val;
> +	unsigned long val = 0;
>  	int i;
>
>  	if (!earg)
Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Wed, 20 Mar 2024 12:44:23 +0900 Masami Hiramatsu (Google) wrote:

> > > kernel/trace/trace_probe.c
> > >    846         return;
> > >    847
> > >    848         for (i = 0; i < earg->size; i++) {
> > >    849                 struct fetch_insn *code = &earg->code[i];
> > >    850
> > >    851                 switch (code->op) {
> > >    852                 case FETCH_OP_ARG:
> > >    853                         val = regs_get_kernel_argument(regs, code->param);
> > >    854                         break;
> > >    855                 case FETCH_OP_ST_EDATA:
> > > --> 856                         *(unsigned long *)((unsigned long)edata + code->offset) = val;
> > >
> > > Probably the earg->code[i] always has FETCH_OP_ARG before
> > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...
> >
> > Looks that way:
> >
> > 	case FETCH_OP_END:
> > 		earg->code[i].op = FETCH_OP_ARG;
> > 		earg->code[i].param = argnum;
> > 		earg->code[i + 1].op = FETCH_OP_ST_EDATA;
> > 		earg->code[i + 1].offset = offset;
> > 		return offset;
> >
> > But probably should still initialize val to zero or have a WARN_ON() if
> > that doesn't happen.
>
> OK, let's val = 0 in the store_trace_entry_data(), but WARN_ON() in this loop
> is a bit strange. I think we should have a verifier.

Initializing to zero is fine.

-- Steve
Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)
On Tue, 19 Mar 2024 10:19:09 +0300 Dan Carpenter wrote:

> Hello Masami Hiramatsu (Google),
>
> Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the
> following Smatch static checker warning:
>
> 	kernel/trace/trace_probe.c:856 store_trace_entry_data()
> 	error: uninitialized symbol 'val'.
>
> kernel/trace/trace_probe.c
>    846         return;
>    847
>    848         for (i = 0; i < earg->size; i++) {
>    849                 struct fetch_insn *code = &earg->code[i];
>    850
>    851                 switch (code->op) {
>    852                 case FETCH_OP_ARG:
>    853                         val = regs_get_kernel_argument(regs, code->param);
>    854                         break;
>    855                 case FETCH_OP_ST_EDATA:
> --> 856                         *(unsigned long *)((unsigned long)edata + code->offset) = val;
>
> Probably the earg->code[i] always has FETCH_OP_ARG before
> FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...

Looks that way:

	case FETCH_OP_END:
		earg->code[i].op = FETCH_OP_ARG;
		earg->code[i].param = argnum;
		earg->code[i + 1].op = FETCH_OP_ST_EDATA;
		earg->code[i + 1].offset = offset;
		return offset;

But probably should still initialize val to zero or have a WARN_ON() if that doesn't happen.

-- Steve

>    857                         break;
>    858                 case FETCH_OP_END:
>    859                         goto end;
>    860                 default:
>    861                         break;
>    862                 }
>    863         }
>    864 end:
>    865         return;
>    866 }
>
> regards,
> dan carpenter
Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Thu, 14 Mar 2024 09:57:57 -0700 Alison Schofield wrote:

> On Fri, Feb 23, 2024 at 12:56:34PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)"
> >
> > [
> >    This is a treewide change. I will likely re-create this patch again in
> >    the second week of the merge window of v6.9 and submit it then. Hoping
> >    to keep the conflicts that it will cause to a minimum.
> > ]

Note, change of plans. I plan on sending this in the next merge window, as this merge window I have this patch:

  https://lore.kernel.org/linux-trace-kernel/20240312113002.00031...@gandalf.local.home/

That will warn if the source string of __string() is different than the source string of __assign_str(). I want to make sure they are identical before just dropping one of them.

> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index bdf117a33744..07ba4e033347 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h
>
> snip to poison
>
> > @@ -668,8 +668,8 @@ TRACE_EVENT(cxl_poison,
> >  	    ),
> >
> >  	TP_fast_assign(
> > -		__assign_str(memdev, dev_name(&cxlmd->dev));
> > -		__assign_str(host, dev_name(cxlmd->dev.parent));
> > +		__assign_str(memdev);
> > +		__assign_str(host);
>
> I think I get that the above changes work because the TP_STRUCT__entry for
> these did:
> 	__string(memdev, dev_name(&cxlmd->dev))
> 	__string(host, dev_name(cxlmd->dev.parent))

That's the point. They have to be identical or you will likely have a bug.

The __string(name, src) is used to find the string length of src, which allocates the necessary length on the ring buffer. The __assign_str(name, src) will copy src into the ring buffer. Similar to:

	len = strlen(src);
	buf = malloc(len);
	strcpy(buf, src);

Where __string() is strlen() and __assign_str() is strcpy(). It doesn't make sense to use two different strings, and if you did, it would likely be a bug.
But the magic behind __string() does much more than just get the length of the string, and it could easily save the pointer to the string (along with its length) and have it copy that in the __assign_str() call, making the src parameter of __assign_str() useless.

> >  		__entry->serial = cxlmd->cxlds->serial;
> >  		__entry->overflow_ts = cxl_poison_overflow(flags, overflow_ts);
> >  		__entry->dpa = cxl_poison_record_dpa(record);
> > @@ -678,12 +678,12 @@ TRACE_EVENT(cxl_poison,
> >  		__entry->trace_type = trace_type;
> >  		__entry->flags = flags;
> >  		if (region) {
> > -			__assign_str(region, dev_name(&region->dev));
> > +			__assign_str(region);
> >  			memcpy(__entry->uuid, &region->params.uuid, 16);
> >  			__entry->hpa = cxl_trace_hpa(region, cxlmd, __entry->dpa);
> >  		} else {
> > -			__assign_str(region, "");
> > +			__assign_str(region);
> >  			memset(__entry->uuid, 0, 16);
> >  			__entry->hpa = ULLONG_MAX;
>
> For the above 2, there was no helper in TP_STRUCT__entry. A recently
> posted patch is fixing that up to be __string(region, NULL). See [1],
> with the actual assignment still happening in TP_fast_assign.

__string(region, NULL) doesn't make sense. It's like:

	len = strlen(NULL);
	buf = malloc(len);
	strcpy(buf, NULL);

??

I'll reply to that email.

-- Steve

> Does that assign logic need to move to the TP_STRUCT__entry definition
> when you merge these changes? I'm not clear how much logic is able to be
> included, ie like 'C' style code in the TP_STRUCT__entry.
>
> [1] https://lore.kernel.org/linux-cxl/20240314044301.2108650-1-alison.schofi...@intel.com/
Re: How to display a ktime value as trace timestamp in trace output?
On Wed, 31 Jan 2024 14:47:31 + David Howells wrote:

> Hi Steven,

Hi David,

Sorry, I just noticed this email as it was buried in other unread emails :-p

> I have a tracepoint in AF_RXRPC that displays information about a timeout I'm
> going to set. I have the timeout in a ktime_t as an absolute time. Is there
> a way to display this in the trace output such that it looks like a trace
> timestamp and can be (roughly) correlated with the displayed timestamps?

Have you tried the other clocks?

	{ ktime_get_mono_fast_ns,	"mono",		1 },
	{ ktime_get_raw_fast_ns,	"mono_raw",	1 },
	{ ktime_get_boot_fast_ns,	"boot",		1 },

The above are the functions used for the tracing timestamps.

-- Steve

> I tried subtracting ktime_get_read() - ktime_get_boottime() from it and
> displaying the result, but it looked about one and a bit seconds out from the
> trace timestamp.
Re: tprobe event tracing error
On Wed, 28 Feb 2024 10:52:52 -0500 Steven Rostedt wrote:

> The prototype of fchmodat() is:
>
> 	int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags);
>
> Where pathname is the third parameter, not the first, and mode is the third.

I meant pathname is the second parameter.

-- Steve
Re: tprobe event tracing error
On Wed, 28 Feb 2024 17:25:40 +0300 Максим Морсков wrote:

> Dear colleagues,
> One last question — is it a bug or a feature that tprobe event tracing can not
> correctly dereference string pointers from pt_regs?
> For example:
>
> echo 't:tmy_chmod sys_enter id=$arg2 filename=+8($arg1):string mode=+16($arg1)' | sudo tee '/sys/kernel/tracing/dynamic_events'

So the tprobe attaches to the tracepoint, which is this:

	trace_sys_enter(regs, syscall);

Where arg1 is pt_regs, which on x86_64 (I'm assuming that's what you are using) has:

struct pt_regs {
/*
 * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
 * unless syscall needs a complete, fully filled "struct pt_regs".
 */
	unsigned long r15;
	unsigned long r14;
	unsigned long r13;
	unsigned long r12;
	unsigned long rbp;
	unsigned long rbx;
/* These regs are callee-clobbered. Always saved on kernel entry. */
	unsigned long r11;
	unsigned long r10;
	unsigned long r9;
	unsigned long r8;
	unsigned long rax;
	unsigned long rcx;
	unsigned long rdx;
	unsigned long rsi;
	unsigned long rdi;
/*
 * On syscall entry, this is syscall#. On CPU exception, this is error code.
 * On hw interrupt, it's IRQ number:
 */
	unsigned long orig_rax;
/* Return frame for iretq */
	unsigned long rip;
	unsigned long cs;
	unsigned long eflags;
	unsigned long rsp;
	unsigned long ss;
/* top of stack page */
};

Where regs+8 is register r14, and regs+16 is r13. Is that what you really want? No, it's not.

Also, I noticed that you are not tracing chmod, but you are tracing id = 268 which is fchmodat() (I noticed via strace that this is what "chmod" uses). The prototype of fchmodat() is:

	int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags);

Where pathname is the third parameter, not the first, and mode is the third.

The calling convention for x86_64 is:

	rdi rsi rdx rcx r8 r9

That is, arg1 is in register rdi, arg2 is rsi, arg3 is rdx. We want arguments 2 and 3.
Which is:

	regs:		$arg1
	regs->rsi:	+104($arg1)
	regs->rdx:	+96($arg1)

And since the file name is a string, we need to do one more dereference to get to it:

	pathname:	+0(+104($arg1)):ustring

(notice I used "ustring" as we now differentiate between kernel and user space)

With the above information I can do:

 # cd /sys/kernel/tracing
 # echo 't:tmy_chmod sys_enter id=$arg2 filename=+0(+104($arg1)):ustring mode=+96($arg1):x16' > dynamic_events
 # echo 'id == 268' > events/tracepoints/tmy_chmod/filter
 # echo 1 > events/tracepoints/tmy_chmod/enable
 # mkdir /tmp/x
 # chmod 100 /tmp/x
 # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:8
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
           chmod-1035    [004] ...1.  1744.492490: tmy_chmod: (__probestub_sys_enter+0x4/0x10) id=0x10c filename="/tmp/x" mode=0x40

TADA!!!

-- Steve

> echo 'id == 268' | sudo tee '/sys/kernel/tracing/events/tracepoints/tmy_chmod/filter'
> echo '1' | sudo tee '/sys/kernel/tracing/events/tracepoints/tmy_chmod/enable'
> echo '1' | sudo tee '/sys/kernel/tracing/tracing_on'
>
> cat '/sys/kernel/tracing/trace'
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>            chmod-10522   [010] ...1.  8533.321703: tmy_chmod: (__probestub_sys_enter+0x0/0x10) id=0x10c fd=0x81ed filename="" mode=0x1ed
>
> The pointer is correct (it corresponds to kprobe event args), but dereference never happens
Re: tprobe event tracing error
On Mon, 26 Feb 2024 23:41:56 +0900 Masami Hiramatsu (Google) wrote:

> Hi,
> (Cc: linux-kernel-trace ML for sharing this knowledge)
>
> On Mon, 26 Feb 2024 16:36:29 +0300 Максим Морсков wrote:
>
> > Hello, dear Masami.
> > I am researching the Linux event tracing subsystem in part of tprobes,
> > and found interesting behavior in kernel version 6.6:
> >
> > echo 't:my_fchmodat sys_enter_fchmodat' | sudo tee '/sys/kernel/tracing/dynamic_events'
> > bash: line 1: echo: write error: Invalid argument
>
> Yeah, I understand that you are confused by this behavior, but it is
> actually expected behavior. syscalls:* events look like trace events
> based on tracepoints, but those are software generated trace events.
>
> You can find raw_syscalls:* trace events, which are based on the tracepoint,
> and other syscalls:* are based on those raw_syscalls:* trace points.
> (IOW, those are a kind of pre-compiled dynamic events)
>
> e.g.
>
> /sys/kernel/tracing # echo "t sys_enter \$arg*" >> dynamic_events
> /sys/kernel/tracing # cat dynamic_events
> t:tracepoints/sys_enter sys_enter regs=regs id=id
>
> /sys/kernel/tracing # echo "t sys_enter_open \$arg*" >> dynamic_events
> sh: write error: Invalid argument
> /sys/kernel/tracing # cat error_log
> [  227.981347] trace_fprobe: error: Tracepoint is not found
>   Command: t sys_enter_open $arg*
>              ^
>
> So, tprobe can not find the hard-coded tracepoints for those dynamically
> generated syscall trace events. But raw_syscall sys_enter/sys_exit are OK.

Ah, that's because "tprobes" are attached to tracepoints and not trace events. If you want to attach to trace events, you need to use eprobes (which I need to add documentation for!).

 # echo 'e:my_fchmodat syscalls/sys_enter_fchmodat' > dynamic_events

Works.

-- Steve
Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 23 Feb 2024 13:46:53 -0500
Steven Rostedt wrote:

> Now one thing I could do is to not remove the parameter, but just add:
>
> 	WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);
>
> in the __assign_str() macro to make sure that it's still the same that is
> assigned. But I'm not sure how useful that is, and still causes burden to
> have it. I never really liked the passing of the string in two places to
> begin with.

Hmm, maybe I'll just add this patch for 6.9 and then in 6.10 do the parameter removal.

-- Steve

diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index 0c0f50bcdc56..7372e2c2a0c4 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -35,6 +35,7 @@
 #define __assign_str(dst, src) do {				\
 	char *__str__ = __get_str(dst);				\
 	int __len__ = __get_dynamic_array_len(dst) - 1;		\
+	WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_);	\
 	memcpy(__str__, __data_offsets.dst##_ptr_ ? :		\
 	       EVENT_NULL_STR, __len__);			\
 	__str__[__len__] = '\0';				\
Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 23 Feb 2024 14:50:49 -0500
Kent Overstreet wrote:

> Tangentially related though, what would make me really happy is if we
> could create the string within the TP_fast_assign() section. I have to
> have a bunch of annoying wrappers right now because the string length
> has to be known when we invoke the tracepoint.

You can use __string_len() to determine the string length in the tracepoint (which is executed in the TP_fast_assign() section). My clean up patches will make __assign_str_len() obsolete too (I'm working on them now), and you can just use __assign_str().

I noticed that I don't have a string_len example in the sample code and I'm actually writing it now.

// cutting out everything else:

TRACE_EVENT(foo_bar,

	TP_PROTO(const char *foo, int bar),

	TP_ARGS(foo, bar),

	TP_STRUCT__entry(
		__string_len(lstr, foo, bar < strlen(foo) ? bar : strlen(foo))
	),

	TP_fast_assign(
		__assign_str(lstr, foo);
// Note, the above is with my updates, without them, you need to duplicate the logic
//		__assign_str_len(lstr, foo, bar < strlen(foo) ? bar : strlen(foo));
	),

	TP_printk("%s", __get_str(lstr))
);

The above will allocate "bar < strlen(foo) ? bar : strlen(foo)" size on the ring buffer. As the size is already stored, my clean up code uses that instead of requiring duplicating the logic again.

-- Steve
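In user-space terms, the length computation that __string_len() performs and the copy that __assign_str() then does can be sketched as below. The demo_* names are illustrative stand-ins, not the real kernel macros; the point is only that the reserved length is computed once and reused, so the caller never passes it twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What __string_len(lstr, foo, len) reserves: min(len, strlen(foo)) bytes
 * plus a terminating NUL (illustrative helper, not a kernel API). */
size_t demo_string_len(const char *foo, size_t bar)
{
	size_t full = strlen(foo);

	return bar < full ? bar : full;	/* same test as in the example above */
}

/* What __assign_str() then does: the reserved length is already known,
 * so no second length argument is needed. */
void demo_assign_str(char *dst, const char *src, size_t reserved)
{
	memcpy(dst, src, reserved);
	dst[reserved] = '\0';
}
```

This mirrors why the cleanup can drop __assign_str_len(): the size stored at reservation time is the only length the copy ever needs.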
Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 23 Feb 2024 10:30:45 -0800
Jeff Johnson wrote:

> On 2/23/2024 9:56 AM, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)"
> >
> > [
> >    This is a treewide change. I will likely re-create this patch again in
> >    the second week of the merge window of v6.9 and submit it then. Hoping
> >    to keep the conflicts that it will cause to a minimum.
> > ]
> >
> > With the rework of how __string() handles dynamic strings, where it
> > saves off the source string in a field in the helper structure[1], the
> > assignment of that value to the trace event field is stored in the helper
> > value and does not need to be passed in again.
>
> Just curious if this could be done piecemeal by first changing the
> macros to be variadic macros which allows you to ignore the extra
> argument. The callers could then be modified in their separate trees.
> And then once all the callers have been merged, the macros could be
> changed to no longer be variadic.

I weighed doing that, but I think ripping off the band-aid is a better approach. One thing I found is that leaving unused parameters in the macros can cause bugs itself. I found one case doing my clean up, where an unused parameter in one of the macros was bogus, and when I made it a used parameter, it broke the build.

I think for tree-wide changes, the preferred approach is to do one big patch at once. And since this only affects TRACE_EVENT() macros, it hopefully would not be too much of a burden (although out of tree users may suffer from this, but do we care?)

Now one thing I could do is to not remove the parameter, but just add:

	WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);

in the __assign_str() macro to make sure that it's still the same that is assigned. But I'm not sure how useful that is, and still causes burden to have it. I never really liked the passing of the string in two places to begin with.

-- Steve
Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()
On Fri, 23 Feb 2024 12:56:34 -0500 Steven Rostedt wrote: > Note, the same updates will need to be done for: > > __assign_str_len() > __assign_rel_str() > __assign_rel_str_len() Correction: The below macros do not pass in their source to the entry macros, so they will not need to be updated. -- Steve > __assign_bitmask() > __assign_rel_bitmask() > __assign_cpumask() > __assign_rel_cpumask()
Re: [v1] trace/hwlat: stop worker if !is_percpu_thread due to hotplug event
On Thu, 1 Feb 2024 14:24:55 +0800
Andy Chiu wrote:

> If the task happens to run after cpu hot-plug offline, then it would not
> be running in a percpu_thread. Instead, it would be re-queued into a
> UNBOUND workqueue. This would trigger a warning if we enable kernel
> preemption.
>
> Signed-off-by: Andy Chiu
> ---
>  kernel/trace/trace_hwlat.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index b791524a6536..87258ddc2141 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -511,7 +511,16 @@ static int start_cpu_kthread(unsigned int cpu)
>  static void hwlat_hotplug_workfn(struct work_struct *dummy)
>  {
>  	struct trace_array *tr = hwlat_trace;
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
> +
> +	/*
> +	 * If the work is scheduled after CPU hotplug offline being invoked,
> +	 * then it would be queued into UNBOUNDED workqueue

Is it due to a CPU going to online and then right back to offline? This function is called by the online path.

From the queue_work_on() comment:

/**
 * queue_work_on - queue work on specific cpu
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @work: work to queue
 *
 * We queue the work to a specific CPU, the caller must ensure it
 * can't go away. Callers that fail to ensure that the specified
 * CPU cannot go away will execute on a randomly chosen CPU.
 * But note well that callers specifying a CPU that never has been
 * online will get a splat.
 *
 * Return: %false if @work was already on a queue, %true otherwise.
 */

So the work gets queued on the CPU but then the CPU immediately goes offline. I wonder what happens if it comes back online? The above states that if the work is already queued on a CPU it won't queue it again.

Hmm, reading the comments, I'm not sure you can run the same work on multiple CPUs. It looks to do only one and all the others will fail.
I think we need to change this to a global work queue and use CPU masks. Where in the *cpu_init() function, it just sets the current CPU bit in a bit mask and calls the workqueue, and that workqueue is responsible for all CPUs.

Then the workqueue function will call cpus_read_lock() and just iterate the CPU mask starting the threads for each of the bits that are set and clear the bit (possibly with __test_and_clear_bit).

I don't think the schedule_on_cpu() is doing what we think it should be doing.

-- Steve

> +	 */
> +	if (!is_percpu_thread())
> +		return;
> +
> +	cpu = smp_processor_id();
>
>  	mutex_lock(&trace_types_lock);
>  	mutex_lock(&hwlat_data.lock);
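The mask-draining scheme described above can be sketched in user space as follows. All names here are illustrative; the kernel version would use a cpumask, __test_and_clear_bit() and cpus_read_lock(), and the worker would call start_cpu_kthread() for each set bit. C11 atomics stand in for the kernel bit helpers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bits set by the hotplug-online callback, drained by one global worker. */
atomic_ullong pending_cpus;

/* What the *cpu_init() path would do: mark the CPU and (in real code)
 * also schedule the single global work item. */
void demo_cpu_online(unsigned int cpu)
{
	atomic_fetch_or(&pending_cpus, 1ULL << cpu);
	/* real code: schedule_work(&hotplug_work); */
}

/* The single workfn: iterate the mask, test-and-clear each bit, and act
 * only on bits that were set.  Returns how many CPUs were handled. */
int demo_hotplug_workfn(void)
{
	int started = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		unsigned long long bit = 1ULL << cpu;

		/* test-and-clear: fetch old value, keep only our bit */
		if (atomic_fetch_and(&pending_cpus, ~bit) & bit)
			started++;	/* start_cpu_kthread(cpu) in real code */
	}
	return started;
}
```

Because the worker drains a mask rather than relying on which CPU the work runs on, it no longer matters whether the work item lands on an unbound workqueue.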
Re: [v1] trace/osnoise: prevent osnoise hotplog worker running in UNBOUND workqueue
On Thu, 1 Feb 2024 14:18:45 +0800
Andy Chiu wrote:

> smp_processor_id() should be called with migration disabled. This means
> we may safely call smp_processor_id() in a percpu thread. However, this is
> not the case if the work is (re-)queued into an unbound workqueue during
> cpu-hotplug. So, detect and return early if this work happens to run on
> an unbound wq.
>

Yeah I triggered this too, but never made it a priority to fix.

> Signed-off-by: Andy Chiu
> ---
>  kernel/trace/trace_osnoise.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index bd0d01d00fb9..cf7f716d3f35 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -2068,7 +2068,12 @@ static int start_per_cpu_kthreads(void)
>  #ifdef CONFIG_HOTPLUG_CPU
>  static void osnoise_hotplug_workfn(struct work_struct *dummy)
>  {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
> +
> +	if (!is_percpu_thread())
> +		return;

But doesn't this then fail to register the CPU thread? I wonder if this has some race with schedule_work_on() and hotplug?

-- Steve

> +
> +	cpu = smp_processor_id();
>
>  	mutex_lock(&trace_types_lock);
>
Re: [PATCH] eventfs: Create dentries and inodes at dir open
On Tue, 16 Jan 2024 10:53:36 -0800
Linus Torvalds wrote:

> Let's at least *try* to move towards a better and simpler world, in other
> words.

OK, but I did just finish the below. I'll save it for another time if the single inode becomes an issue. I cropped it to just 31 bits, so I'm not sure how much that would still expose kernel addresses. But as you prefer a single inode number, I'll submit that instead, and store this as one of my many git branches that I hoard.

This patch would give me:

 ~# ls -i /sys/kernel/tracing/events
 2024739183 9p              801480813 iocost          281269778 qdisc
  401505785 alarmtimer      641794539 iomap          1206040657 ras
 1930290274 avc            1239754069 iommu           275816503 raw_syscalls
 1108006611 block           138722947 io_uring       1758073266 rcu
 1528091656 bpf_test_run    694421747 ipi            1384703124 regmap
  557364529 bpf_trace         2765888 irq              30507018 regulator
 1351362737 cfg80211        369905250 irq_matrix     2078862821 resctrl
  886445085 cgroup         1115008967 irq_vectors     324090967 rpm
  796209754 clk            1448778784 jbd2           1141318882 rseq
  478739129 compaction       99410253 kmem           1274783780 rtc
 1714397712 cpuhp          2001779594 ksm            1409072807 sched
  720701943 csd              51728677 kyber           347239934 scsi
 1824588347 dev            1507974437 libata         1768671172 sd
 1640041299 devfreq        1421146927 lock           1167562598 signal
  121399632 dma_fence       956825721 mac80211          4116590 skb
  975908383 drm             738868061 maple_tree     1435501164 smbus
 1227060804 e1000e_trace    969175760 mce            1664441095 sock
 1770307058 enable         1225375766 mdio           1634697993 spi
 1372107864 error_report    744198394 migrate         556841757 swiotlb
 1356665351 exceptions      602669807 mmap            400337480 syscalls
 [..]
 ~# ls -i /sys/kernel/tracing/events/sched
 1906992193 enable                            1210369853 sched_process_wait
  592505447 filter                            1443020725 sched_stat_blocked
 1742488280 sched_kthread_stop                1213185672 sched_stat_iowait
 1674576234 sched_kthread_stop_ret            1908510325 sched_stat_runtime
 1999376743 sched_kthread_work_execute_end    1570203600 sched_stat_sleep
 1041533096 sched_kthread_work_execute_start   608113040 sched_stat_wait
  166824308 sched_kthread_work_queue_work     2033295833 sched_stick_numa
  492462616 sched_migrate_task                1617500395 sched_swap_numa
 1786868196 sched_move_numa                   1865216679 sched_switch
  691687388 sched_pi_setprio                  1465729401 sched_wait_task
 1610977146 sched_process_exec                 969941227 sched_wake_idle_without_ipi
  610128037 sched_process_exit                  84398030 sched_wakeup
 2139284699 sched_process_fork                 750220489 sched_wakeup_new
  169172804 sched_process_free                1024064201 sched_waking

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a0598eb3e26e..57448bf4acb4 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -32,6 +33,9 @@
  */
 static DEFINE_MUTEX(eventfs_mutex);

+/* Used for making inode numbers */
+static siphash_key_t inode_key;
+
 /*
  * The eventfs_inode (ei) itself is protected by SRCU. It is released from
  * its parent's list and will have is_freed set (under eventfs_mutex).
@@ -57,6 +61,28 @@ static int eventfs_dir_open(struct inode *inode, struct file *file);
 static int eventfs_iterate(struct file *file, struct dir_context *ctx);
 static int eventfs_dir_release(struct inode *inode, struct file *file);

+/* Copied from scripts/kconfig/symbol.c */
+static unsigned strhash(const char *s)
+{
+	/* fnv32 hash */
+	unsigned hash = 2166136261U;
+	for (; *s; s++)
+		hash = (hash ^ *s) * 0x01000193;
+	return hash;
+}
+
+/* Just try to make something consistent and unique */
+static int eventfs_inode_ino(struct eventfs_inode *ei, const char *name)
+{
+	unsigned long sip = (unsigned long)ei;
+
+	sip += strhash(name);
+	sip = siphash_1u32((int)sip, &inode_key);
+
+	/* keep it positive */
+	return sip & ((1U << 31) - 1);
+}
+
 static void update_attr(struct eventfs_attr *attr, struct iattr *iattr)
 {
 	unsigned int ia_valid = iattr->ia_valid;
@@ -491,6 +517,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 	mutex_unlock(&eventfs_mutex);

 	dentry = create_file(name, mode, attr, parent, data, fops);
+	if (!IS_ERR_OR_NULL(dentry))
+		dentry->d_inode->i_ino = eventfs_inode_ino(ei, name);

 	mutex_lock(&eventfs_mutex);
@@ -585,6 +613,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
 	mutex_unlock(&eventfs_mutex);

 	dentry = create_dir(ei, parent);
+	if (!IS_ERR_OR_NULL(dentry))
+
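The name-hashing half of the patch above can be tried directly in user space. This sketch copies the fnv32 strhash() and the 31-bit mask, but omits the siphash keying step (which mixes in the eventfs_inode address), so it only shows how a name turns into a stable, positive inode-like number:

```c
#include <assert.h>
#include <stdint.h>

/* FNV-1a, same as the strhash() copied from scripts/kconfig/symbol.c */
uint32_t demo_strhash(const char *s)
{
	uint32_t hash = 2166136261U;	/* FNV offset basis */

	for (; *s; s++)
		hash = (hash ^ (unsigned char)*s) * 0x01000193;	/* FNV prime */
	return hash;
}

/* The 31-bit crop from eventfs_inode_ino(), minus the siphash step:
 * masking the top bit keeps the result positive when treated as an int. */
uint32_t demo_inode_ino(const char *name)
{
	return demo_strhash(name) & ((1U << 31) - 1);	/* keep it positive */
}
```

The same name always hashes to the same number, which is the "consistent" property the patch is after; the siphash keying only adds per-instance uniqueness.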
Re: [PATCH] eventfs: Create dentries and inodes at dir open
On Tue, 16 Jan 2024 10:21:49 -0800 Linus Torvalds wrote: > Here's a clue: just fix your inode numbers. > > I can think of many ways to do it. Here's a couple: > > - use a fixed inode number for all inodes. It's fine. Really. You might > confuse some programs that still do getpwd() the legacy way, but hey, > nobody cares > > - just put the inode number in the same data structure everything else is > > > - make the inode number be a hash of the address of your data structure. > That's actually the closest to a traditional "real" inode number, which is > just an index to some on-disk thing > > I'm sure there are other *trivial* solutions. > > None of this is an excuse to misuse sentries. > > Try the first one - one single inode number - first. You shouldn't be doing > iget() anyway, so why do you care so deeply about a number that makes no > sense and nobody should care about? It was me being paranoid that using the same inode number would break user space. If that is not a concern, then I'm happy to just make it either the same, or maybe just hash the ei and name that it is associated with. If I do not fully understand how something is used, I try hard to make it act the same as it does for other use cases. That is, I did all this to keep inodes unique and consistent because I did not know if it would break something if I didn't. Removing that requirement does make it much easier to implement readdir. I think I'll do the hashing, just because I'm still paranoid that something might still break if they are all the same. -- Steve
Re: Unable to trace nf_nat function using kprobe with latest kernel 6.1.66-1
On Tue, 2 Jan 2024 11:23:54 +0530 P K wrote: > Hi, > > I am unable to trace nf_nat functions using kprobe with the latest kernel. > Previously in kernel 6.1.55-1 it was working fine. > Can someone please check if it's broken with the latest commit or i > have to use it differently? > Note, attaching to kernel functions is never considered stable API and may break at any kernel release. Also, if the compiler decides to inline the function, and makes it no longer visible in /proc/kallsyms then that too will cause this to break. -- Steve > Mentioned below are output: > Kernel - 6.1.55-1 > / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }' > Attaching 1 probe... > cannot attach kprobe, probe entry may not exist > ERROR: Error attaching probe: 'kprobe:nf_nat_ipv4_manip_pkt' > > > > Kernel 6.1.55-1 > / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }' > Attaching 1 probe... > func called > func called
Re: trace_event names with more than NAME_MAX chars
On Fri, 8 Dec 2023 18:36:01 + Beau Belgrave wrote: > While developing some unrelated features I happened to create a > trace_event that was more than NAME_MAX (255) characters. When this > happened the creation worked, but tracefs would hang any task that tried > to list the directory of the trace_event or remove it. > > I followed the code down to the reason being eventfs would call > simple_lookup(), and if it failed, it would still try to create the > dentry. In this case DCACHE_PAR_LOOKUP would get set and never cleared. > This caused d_wait_lookup() to loop forever, since that flag is used in > d_in_lookup(). > > Both tracefs and eventfs use simple_lookup() and it fails for > dentries that exceed NAME_MAX. Should we even allow trace_events to > be created that exceed this limit? Or should tracefs/eventfs allow > this but somehow represent these differently? Yeah I think it's easiest to just prevent dentries from exceeding NAME_MAX, and yes the patch below should be added. Care to send a real patch? Thanks, -- Steve > > I have a fix that appears to work for myself, but unsure if there are > other locations (attached at the end of this mail). > > Thanks, > -Beau > > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c > index f8a594a50ae6..d2c06ba26db4 100644 > --- a/fs/tracefs/event_inode.c > +++ b/fs/tracefs/event_inode.c > @@ -561,6 +561,8 @@ static struct dentry *eventfs_root_lookup(struct inode > *dir, > if (strcmp(ei_child->name, name) != 0) > continue; > ret = simple_lookup(dir, dentry, flags); > + if (IS_ERR(ret)) > + goto out; > create_dir_dentry(ei, ei_child, ei_dentry, true); > created = true; > break; > @@ -583,6 +585,8 @@ static struct dentry *eventfs_root_lookup(struct inode > *dir, > if (r <= 0) > continue; > ret = simple_lookup(dir, dentry, flags); > + if (IS_ERR(ret)) > + goto out; > create_file_dentry(ei, i, ei_dentry, name, mode, > cdata, >fops, true); > break;
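The guard being discussed amounts to something like the sketch below: reject a name before creating anything, rather than letting simple_lookup() fail later with the dentry half-created. demo_validate_name() is an illustrative helper, not a kernel API; NAME_MAX is 255 on Linux, and simple_lookup() rejects longer names with -ENAMETOOLONG.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_NAME_MAX 255	/* NAME_MAX on Linux */

/* Refuse up front any name that simple_lookup() would later reject,
 * so no dentry is ever left with DCACHE_PAR_LOOKUP stuck on. */
int demo_validate_name(const char *name)
{
	if (strlen(name) > DEMO_NAME_MAX)
		return -ENAMETOOLONG;
	return 0;
}
```

Checking at trace_event creation time (rather than at lookup time) avoids ever reaching the state where d_wait_lookup() can spin.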
Re: [PATCH net-next] tcp: add tracepoints for data send/recv/acked
On Tue, 5 Dec 2023 20:39:28 +0100
Eric Dumazet wrote:

> > So, we do not want to add some tracepoint to do some unknown debug.
> > We have a clear goal. debugging is just an incidental capability.
> >
>
> We have powerful mechanisms in the stack already that ordinary (no
> privilege requested) applications can readily use.

I'm not arguing for or against this patch set, but tracepoints are available for other utilities that may have non-privileged access. They are not just for tracers.

-- Steve
Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint
On Sun, 3 Dec 2023 10:33:32 +0900
Dominique Martinet wrote:

> > 	TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> > 		  (unsigned long)__entry->clnt,
> > 		  show_9p_op(__entry->type),
> > 		  __entry->tag, 0, __get_dynamic_array(line), 16,
> > 		  __get_dynamic_array(line) + 16)
>
> This was just printing garbage in the previous version but %16ph with a
> dynamic alloc would be out of range (even the start of the next buffer,
> __get_dynamic_array(line) + 16, can be out of range)
>
> Also, for custom tracepoints e.g. bpftrace the program needs to know how
> many bytes can be read safely even if it's just for dumping -- unless
> dynamic_array is a "fat pointer" that conveys its own size?
> (Sorry didn't take the time to check)

Yes, there's also a __get_dynamic_array_len(line) that will return the allocated length of the line. Is that what you need?

-- Steve
Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint
On Sat, 02 Dec 2023 14:05:24 +0100
Christian Schoenebeck wrote:

> > > --- a/include/trace/events/9p.h
> > > +++ b/include/trace/events/9p.h
> > > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> > >  		__entry->clnt = clnt;
> > >  		__entry->type = pdu->id;
> > >  		__entry->tag = pdu->tag;
> > > -		memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > > +		memcpy(__entry->line, pdu->sdata,
> > > +		       min(pdu->capacity, P9_PROTO_DUMP_SZ));
> > >  	),
> > >  	TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> > >  		  (unsigned long)__entry->clnt, show_9p_op(__entry->type),
>
> AFAICS __entry is a local variable on stack, and array __entry->line not
> initialized with zeros, i.e. the dump would contain trash at the end. Maybe
> prepending memset() before memcpy()?

__entry is a macro that points into the ring buffer that gets allocated before this is called.

TRACE_EVENT() has a __dynamic_array() field that can handle variable length arrays. What you can do is turn this into something like:

TRACE_EVENT(9p_protocol_dump,
	    TP_PROTO(struct p9_client *clnt, struct p9_fcall *pdu),

	    TP_ARGS(clnt, pdu),
	    TP_STRUCT__entry(
		    __field(void *, clnt)
		    __field(__u8, type)
		    __field(__u16, tag)
		    __dynamic_array(unsigned char, line,
				    min(pdu->capacity, P9_PROTO_DUMP_SZ))
		    ),

	    TP_fast_assign(
		    __entry->clnt = clnt;
		    __entry->type = pdu->id;
		    __entry->tag = pdu->tag;
		    memcpy(__get_dynamic_array(line), pdu->sdata,
			   min(pdu->capacity, P9_PROTO_DUMP_SZ));
		    ),
	    TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
		      (unsigned long)__entry->clnt, show_9p_op(__entry->type),
		      __entry->tag, 0, __get_dynamic_array(line), 16,
		      __get_dynamic_array(line) + 16)
	    );

-- Steve
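The safety property of the suggested __dynamic_array() version boils down to the bounded copy below: the array is sized to min(capacity, P9_PROTO_DUMP_SZ) and the same bound limits the memcpy(), so a short pdu is never over-read and no uninitialized bytes land in the event. This is a user-space sketch; P9_PROTO_DUMP_SZ is assumed here to be 32, matching the two 16-byte rows the TP_printk() prints.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define P9_PROTO_DUMP_SZ 32	/* assumed value, two rows of 16 bytes */

/* Copy at most P9_PROTO_DUMP_SZ bytes, never more than the pdu holds.
 * Returns the number of bytes captured, which is what
 * __get_dynamic_array_len(line) would report for the event. */
size_t demo_dump_copy(unsigned char *line, const unsigned char *sdata,
		      size_t capacity)
{
	size_t len = capacity < P9_PROTO_DUMP_SZ ? capacity : P9_PROTO_DUMP_SZ;

	memcpy(line, sdata, len);
	return len;
}
```

Because the recorded length equals the copied length, a consumer (e.g. bpftrace) knows exactly how many bytes are safe to read, answering the "fat pointer" question above.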
Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER
On Fri, 1 Dec 2023 09:25:59 -0800 Justin Chen wrote: > > It appears the sub instruction at 0x6dd0 correctly accounts for the > > extra 8 bytes, so the frame pointer is valid. So it is our assumption > > that there are no gaps between the stack frames is invalid. > > Thanks for the assistance. The gap between the stack frame depends on > the function. Most do not have a gap. Some have 8 (as shown above), some > have 12. A single assumption here is not going to work. I'm having a > hard time finding out the reasoning for this gap. I tried disabling a > bunch of gcc flags as well as -O2 and the gap still exists. That code was originally added because of some strange things that gcc did with mcount (for example, it made a copy of the stack frame that it passed to mcount, where the function graph tracer replaced the copy of the return stack making the shadow stack go out of sync and crash). This was very hard to debug and I added this code to detect it if it happened again. Well it's been over a decade since that happened (2009). 71e308a239c09 ("function-graph: add stack frame test") I'm happy assuming that the compiler folks are aware of our tricks with hijacking return calls and I don't expect it to happen again. We can just rip out those checks. That is, if it's only causing false positives, I don't think it's worth keeping around. Has it detected any real issues on the Arm platforms? -- Steve
[PATCH v4] libtracecmd: Use an rbtree for mapping of cache pages
From: "Steven Rostedt (Google)"

I was loading a very large trace.dat file into kernelshark when it just stopped near the end of the load and hung there for a very long time. I ran it under gdb and hit Ctrl^C when it hit the hang to see what it was doing and found that it was spinning in:

libtracecmd trace-input.c: free_zpage

static void free_zpage(struct cpu_data *cpu_data, void *map)
{
	struct zchunk_cache *cache;

	list_for_each_entry(cache, &cpu_data->compress.cache, list) {
		if (map <= cache->map && map > (cache->map + cache->chunk->size))
			goto found;
	}
	return;
 found:

It seems that there can be over 10 thousand cache pages in that link list and that kernelshark is constantly hitting that. So it's doing a linear search for the mapped page to free it.

I ran:

  time kernelshark trace.dat

And exited out when it finished loading and the result was:

  real	6m14.772s
  user	6m0.649s
  sys	0m12.718s

That's over 6 minutes to load the trace.dat file!!!

I ran perf record on it and it showed 77% of the time was in free_zpage().

I pulled out my old algorithms book and wrote up a rbtree for internal use of libtracecmd. Then I switched the cache into a binary rbtree to do the look ups. As the lookups used both where the memory of the compressed page is mapped as well as the offset depending on how the search was done, I found that it only used the memory allocation address in one location. Luckily, the memory allocation mapping lookup also had access to the offset of the file the memory represented. That allowed me to make all lookups use the file offset.

After converting the cache to an rbtree lookup, I ran kernelshark again on opening that file and exited out as soon as it finished loading and the timings was:

  real	1m22.356s
  user	1m10.532s
  sys	0m10.901s

Still a bit long, but it dropped from over 6 minutes to under 1 1/2 minutes. Also, free_zpages() was no longer in the perf record output.
Running the same file under trace-cmd produced:

Without this change:

 $ time trace-cmd report trace.dat > /dev/null
 real	9m20.390s
 user	9m16.391s
 sys	0m3.529s

With this change:

 $ time trace-cmd report trace.dat > /dev/null
 real	6m22.935s
 user	6m19.537s
 sys	0m3.139s

Not as drastic as the KernelShark speedup, but still brings it down by a third.

Signed-off-by: Steven Rostedt (Google)
---
Changes since v3: https://lore.kernel.org/all/20231019165205.371e8...@gandalf.local.home/
 - Check for node not NULL before checking node->left

Changes since v2: https://lore.kernel.org/all/20231017104307.662aa...@gandalf.local.home/
 - Fixed a few more bugs
 - Added a nice little iterator (although not used)
 - Added a "check_tree()" routine in case I need to debug it again
   This is #if out but nice to keep in the code anyway.

Changes since v1: https://lore.kernel.org/linux-trace-devel/20231016230058.2f0e9...@gandalf.local.home/
 - Removed some leftover debug
 - Fix deletion in setting the color of the node that replaces the deleted node.
lib/trace-cmd/Makefile | 1 + lib/trace-cmd/include/private/trace-rbtree.h | 34 ++ lib/trace-cmd/trace-input.c | 81 +++- lib/trace-cmd/trace-rbtree.c | 440 +++ 4 files changed, 533 insertions(+), 23 deletions(-) create mode 100644 lib/trace-cmd/include/private/trace-rbtree.h create mode 100644 lib/trace-cmd/trace-rbtree.c diff --git a/lib/trace-cmd/Makefile b/lib/trace-cmd/Makefile index e9d26b2bb367..aba6fda5b0f6 100644 --- a/lib/trace-cmd/Makefile +++ b/lib/trace-cmd/Makefile @@ -9,6 +9,7 @@ DEFAULT_TARGET = $(LIBTRACECMD_STATIC) OBJS = OBJS += trace-hash.o +OBJS += trace-rbtree.o OBJS += trace-hooks.o OBJS += trace-input.o OBJS += trace-output.o diff --git a/lib/trace-cmd/include/private/trace-rbtree.h b/lib/trace-cmd/include/private/trace-rbtree.h new file mode 100644 index ..822111998e7a --- /dev/null +++ b/lib/trace-cmd/include/private/trace-rbtree.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: LGPL-2.1 */ +/* + * Copyright (C) 2023 Google, Steven Rostedt + * + */ +#ifndef _TRACE_RBTREE_H +#define _TRACE_RBTREE_H + +struct trace_rbtree_node { + struct trace_rbtree_node*parent; + struct trace_rbtree_node*left; + struct trace_rbtree_node*right; + int color; +}; + +typedef int (*trace_rbtree_cmp_fn)(const struct trace_rbtree_node *A, const struct trace_rbtree_node *B); + +typedef int (*trace_rbtree_search_fn)(const struct trace_rbtree_node *n, const void *data); + +struct trace_rbtree { + struct trace_rbtree_node*node; + trace_rbtree_search_fn search; + trace_rbtree_cmp_fn cmp; + size_t nr_nodes; +}; + +void trace_rbtree_
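The payoff of keying every cache lookup by file offset can be shown with a user-space stand-in: here bsearch() over a sorted array plays the rbtree's role for the O(log n) search (the real patch keeps an rbtree so inserts and deletes also stay cheap). The struct and names are simplified from the real zchunk_cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified stand-in for one cached decompressed chunk. */
struct demo_cache {
	off_t offset;	/* file offset the chunk represents (the key) */
	void *map;	/* where the chunk is mapped in memory */
};

/* Compare a searched-for offset against one cache entry. */
static int demo_cmp(const void *key, const void *elem)
{
	off_t off = *(const off_t *)key;
	const struct demo_cache *c = elem;

	if (off < c->offset)
		return -1;
	return off > c->offset ? 1 : 0;
}

/* O(log n) lookup by file offset, replacing the O(n) list walk that
 * free_zpage() was doing. */
struct demo_cache *demo_lookup(struct demo_cache *cache, size_t n, off_t offset)
{
	return bsearch(&offset, cache, n, sizeof(*cache), demo_cmp);
}
```

With ten thousand cached chunks, each free goes from ~10,000 comparisons to ~14, which is the whole difference between the 6-minute and 1.5-minute loads above.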
Re: [PATCH v5 0/2] Return EADDRNOTAVAIL when func matches several symbols during kprobe creation
On Thu, 19 Oct 2023 21:18:43 +0900
Masami Hiramatsu (Google) wrote:

> > So why is this adding stable? (and as Greg's form letter states, that's not
> > how you do that)
> >
> > I don't see this as a fix but a new feature.
>
> I asked him to make this a fix since the current kprobe event's behavior is
> somewhat strange. It puts the probe on only the "first symbol" if the user
> specifies a symbol name which has multiple instances. In this case, the
> actual probe address can not be solved by name. The user must specify the
> probe address by unique name + offset. Otherwise, it can put a probe on an
> unexpected address; especially if it specifies a non-unique symbol + offset,
> the address may NOT be an instruction boundary.
> To avoid this issue, it should check the given symbol is unique.
>

OK, so what is broken is that when you add a probe to a function name that matches multiple symbols, it will attach to the first one and not necessarily the one you want. The change log needs to be more explicit in what the "bug" is. It does state this in a roundabout way, but it is written in a way that it doesn't stand out.

    Previously to this commit, if func matches several symbols, a kprobe,
    being either sysfs or PMU, would only be installed for the first
    matching address. This could lead to some misunderstanding when some
    BPF code was never called because it was attached to a function which
    was indeed not called, because the effectively called one has no
    kprobes attached. So, this commit returns EADDRNOTAVAIL when func
    matches several symbols. This way, the user needs to use an address
    to remove the ambiguity.

What it should say is:

    When a kprobe is attached to a function whose name is not unique (is
    static and shares the name with other functions in the kernel), the
    kprobe is attached to the first function it finds. This is a bug as
    the function that it is attaching to is not necessarily the one that
    the user wants to attach to.
Instead of blindly picking an ambiguous function to attach to, error out with EADDRNOTAVAIL to let the user know that this function is not unique, and that the user must use another unique function with an address offset to get to the function they want to attach to.

And yes, it should have:

Cc: sta...@vger.kernel.org

which is how to mark something for stable, and

Fixes: ...

to the commit that caused the bug.

-- Steve
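Reduced to a user-space sketch, the uniqueness check being asked for looks like this. The symbol table and helper name are illustrative, not the real kallsyms interfaces; the point is only the three-way outcome: not found, unique, or ambiguous.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Count how many table entries share the requested name: return 0 when
 * exactly one matches, -ENOENT when none do, and -EADDRNOTAVAIL when the
 * name is ambiguous (so the user must disambiguate with name + offset). */
int demo_check_symbol(const char *const *syms, size_t n, const char *func)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(syms[i], func) == 0)
			count++;

	if (count == 0)
		return -ENOENT;
	if (count > 1)
		return -EADDRNOTAVAIL;
	return 0;
}
```

Failing loudly here is what prevents the BPF-never-fires confusion described in the changelog: attachment only proceeds when the name resolves to exactly one address.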
Re: [PATCH v5 0/2] Return EADDRNOTAVAIL when func matches several symbols during kprobe creation
On Wed, 18 Oct 2023 17:40:28 +0300 Francis Laniel wrote: > Changes since: > v1: > * Use EADDRNOTAVAIL instead of adding a new error code. > * Correct also this behavior for sysfs kprobe. > v2: > * Count the number of symbols corresponding to function name and return > EADDRNOTAVAIL if higher than 1. > * Return ENOENT if above count is 0, as it would be returned later by while > registering the kprobe. > v3: > * Check symbol does not contain ':' before testing its uniqueness. > * Add a selftest to check this is not possible to install a kprobe for a non > unique symbol. > v5: > * No changes, just add linux-stable as recipient. So why is this adding stable? (and as Greg's form letter states, that's not how you do that) I don't see this as a fix but a new feature. -- Steve
Re: [PATCH 3/8] usb: udc: trace: reduce buffer usage of trace event
On Fri, 15 Sep 2023 09:11:06 +0800
Linyu Yuan wrote:

> >> +	snprintf(__s, 9, "ep%d%s", te.address,			\
> >> +		 (te.caps.dir_in && te.caps.dir_out) ? "" :	\
> >> +		 te.caps.dir_in ? "in" : "out");
> >
> > Note, there's a temp buffer trace_seq 'p' available for use as well. See
> > both include/trace/events/libata.h and include/trace/events/scsi.h:
> >
> > 	const char *libata_trace_parse_status(struct trace_seq*, unsigned char);
> > 	#define __parse_status(s) libata_trace_parse_status(p, s)
> >
> > I think that can be used instead of adding this TP_printk_init().
>
> The reason I added TP_printk_init() is that when I first designed some
> macros not related to tracepoints, they used too much stack.

Not sure what you mean about 'uses too much stack'. This is called by the reading code and not some arbitrary location, and the above macros are done in the same location as your "init" call, so I'm not sure how that makes a difference on the stack.

> But I think TP_printk_init() is good as it follows the most common way
> to print.

I really do not want to add more versions of TRACE_EVENT() that I need to maintain unless there is a really good reason to do so. And I really don't want to encourage the use of a "TP_printk_init()" because that just encourages more use cases that will make it hard for user space to parse the TP_printk().

-- Steve
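The trace_seq-helper pattern being pointed at (a la libata_trace_parse_status() taking the temp buffer 'p') looks roughly like this in user space, with a plain char buffer standing in for the trace_seq and made-up flag names; the bits are expanded into a string at print time instead of adding a new TP_printk_init() stage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative flag bits, packed into one word in the ring buffer. */
#define DEMO_F_SG	(1u << 0)
#define DEMO_F_OTG	(1u << 1)
#define DEMO_F_SELF	(1u << 2)

/* Print-time helper: expand the packed flags into text using a caller
 * supplied scratch buffer, the way the trace_seq 'p' is used by
 * __parse_status() in the libata events. */
const char *demo_parse_flags(char *p, size_t sz, unsigned int flags)
{
	snprintf(p, sz, "%s%s%s",
		 flags & DEMO_F_SG ? "sg:" : "",
		 flags & DEMO_F_OTG ? "OTG:" : "",
		 flags & DEMO_F_SELF ? "self-powered:" : "bus-powered:");
	return p;
}
```

The ring buffer still only stores the packed word; the decoding happens when the event is read, so no per-event stack or new TRACE_EVENT() variant is needed.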
Re: [PATCH 3/8] usb: udc: trace: reduce buffer usage of trace event
On Thu, 14 Sep 2023 18:02:57 +0800 Linyu Yuan wrote:

> Save u32 members into trace event ring buffer and parse it for possible
> bit fields.
>
> Use new DECLARE_EVENT_CLASS_PRINT_INIT() class macro for output stage.
>
> Signed-off-by: Linyu Yuan
> ---
>  drivers/usb/gadget/udc/trace.h | 154 +++--
>  1 file changed, 69 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/usb/gadget/udc/trace.h b/drivers/usb/gadget/udc/trace.h
> index a5ed26fbc2da..e1754667f1d2 100644
> --- a/drivers/usb/gadget/udc/trace.h
> +++ b/drivers/usb/gadget/udc/trace.h
> @@ -17,7 +17,7 @@
>  #include
>  #include
>
> -DECLARE_EVENT_CLASS(udc_log_gadget,
> +DECLARE_EVENT_CLASS_PRINT_INIT(udc_log_gadget,
>  	TP_PROTO(struct usb_gadget *g, int ret),
>  	TP_ARGS(g, ret),
>  	TP_STRUCT__entry(
> @@ -25,20 +25,7 @@ DECLARE_EVENT_CLASS(udc_log_gadget,
>  		__field(enum usb_device_speed, max_speed)
>  		__field(enum usb_device_state, state)
>  		__field(unsigned, mA)
> -		__field(unsigned, sg_supported)
> -		__field(unsigned, is_otg)
> -		__field(unsigned, is_a_peripheral)
> -		__field(unsigned, b_hnp_enable)
> -		__field(unsigned, a_hnp_support)
> -		__field(unsigned, hnp_polling_support)
> -		__field(unsigned, host_request_flag)
> -		__field(unsigned, quirk_ep_out_aligned_size)
> -		__field(unsigned, quirk_altset_not_supp)
> -		__field(unsigned, quirk_stall_not_supp)
> -		__field(unsigned, quirk_zlp_not_supp)
> -		__field(unsigned, is_selfpowered)
> -		__field(unsigned, deactivated)
> -		__field(unsigned, connected)
> +		__field(u32, gdw1)
>  		__field(int, ret)
>  	),
>  	TP_fast_assign(
> @@ -46,39 +33,35 @@ DECLARE_EVENT_CLASS(udc_log_gadget,
>  		__entry->max_speed = g->max_speed;
>  		__entry->state = g->state;
>  		__entry->mA = g->mA;
> -		__entry->sg_supported = g->sg_supported;
> -		__entry->is_otg = g->is_otg;
> -		__entry->is_a_peripheral = g->is_a_peripheral;
> -		__entry->b_hnp_enable = g->b_hnp_enable;
> -		__entry->a_hnp_support = g->a_hnp_support;
> -		__entry->hnp_polling_support = g->hnp_polling_support;
> -		__entry->host_request_flag = g->host_request_flag;
> -		__entry->quirk_ep_out_aligned_size =
> -						g->quirk_ep_out_aligned_size;
> -		__entry->quirk_altset_not_supp = g->quirk_altset_not_supp;
> -		__entry->quirk_stall_not_supp = g->quirk_stall_not_supp;
> -		__entry->quirk_zlp_not_supp = g->quirk_zlp_not_supp;
> -		__entry->is_selfpowered = g->is_selfpowered;
> -		__entry->deactivated = g->deactivated;
> -		__entry->connected = g->connected;
> +		__entry->gdw1 = g->dw1;
>  		__entry->ret = ret;
>  	),
> -	TP_printk("speed %d/%d state %d %dmA [%s%s%s%s%s%s%s%s%s%s%s%s%s%s] --> %d",
> +	TP_printk("speed %d/%d state %d %dmA [%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s] --> %d",
>  		__entry->speed, __entry->max_speed, __entry->state, __entry->mA,
> -		__entry->sg_supported ? "sg:" : "",
> -		__entry->is_otg ? "OTG:" : "",
> -		__entry->is_a_peripheral ? "a_peripheral:" : "",
> -		__entry->b_hnp_enable ? "b_hnp:" : "",
> -		__entry->a_hnp_support ? "a_hnp:" : "",
> -		__entry->hnp_polling_support ? "hnp_poll:" : "",
> -		__entry->host_request_flag ? "hostreq:" : "",
> -		__entry->quirk_ep_out_aligned_size ? "out_aligned:" : "",
> -		__entry->quirk_altset_not_supp ? "no_altset:" : "",
> -		__entry->quirk_stall_not_supp ? "no_stall:" : "",
> -		__entry->quirk_zlp_not_supp ? "no_zlp" : "",
> -		__entry->is_selfpowered ? "self-powered:" : "bus-powered:",
> -		__entry->deactivated ? "deactivated:" : "activated:",
> -		__entry->connected ? "connected" : "disconnected",
> -		__entry->ret)
> +		tg.sg_supported ? "sg:" : "",
> +		tg.is_otg ? "OTG:" : "",
> +		tg.is_a_peripheral ? "a_peripheral:" : "",
> +		tg.b_hnp_enable ? "b_hnp:" : "",
> +		tg.a_hnp_support ? "a_hnp:" : "",
> +		tg.a_alt_hnp_support ? "a_alt_hnp:" : "",
> +		tg.hnp_polling_support ? "hnp_poll:" : "",
> +		tg.host_request_flag ? "hostreq:" : "",
> +		tg.quirk_ep_out_aligned_size ? "out_aligned:" : "",
> +		tg.quirk_altset_not_supp ? "no_altset:" : "",
> +		tg.quirk_stall_not_supp ? "no_stall:" : "",
> +		tg.quirk_zlp_not_supp ? "no_zlp" : "",
> +		tg.quirk_avoids_skb_reserve ? "no_skb_reserve" : "",
> +		tg.is_selfpowered ? "self-powered:" :
Re: [PATCH 0/8] usb: gadget: reduce usb gadget trace event buffer usage
On Thu, 14 Sep 2023 18:02:54 +0800 Linyu Yuan wrote:

> some trace events use an integer to save bit field info of the gadget,
> and some traces save the endpoint name as a string; all of this can be
> changed to another representation at the trace event store phase.
>
> bit fields can be replaced with a union integer member which includes
> multiple bit fields.
>
> the ep name string can be replaced with an integer which contains the
> number and dir info.
>
> to allow the trace output stage to get bit info from the saved integer,
> add a DECLARE_EVENT_CLASS_PRINT_INIT() class which allows user-defined
> operations before print.
>
> v1: https://lore.kernel.org/linux-usb/20230911042843.2711-1-quic_linyy...@quicinc.com/
> v2: fix two compile issues that COMPILE_TEST did not cover
>     https://lore.kernel.org/linux-usb/2023092446.1791-1-quic_linyy...@quicinc.com/
> v3: fix reviewer comments, allow bit fields to work on both little and big endian
>     https://lore.kernel.org/linux-usb/20230912104455.7737-1-quic_linyy...@quicinc.com/
> v4: add DECLARE_EVENT_CLASS_PRINT_INIT() new trace class and use it

All these changes make it useless for user space. :-(

-- Steve

> Linyu Yuan (8):
>   trace: add new DECLARE_EVENT_CLASS_PRINT_INIT class type
>   usb: gadget: add anonymous definition in some struct for trace purpose
>   usb: udc: trace: reduce buffer usage of trace event
>   usb: cdns3: trace: reduce buffer usage of trace event
>   usb: dwc3: trace: reduce buffer usage of trace event
>   usb: cdns2: trace: reduce buffer usage of trace event
>   usb: mtu3: trace: reduce buffer usage of trace event
>   usb: musb: trace: reduce buffer usage of trace event
>
>  drivers/usb/cdns3/cdns3-trace.h            | 201 ++---
>  drivers/usb/cdns3/cdnsp-trace.h            | 105 +++
>  drivers/usb/dwc3/trace.h                   |  99 ++
>  drivers/usb/gadget/udc/cdns2/cdns2-trace.h | 175 --
>  drivers/usb/gadget/udc/trace.h             | 154 +++-
>  drivers/usb/mtu3/mtu3_trace.h              |  76 +---
>  drivers/usb/musb/musb_trace.h              |  20 +-
>  include/linux/tracepoint.h                 |  22 +++
>  include/linux/usb/gadget.h                 | 113 +++-
>  include/trace/bpf_probe.h                  |   4 +
>  include/trace/perf.h                       |  43 +
>  include/trace/stages/stage3_trace_output.h |   3 +
>  include/trace/trace_events.h               | 118 
>  13 files changed, 784 insertions(+), 349 deletions(-)
Re: [PATCH v5] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
On Tue, 12 Sep 2023 21:47:52 +0800 Jinjie Ruan wrote:

> v5:
> - Move the ef below dir to keep the "upside-down x-mas tree" format.

That was quick! I'll add it to my queue and start testing it.

Thanks Ruan!

-- Steve
Re: [PATCH v4] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
On Tue, 12 Sep 2023 10:58:08 +0800 Jinjie Ruan wrote:

This looks good. But I have a small nit.

> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index ed367d713be0..8e0593d4c6a6 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2296,6 +2296,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name,
>  		    struct trace_event_file *file, struct dentry *parent)
>  {
>  	struct event_subsystem *system, *iter;
> +	struct eventfs_file *ef;
>  	struct trace_subsystem_dir *dir;
>  	int res;
>

Can you move the ef below dir to keep the "upside-down x-mas tree" format:

	struct event_subsystem *system, *iter;
	struct trace_subsystem_dir *dir;
	struct eventfs_file *ef;
	int res;

That's easier to read.

Thanks!

-- Steve
Re: [PATCH v3] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
On Mon, 11 Sep 2023 20:51:25 +0900 Masami Hiramatsu (Google) wrote:

> Instead,
>
> 	ef = eventfs_add_subsystem_dir(name, parent);
> 	if (IS_ERR(ef)) {
> 		...
> 	} else
> 		dir->ef = ef;

Note, as the error has a goto out_free, it just needs to be:

	if (IS_ERR(ef)) {
		...
		goto out_free;
	}
	dir->ef = ef;

And for event_create_dir():

	ef = eventfs_add_dir(name, ef_subsystem);
	if (IS_ERR(ef)) {
		...
		return -1;
	}
	file->ef = ef;

-- Steve