Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-05-17 Thread Steven Rostedt
On Fri, 17 May 2024 10:36:37 -0700
Guenter Roeck  wrote:

> Building csky:allmodconfig (and others) ... failed
> --
> Error log:
> In file included from include/trace/trace_events.h:419,
>  from include/trace/define_trace.h:102,
>  from drivers/cxl/core/trace.h:737,
>  from drivers/cxl/core/trace.c:8:
> drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 
> arguments, but takes just 1
> 
> This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
> So far that seems to be the only build failure.
> Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
> cxl_general_media and cxl_dram events"). Guess we'll see more of those
> towards the end of the commit window.

Looks like I made this patch just before this commit was pulled into
Linus's tree.

Which is why I'll apply the patch and rerun the above again, probably on
Tuesday of next week, against Linus's latest.

This patch made it through both an allyesconfig and an allmodconfig, but on
the commit I had applied it to, which was:

  1b294a1f3561 ("Merge tag 'net-next-6.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")

I'll be compiling those two builds after I update it then.

-- Steve



Re: ftrace_direct_func_count ?

2024-05-06 Thread Steven Rostedt
On Sat, 4 May 2024 13:35:26 +
"Dr. David Alan Gilbert"  wrote:

> Hi,
>   I've just posted a patch 'ftrace: Remove unused list 'ftrace_direct_funcs''
> that clears out some old code, but while at it I noticed the global
> 'ftrace_direct_func_count'.
> 
> As far as I can tell, it's never assigned (or initialised) but it is tested:
> 
> kernel/trace/fgraph.c:
> #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
>   /*
>* Skip graph tracing if the return location is served by direct trampoline,
>* since call sequence and return addresses are unpredictable anyway.
>* Ex: BPF trampoline may call original function and may skip frame
>* depending on type of BPF programs attached.
>*/
>   if (ftrace_direct_func_count &&
>   ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
> return -EBUSY;
> #endif
> 
> So I wasn't sure whether it was just safe to nuke that section
> or whether it really needed fixing?

Yes, after commit 8788ca164eb4bad ("ftrace: Remove the legacy
_ftrace_direct API") that variable is no longer used.

-- Steve



Re: [PATCH] ftrace: Replace ftrace_disabled variable with ftrace_is_dead function

2024-04-27 Thread Steven Rostedt
On Sat, 20 Apr 2024 11:50:29 +0800
"Bang Li"  wrote:

> Thank you for your explanation, that helps me understand this. How about
> replacing ftrace_disabled with unlikely(ftrace_disabled)?

Why? They are slow paths. No need to optimize them.

-- Steve



Re: [PATCH net-next v6 0/7] Implement reset reason mechanism to detect

2024-04-19 Thread Steven Rostedt
On Fri, 19 Apr 2024 16:00:20 +0800
Jason Xing  wrote:

> If other experts see this thread, please help me. I would appreciate
> it. I have strong interests and feel strong responsibility to
> implement something like this patch series. It can be very useful!!

I'm not a networking expert, but as I'm Cc'd and this is about tracing,
I'll jump in to see if I can help. Honestly, reading the thread, it
appears that you and Eric are talking past each other.

I believe Eric is concerned about losing the value of the enum. Enums
are types, and if you typecast them to another type, they lose the
previous type, and all the safety that goes with it.

Now, I do not really understand the problem being solved here. I
understand how TCP works but I never looked into the implementation of
MPTCP.

You added this:

+static inline enum sk_rst_reason convert_mptcp_reason(u32 reason)
+{
+   return reason += RST_REASON_START;
+}

And used it for places like this:

@@ -309,8 +309,13 @@ static struct dst_entry *subflow_v4_route_req(const struct 
sock *sk,
return dst;
 
dst_release(dst);
-   if (!req->syncookie)
-   tcp_request_sock_ops.send_reset(sk, skb, 
SK_RST_REASON_NOT_SPECIFIED);
+   if (!req->syncookie) {
+   struct mptcp_ext *mpext = mptcp_get_ext(skb);
+   enum sk_rst_reason reason;
+
+   reason = convert_mptcp_reason(mpext->reset_reason);
+   tcp_request_sock_ops.send_reset(sk, skb, reason);
+   }
return NULL;
 }

As I don't know this code or how MPTCP works, I do not understand the
above. It used to pass SK_RST_REASON_NOT_SPECIFIED to send_reset(), but
now it takes a "reset_reason" and calls convert_mptcp_reason() to get
back an enum value.

If you are mapping the reset_reason to enum sk_rst_reason, why not do
it via a real conversion instead of this fragile arithmetic between the two
values?

static inline enum sk_rst_reason convert_mptcp_reason(u32 reason)
{
switch(reason) {
case 0: return SK_RST_REASON_MPTCP_RST_EUNSPEC;
case 1: return SK_RST_REASON_MPTCP_RST_EMPTCP;
case 2: return SK_RST_REASON_MPTCP_RST_ERESOURCE;
[..]
default: return SK_RST_REASON_MAX; // or some other error value
}
}

I'm not sure if this is any better, but it's not doing any casting and
it's easier to understand. It's a simple mapping between the reason and
the enum and there's no inherent dependency between the values. Could
possibly create enums for the reason numbers and replace the hard coded
values with them.

That way that helper function is at least doing a real conversion of
one type to another.

But like I said from the beginning. I don't understand the details here
and have not spent the time to dig deeper. I just read the thread and I
agree with Eric that the arithmetic conversion of reason to an enum
looks fragile at best and buggy at worst.

-- Steve



Re: [PATCH] ftrace: Replace ftrace_disabled variable with ftrace_is_dead function

2024-04-19 Thread Steven Rostedt
On Fri, 19 Apr 2024 22:38:44 +0800
"Bang Li"  wrote:

> Use the existing function ftrace_is_dead to replace the variable to make
> the code clearer.
> 
> Signed-off-by: Bang Li 
> ---
>  kernel/trace/ftrace.c | 46 +--
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 50ca4d4f8840..4a08c79db677 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2693,7 +2693,7 @@ void __weak ftrace_replace_code(int mod_flags)
>   int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
>   int failed;
>  
> - if (unlikely(ftrace_disabled))
> + if (unlikely(ftrace_is_dead()))
>   return;
>  

NACK!

ftrace_is_dead() is only there to make the static variable
"ftrace_disabled" available for code outside of ftrace.c. In ftrace.c,
it is perfectly fine to use ftrace_disabled.

-- Steve



Re: [PATCH net-next v3 6/6] rstreason: make it work in trace world

2024-04-09 Thread Steven Rostedt
On Tue,  9 Apr 2024 18:09:34 +0800
Jason Xing  wrote:

>  /*
>   * tcp event with arguments sk and skb
> @@ -74,20 +75,38 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>   TP_ARGS(sk, skb)
>  );
>  
> +#undef FN1
> +#define FN1(reason)  TRACE_DEFINE_ENUM(SK_RST_REASON_##reason);
> +#undef FN2
> +#define FN2(reason)  TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
> +DEFINE_RST_REASON(FN1, FN1)

Interesting. I've never seen the passing of the internal macros to the main
macro before. I see that you are using it for handling both the
SK_RST_REASON and the SK_DROP_REASON.

> +
> +#undef FN1
> +#undef FNe1
> +#define FN1(reason)  { SK_RST_REASON_##reason, #reason },
> +#define FNe1(reason) { SK_RST_REASON_##reason, #reason }
> +
> +#undef FN2
> +#undef FNe2
> +#define FN2(reason)  { SKB_DROP_REASON_##reason, #reason },
> +#define FNe2(reason) { SKB_DROP_REASON_##reason, #reason }

Anyway, from a tracing point of view, it looks like it would work
(I haven't tested it).

Reviewed-by: Steven Rostedt (Google) 

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-03 Thread Steven Rostedt
On Tue, 2 Apr 2024 22:21:00 -0700
Andrii Nakryiko  wrote:

> > I just checked our fleet-wide production data for the last 24 hours.
> > Within the kprobe/kretprobe code path (ftrace_trampoline and
> > everything called from it), rcu_is_watching (both calls, see below)
> > cause 0.484% CPU cycles usage, which isn't nothing. So definitely we'd
> > prefer to be able to avoid that in production use cases.
> >  
> 
> I just ran synthetic microbenchmark testing multi-kretprobe
> throughput. We get (in millions of BPF kretprobe-multi program
> invocations per second):
>   - 5.568M/s as baseline;
>   - 5.679M/s with changes in this patch (+2% throughput improvement);
>   - 5.808M/s with disabling rcu_is_watching in rethook_try_get()
> (+2.3% more vs just one of rcu_is_watching, and +4.3% vs baseline).
> 
> It's definitely noticeable.

Ah, thanks for verifying (I should have read the thread before replying to
the previous email).

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-03 Thread Steven Rostedt
On Tue, 2 Apr 2024 21:00:19 -0700
Andrii Nakryiko  wrote:

> I just noticed another rcu_is_watching() call, in rethook_try_get(),
> which seems to be a similar and complementary validation check to the
> one we are putting under CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING option
> in this patch. It feels like both of them should be controlled by the
> same settings. WDYT? Can I add CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> guard around rcu_is_watching() check in rethook_try_get() as well?

That is totally up to Masami. It may have even less overhead as I'm not
sure how many times that gets called, and there may be more work to do than
with function tracing.

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-02 Thread Steven Rostedt
On Wed, 3 Apr 2024 09:40:48 +0900
Masami Hiramatsu (Google)  wrote:

> OK, for me, this last sentence is preferred for the help message. That 
> explains
> what this is for. 
> 
> All callbacks that attach to the function tracing have some sort
> of protection against recursion. This option is only to verify that
>    ftrace (and other users of ftrace_test_recursion_trylock()) are not
> called outside of RCU, as if they are, it can cause a race.
> But it also has a noticeable overhead when enabled.
> 
> BTW, how much overhead does this introduce? And does the race cause a kernel
> crash, or just mess up the ftrace buffer?

There's a hypothetical race where it can cause a use after free.

That is, after you shutdown ftrace, you need to call synchronize_rcu_tasks(),
which requires RCU to be watching. There's a theoretical case where that
task calls the trampoline and misses the synchronization. Note, these
locations are with preemption disabled, as rcu is always watching when
preemption is enabled. Thus it would be extremely fast, whereas the
synchronize_rcu_tasks() is rather slow.

We also have synchronize_rcu_tasks_rude() which would actually keep the
trace from happening, as it would schedule on each CPU forcing all CPUs to
have RCU watching.

I have never heard of this race being hit. I guess it could happen on a VM
where a vCPU gets preempted at the right moment for a long time and the
other CPUs synchronize.

But like lockdep, where deadlocks can crash the kernel, we don't enable it
for production.

The overhead is another function call within the function tracer. I had
numbers before, but I guess I could run tests again and get new numbers.

Thanks,

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-01 Thread Steven Rostedt
On Mon, 1 Apr 2024 19:29:46 -0700
Andrii Nakryiko  wrote:

> On Mon, Apr 1, 2024 at 5:38 PM Masami Hiramatsu  wrote:
> >
> > On Mon, 1 Apr 2024 12:09:18 -0400
> > Steven Rostedt  wrote:
> >  
> > > On Mon, 1 Apr 2024 20:25:52 +0900
> > > Masami Hiramatsu (Google)  wrote:
> > >  
> > > > > Masami,
> > > > >
> > > > > Are you OK with just keeping it set to N.  
> > > >
> > > > OK, if it is only for the debugging, I'm OK to set N this.
> > > >  
> > > > >
> > > > > We could have other options like PROVE_LOCKING enable it.  
> > > >
> > > > Agreed (but it should say this is a debug option)  
> > >
> > > It does say "Validate" which to me is a debug option. What would you
> > > suggest?  
> >
> > I think the help message should have "This is for debugging ftrace."
> >  
> 
> Sent v2 with adjusted wording, thanks!

You may want to wait till Masami and I agree ;-)

Masami,

But it isn't really for "debugging", it's for validating. That is, it
doesn't give us any information to debug ftrace. It only validates if it is
executed properly. In other words, I never want to be asked "How can I use
this option to debug ftrace?"

For example, we also have:

  "Verify ring buffer time stamp deltas"

that makes sure the time stamps of the ring buffer are not buggy.

And there's:

  "Verify compile time sorting of ftrace functions"

Which is also used to make sure things are working properly.

Neither of the above says they are for "debugging", even though they are
more useful for debugging than this option.

I'm not sure saying this is "debugging ftrace" is accurate. It may help
debug ftrace if it is called outside of an RCU location, which has a
1 in 100,000,000,000 chance of causing an actual bug, as the race window is
extremely small. 

Now if it is also called outside of instrumentation, that will likely trigger
other warnings even without this code, and this will not be needed to debug
that.

ftrace has all sorts of "verifiers" that are used to make sure things are
working properly. And yes, you can consider it as "debugging". But I would
not consider this an option to enable if ftrace was broken, and you are
looking into why it is broken.

To me, this option is only to verify that ftrace (and other users of
ftrace_test_recursion_trylock()) are not called outside of RCU, as if they
are, it can cause a race. But it also has a noticeable overhead when enabled.

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-04-01 Thread Steven Rostedt
On Mon, 1 Apr 2024 20:25:52 +0900
Masami Hiramatsu (Google)  wrote:

> > Masami,
> > 
> > Are you OK with just keeping it set to N.  
> 
> OK, if it is only for the debugging, I'm OK to set N this.
> 
> > 
> > We could have other options like PROVE_LOCKING enable it.  
> 
> Agreed (but it should say this is a debug option)

It does say "Validate" which to me is a debug option. What would you
suggest?

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-03-26 Thread Steven Rostedt
On Tue, 26 Mar 2024 09:16:33 -0700
Andrii Nakryiko  wrote:

> > It's no different than lockdep. Test boxes should have it enabled, but
> > there's no reason to have this enabled in a production system.
> >  
> 
> I tend to agree with Steven here (which is why I sent this patch as it
> is), but I'm happy to do it as an opt-out, if Masami insists. Please
> do let me know if I need to send v2 or this one is actually the one
> we'll end up using. Thanks!

Masami,

Are you OK with just keeping it set to N.

We could have other options like PROVE_LOCKING enable it.

-- Steve



Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-03-25 Thread Steven Rostedt
On Mon, 25 Mar 2024 11:38:48 +0900
Masami Hiramatsu (Google)  wrote:

> On Fri, 22 Mar 2024 09:03:23 -0700
> Andrii Nakryiko  wrote:
> 
> > Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to
> > control whether ftrace low-level code performs additional
> > rcu_is_watching()-based validation logic in an attempt to catch noinstr
> > violations.
> > 
> > This check is expected to never be true in practice and would be best
> > controlled with extra config to let users decide if they are willing to
> > pay the price.  
> 
> Hmm, for me, it sounds like "WARN_ON(something) never be true in practice
> so disable it by default". I think CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> is OK, but that should be set to Y by default. If you have already verified
> that your system never makes it true and you want to optimize your ftrace
> path, you can manually set it to N at your own risk.
> 

Really, it's for debugging. I would argue that it should *not* be default y.
Peter added this to find all the locations that could be called where RCU
is not watching. But the issue I have is that it *does cause overhead*
with function tracing.

I believe we found pretty much all locations that were an issue, and we
should now just make it an option for developers.

It's no different than lockdep. Test boxes should have it enabled, but
there's no reason to have this enabled in a production system.

-- Steve


> > 
> > Cc: Steven Rostedt 
> > Cc: Masami Hiramatsu 
> > Cc: Paul E. McKenney 
> > Signed-off-by: Andrii Nakryiko 
> > ---
> >  include/linux/trace_recursion.h |  2 +-
> >  kernel/trace/Kconfig| 13 +
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/trace_recursion.h 
> > b/include/linux/trace_recursion.h
> > index d48cd92d2364..24ea8ac049b4 100644
> > --- a/include/linux/trace_recursion.h
> > +++ b/include/linux/trace_recursion.h
> > @@ -135,7 +135,7 @@ extern void ftrace_record_recursion(unsigned long ip, 
> > unsigned long parent_ip);
> >  # define do_ftrace_record_recursion(ip, pip)   do { } while (0)
> >  #endif
> >  
> > -#ifdef CONFIG_ARCH_WANTS_NO_INSTR
> > +#ifdef CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING
> >  # define trace_warn_on_no_rcu(ip)  \
> > ({  \
> > bool __ret = !rcu_is_watching();\  
> 
> BTW, maybe we can add "unlikely" in the next "if" line?
> 
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 61c541c36596..19bce4e217d6 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -974,6 +974,19 @@ config FTRACE_RECORD_RECURSION_SIZE
> >   This file can be reset, but the limit can not change in
> >   size at runtime.
> >  
> > +config FTRACE_VALIDATE_RCU_IS_WATCHING
> > +   bool "Validate RCU is on during ftrace recursion check"
> > +   depends on FUNCTION_TRACER
> > +   depends on ARCH_WANTS_NO_INSTR  
> 
>   default y
> 
> > +   help
> > + All callbacks that attach to the function tracing have some sort
> > + of protection against recursion. This option performs additional
> > + checks to make sure RCU is on when ftrace callbacks recurse.
> > +
> > + This will add more overhead to all ftrace-based invocations.  
> 
>   ... invocations, but keep it safe.
> 
> > +
> > + If unsure, say N  
> 
>   If unsure, say Y
> 
> Thank you,
> 
> > +
> >  config RING_BUFFER_RECORD_RECURSION
> > bool "Record functions that recurse in the ring buffer"
> > depends on FTRACE_RECORD_RECURSION
> > -- 
> > 2.43.0
> >   
> 
> 




Re: [PATCH] tracing: probes: Fix to zero initialize a local variable

2024-03-20 Thread Steven Rostedt
On Wed, 20 Mar 2024 17:10:38 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Masami Hiramatsu (Google) 
> 
> Fix to initialize 'val' local variable with zero.
> Dan reported that Smatch static code checker reports an error that a local
> 'val' variable needs to be initialized. Actually, the 'val' is expected to
> be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So
> initialize it with zero.

BTW, that loop should really have a comment stating that FETCH_OP_ARG is
expected to happen before FETCH_OP_ST_EDATA.

-- Steve



Re: [PATCH] tracing: probes: Fix to zero initialize a local variable

2024-03-20 Thread Steven Rostedt
On Wed, 20 Mar 2024 17:10:38 +0900
"Masami Hiramatsu (Google)"  wrote:

> From: Masami Hiramatsu (Google) 
> 
> Fix to initialize 'val' local variable with zero.
> Dan reported that Smatch static code checker reports an error that a local
> 'val' variable needs to be initialized. Actually, the 'val' is expected to
> be initialized by FETCH_OP_ARG in the same loop, but it is not obvious. So
> initialize it with zero.
> 
> Reported-by: Dan Carpenter 
> Closes: 
> https://lore.kernel.org/all/b010488e-68aa-407c-add0-3e059254aaa0@moroto.mountain/
> Fixes: 25f00e40ce79 ("tracing/probes: Support $argN in return probe (kprobe 
> and fprobe)")
> Signed-off-by: Masami Hiramatsu (Google) 
> ---

Reviewed-by: Steven Rostedt (Google) 

-- Steve

>  kernel/trace/trace_probe.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 217169de0920..dfe3ee6035ec 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -839,7 +839,7 @@ int traceprobe_get_entry_data_size(struct trace_probe *tp)
>  void store_trace_entry_data(void *edata, struct trace_probe *tp, struct 
> pt_regs *regs)
>  {
>   struct probe_entry_arg *earg = tp->entry_arg;
> - unsigned long val;
> + unsigned long val = 0;
>   int i;
>  
>   if (!earg)




Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-20 Thread Steven Rostedt
On Wed, 20 Mar 2024 12:44:23 +0900
Masami Hiramatsu (Google)  wrote:

> > > kernel/trace/trace_probe.c
> > > 846 return;
> > > 847 
> > > 848 for (i = 0; i < earg->size; i++) {
> > > 849 struct fetch_insn *code = &earg->code[i];
> > > 850 
> > > 851 switch (code->op) {
> > > 852 case FETCH_OP_ARG:
> > > 853 val = regs_get_kernel_argument(regs, 
> > > code->param);
> > > 854 break;
> > > 855 case FETCH_OP_ST_EDATA:  
> > > --> 856 *(unsigned long *)((unsigned long)edata + 
> > > code->offset) = val;
> > > 
> > > Probably the earg->code[i] always has FETCH_OP_ARG before
> > > FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...  
> > 
> > Looks that way:
> > 
> > case FETCH_OP_END:
> > earg->code[i].op = FETCH_OP_ARG;
> > earg->code[i].param = argnum;
> > earg->code[i + 1].op = FETCH_OP_ST_EDATA;
> > earg->code[i + 1].offset = offset;
> > return offset;
> > 
> > But probably should still initialize val to zero or have a WARN_ON() if
> > that doesn't happen.  
> 
> OK, let's set val = 0 in store_trace_entry_data(), but a WARN_ON() in this
> loop is a bit strange. I think we should have a verifier.

Initializing to zero is fine.

-- Steve



Re: [bug report] tracing/probes: Support $argN in return probe (kprobe and fprobe)

2024-03-19 Thread Steven Rostedt
On Tue, 19 Mar 2024 10:19:09 +0300
Dan Carpenter  wrote:

> Hello Masami Hiramatsu (Google),
> 
> Commit 25f00e40ce79 ("tracing/probes: Support $argN in return probe
> (kprobe and fprobe)") from Mar 4, 2024 (linux-next), leads to the
> following Smatch static checker warning:
> 
>   kernel/trace/trace_probe.c:856 store_trace_entry_data()
>   error: uninitialized symbol 'val'.
> 
> kernel/trace/trace_probe.c
> 846 return;
> 847 
> 848 for (i = 0; i < earg->size; i++) {
> 849 struct fetch_insn *code = &earg->code[i];
> 850 
> 851 switch (code->op) {
> 852 case FETCH_OP_ARG:
> 853 val = regs_get_kernel_argument(regs, 
> code->param);
> 854 break;
> 855 case FETCH_OP_ST_EDATA:
> --> 856 *(unsigned long *)((unsigned long)edata + 
> code->offset) = val;  
> 
> Probably the earg->code[i] always has FETCH_OP_ARG before
> FETCH_OP_ST_EDATA but Smatch isn't smart enough to figure that out...

Looks that way:

case FETCH_OP_END:
earg->code[i].op = FETCH_OP_ARG;
earg->code[i].param = argnum;
earg->code[i + 1].op = FETCH_OP_ST_EDATA;
earg->code[i + 1].offset = offset;
return offset;

But probably should still initialize val to zero or have a WARN_ON() if
that doesn't happen.

-- Steve


> 
> 857 break;
> 858 case FETCH_OP_END:
> 859 goto end;
> 860 default:
> 861 break;
> 862 }
> 863 }
> 864 end:
> 865 return;
> 866 }
> 
> regards,
> dan carpenter




Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-03-14 Thread Steven Rostedt
On Thu, 14 Mar 2024 09:57:57 -0700
Alison Schofield  wrote:

> On Fri, Feb 23, 2024 at 12:56:34PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > [
> >This is a treewide change. I will likely re-create this patch again in
> >the second week of the merge window of v6.9 and submit it then. Hoping
> >to keep the conflicts that it will cause to a minimum.
> > ]

Note, change of plans. I plan on sending this in the next merge window, as
this merge window I have this patch:

  
https://lore.kernel.org/linux-trace-kernel/20240312113002.00031...@gandalf.local.home/

That will warn if the source string of __string() is different than the
source string of __assign_str(). I want to make sure they are identical
before just dropping one of them.


> 
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index bdf117a33744..07ba4e033347 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h  
> 
> snip to poison
> 
> > @@ -668,8 +668,8 @@ TRACE_EVENT(cxl_poison,
> > ),
> >  
> > TP_fast_assign(
> > -   __assign_str(memdev, dev_name(&cxlmd->dev));
> > -   __assign_str(host, dev_name(cxlmd->dev.parent));
> > +   __assign_str(memdev);
> > +   __assign_str(host);  
> 
> I think I get that the above changes work because the TP_STRUCT__entry for
> these did:
>   __string(memdev, dev_name(&cxlmd->dev))
>   __string(host, dev_name(cxlmd->dev.parent))

That's the point. They have to be identical or you will likely have a bug.

The __string(name, src) is used to find the string length of src, which
determines how much space to allocate on the ring buffer. The
__assign_str(name, src) will copy src into the ring buffer.

Similar to:

len = strlen(src);
buf = malloc(len);
strcpy(buf, src);

Where __string() is strlen() and __assign_str() is strcpy(). It doesn't
make sense to use two different strings, and if you did, it would likely be
a bug.

But the magic behind __string() does much more than just get the length of
the string, and it could easily save the pointer to the string (along with
its length) and have it copy that in the __assign_str() call, making the
src parameter of __assign_str() useless.

> 
> > __entry->serial = cxlmd->cxlds->serial;
> > __entry->overflow_ts = cxl_poison_overflow(flags, overflow_ts);
> > __entry->dpa = cxl_poison_record_dpa(record);
> > @@ -678,12 +678,12 @@ TRACE_EVENT(cxl_poison,
> > __entry->trace_type = trace_type;
> > __entry->flags = flags;
> > if (region) {
> > -   __assign_str(region, dev_name(&region->dev));
> > +   __assign_str(region);
> > memcpy(__entry->uuid, >params.uuid, 16);
> > __entry->hpa = cxl_trace_hpa(region, cxlmd,
> >  __entry->dpa);
> > } else {
> > -   __assign_str(region, "");
> > +   __assign_str(region);
> > memset(__entry->uuid, 0, 16);
> > __entry->hpa = ULLONG_MAX;  
> 
> For the above 2, there was no helper in TP_STRUCT__entry. A recently
> posted patch is fixing that up to be __string(region, NULL) See [1],
> with the actual assignment still happening in TP_fast_assign.

__string(region, NULL) doesn't make sense. It's like:

len = strlen(NULL);
buf = malloc(len);
strcpy(buf, NULL);

??

I'll reply to that email.

-- Steve

> 
> Does that assign logic need to move to the TP_STRUCT__entry definition
> when you merge these changes? I'm not clear how much logic is able to be
> included, ie like 'C' style code in the TP_STRUCT__entry.
> 
> [1]
> https://lore.kernel.org/linux-cxl/20240314044301.2108650-1-alison.schofi...@intel.com/



Re: How to display a ktime value as trace timestamp in trace output?

2024-02-29 Thread Steven Rostedt
On Wed, 31 Jan 2024 14:47:31 +
David Howells  wrote:

> Hi Steven,

Hi David,

Sorry, I just noticed this email as it was buried in other unread emails :-p

> 
> I have a tracepoint in AF_RXRPC that displays information about a timeout I'm
> going to set.  I have the timeout in a ktime_t as an absolute time.  Is there
> a way to display this in the trace output such that it looks like a trace
> timestamp and can be (roughly) correlated with the displayed timestamps?

Have you tried the other clocks?

{ ktime_get_mono_fast_ns,   "mono", 1 },
{ ktime_get_raw_fast_ns,"mono_raw", 1 },
{ ktime_get_boot_fast_ns,   "boot", 1 },

The above are the functions used for the tracing timestamps.

-- Steve

> 
> I tried subtracting ktime_get_real() - ktime_get_boottime() from it and
> displaying the result, but it looked about one and a bit seconds out from the
> trace timestamp.
> 




Re: tprobe event tracing error

2024-02-28 Thread Steven Rostedt
On Wed, 28 Feb 2024 10:52:52 -0500
Steven Rostedt  wrote:

> The prototype of fchmodat() is:
> 
>   int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags);
> 
> Where pathname is the third parameter, not the first, and mode is the third.

I meant pathname is the second parameter.

-- Steve



Re: tprobe event tracing error

2024-02-28 Thread Steven Rostedt
On Wed, 28 Feb 2024 17:25:40 +0300
Максим Морсков  wrote:

> Dear colleagues,
> One last question — is it a bug or a feature that tprobe event tracing can not
> correctly dereference string pointers from pt_regs?
> For example:
> echo 't:tmy_chmod sys_enter id=$arg2 filename=+8($arg1):string 
> mode=+16($arg1)'  | sudo tee ‘/sys/kernel/tracing/dynamic_events’

So the tprobe attaches to the tracepoint, which is this:

 trace_sys_enter(regs, syscall);

Where arg1 is pt_regs, which on x86_64 (I'm assuming that's what you are
using) has:

struct pt_regs {
/*
 * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
 * unless syscall needs a complete, fully filled "struct pt_regs".
 */
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long rbp;
unsigned long rbx;
/* These regs are callee-clobbered. Always saved on kernel entry. */
unsigned long r11;
unsigned long r10;
unsigned long r9;
unsigned long r8;
unsigned long rax;
unsigned long rcx;
unsigned long rdx;
unsigned long rsi;
unsigned long rdi;
/*
 * On syscall entry, this is syscall#. On CPU exception, this is error code.
 * On hw interrupt, it's IRQ number:
 */
unsigned long orig_rax;
/* Return frame for iretq */
unsigned long rip;
unsigned long cs;
unsigned long eflags;
unsigned long rsp;
unsigned long ss;
/* top of stack page */
};

Where regs+8 is register r14, and regs+16 is r13. Is that what you really want?

No, it's not.

Also, I noticed that you are not tracing chmod, but you are tracing id = 268,
which is fchmodat() (I noticed via strace that this is what "chmod" uses).

The prototype of fchmodat() is:

  int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags);

Where pathname is the third parameter, not the first, and mode is the third.

The calling convention for x86_64 is:  rdi rsi rdx rcx r8-9 

That is, arg1 is in register rdi, arg2 is rsi, arg3 is rdx.

We want arguments 2 and 3. Which is:

  regs: $arg1
  regs->rsi:+104($arg1)
  regs->rdx:+96($arg1)

And since the file name is a string, we need to do one more dereference to
get to it:

  pathname: +0(+104($arg1)):ustring

(notice I used "ustring" as we now differentiate between kernel and user space)

With the above information I can do:

 # cd /sys/kernel/tracing
 # echo 't:tmy_chmod sys_enter id=$arg2 filename=+0(+104($arg1)):ustring 
mode=+96($arg1):x16' > dynamic_events
 # echo 'id == 268' > events/tracepoints/tmy_chmod/filter
 # echo 1 > events/tracepoints/tmy_chmod/enable
 # mkdir /tmp/x
 # chmod 100 /tmp/x
 # cat trace

# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:8
#
#_-=> irqs-off/BH-disabled
#   / _=> need-resched
#  | / _---=> hardirq/softirq
#  || / _--=> preempt-depth
#  ||| / _-=> migrate-disable
#   / delay
#   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
#  | | |   | | |
   chmod-1035[004] ...1.  1744.492490: tmy_chmod: 
(__probestub_sys_enter+0x4/0x10) id=0x10c filename="/tmp/x" mode=0x40

  TADA!!!

-- Steve


> echo 'id == 268' | sudo tee 
> ‘/sys/kernel/tracing/events/tracepoints/tmy_chmod/filter’
> echo '1' | sudo tee ‘/sys/kernel/tracing/events/tracepoints/tmy_chmod/enable’
> echo ‘1’ | sudo tee ‘/sys/kernel/tracing/tracing_on’
> 
> cat ‘/sys/kernel/tracing/trace’
> #   TASK-PID CPU#  |  TIMESTAMP  FUNCTION 
> #  | | |   | | |
>   chmod-10522   [010] ...1.  8533.321703: tmy_chmod: 
> (__probestub_sys_enter+0x0/0x10) id=0x10c fd=0x81ed filename="" mode=0x1ed
>  
> The pointer is correct (it corresponds to kprobe event args), but dereference 
> never happens
>   



Re: tprobe event tracing error

2024-02-26 Thread Steven Rostedt
On Mon, 26 Feb 2024 23:41:56 +0900
Masami Hiramatsu (Google)  wrote:

> Hi,
> (Cc: linux-kernel-trace ML for sharing this knowledge)
> 
> On Mon, 26 Feb 2024 16:36:29 +0300
> Максим Морсков  wrote:
> 
> > 
> > Hello, dear Masami.
> > I am researching Linux event tracing subsystem in part of tprobes,
> > and found interesting behavior in kernel version 6.6:
> >  
> > echo 't:my_fchmodat sys_enter_fchmodat' | sudo tee ‘/sys/kernel/tracing/dynamic_events’
> > bash: line 1: echo: write error: Invalid argument  
> 
> Yeah, I understand that you are confused by this behavior, but it is
> actually expected behavior. syscalls:* events looks like trace events
> based on tracepoint, but those are software generated trace event.
> 
> You can find raw_syscalls:* trace events, that is based on the tracepoint,
> and other syscalls:* are based on that raw_syscalls:* trace points.
> (IOW, those are a kind of pre-compiled dynamic events)
> 
> e.g.
> 
> /sys/kernel/tracing # echo "t sys_enter \$arg*" >> dynamic_events 
> /sys/kernel/tracing # cat dynamic_events 
> t:tracepoints/sys_enter sys_enter regs=regs id=id
> 
> /sys/kernel/tracing # echo "t sys_enter_open \$arg*" >> dynamic_events 
> sh: write error: Invalid argument
> /sys/kernel/tracing # cat error_log 
> [  227.981347] trace_fprobe: error: Tracepoint is not found
>   Command: t sys_enter_open $arg*
>  ^
> 
> So, tprobe can not find the hard-coded tracepoints for those dynamically
> generated syscall trace events. But raw_syscall sys_enter/sys_exit are OK.

Ah, that's because "tprobes" are attached to tracepoints and not
trace events. If you want to attach to trace events, you need to use eprobes
(which I need to add documentation for!).

 # echo 'e:my_fchmodat syscalls/sys_enter_fchmodat' > dynamic_events

Works.

-- Steve



Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-23 Thread Steven Rostedt
On Fri, 23 Feb 2024 13:46:53 -0500
Steven Rostedt  wrote:

> Now one thing I could do is to not remove the parameter, but just add:
> 
>   WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);
> 
> in the __assign_str() macro to make sure that it's still the same that is
> assigned. But I'm not sure how useful that is, and still causes burden to
> have it. I never really liked the passing of the string in two places to
> begin with.

Hmm, maybe I'll just add this patch for 6.9 and then in 6.10 do the
parameter removal.

-- Steve

diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index 0c0f50bcdc56..7372e2c2a0c4 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -35,6 +35,7 @@ #define __assign_str(dst, src)
do {\
char *__str__ = __get_str(dst); \
int __len__ = __get_dynamic_array_len(dst) - 1; \
+   WARN_ON_ONCE((src) != __data_offsets.dst##_ptr_);   \
memcpy(__str__, __data_offsets.dst##_ptr_ ? :   \
   EVENT_NULL_STR, __len__);\
__str__[__len__] = '\0';\



Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-23 Thread Steven Rostedt
On Fri, 23 Feb 2024 14:50:49 -0500
Kent Overstreet  wrote:

> Tangentially related though, what would make me really happy is if we
> could create the string with in the TP__fast_assign() section. I have to
> have a bunch of annoying wrappers right now because the string length
> has to be known when we invoke the tracepoint.

You can use __string_len() to determine the string length in the tracepoint
(which is executed in the TP_fast_assign() section).

My clean up patches will make __assign_str_len() obsolete too (I'm working
on them now), and you can just use __assign_str().

I noticed that I don't have a string_len example in the sample code and I'm
actually writing it now.

// cutting out everything else:

TRACE_EVENT(foo_bar,

TP_PROTO(const char *foo, int bar),

TP_ARGS(foo, bar),

TP_STRUCT__entry(
		__string_len(lstr, foo, bar < strlen(foo) ? bar : strlen(foo))
	),

TP_fast_assign(
__assign_str(lstr, foo);

// Note, the above is with my updates; without them, you need to duplicate the logic:

//		__assign_str_len(lstr, foo, bar < strlen(foo) ? bar : strlen(foo));
),

TP_printk("%s", __get_str(lstr))
);


The above will allocate "bar < strlen(foo) ? bar : strlen(foo)" size on the
ring buffer. As the size is already stored, my clean up code uses that
instead of requiring duplicating the logic again.

-- Steve



Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-23 Thread Steven Rostedt
On Fri, 23 Feb 2024 10:30:45 -0800
Jeff Johnson  wrote:

> On 2/23/2024 9:56 AM, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > [
> >This is a treewide change. I will likely re-create this patch again in
> >the second week of the merge window of v6.9 and submit it then. Hoping
> >to keep the conflicts that it will cause to a minimum.
> > ]
> > 
> > With the rework of how the __string() handles dynamic strings where it
> > saves off the source string in field in the helper structure[1], the
> > assignment of that value to the trace event field is stored in the helper
> > value and does not need to be passed in again.  
> 
> Just curious if this could be done piecemeal by first changing the
> macros to be variadic macros which allows you to ignore the extra
> argument. The callers could then be modified in their separate trees.
> And then once all the callers have be merged, the macros could be
> changed to no longer be variadic.

I weighed doing that, but I think ripping off the band-aid is a better
approach. One thing I found is that leaving unused parameters in the macros
can cause bugs itself. I found one case doing my clean up, where an unused
parameter in one of the macros was bogus, and when I made it a used
parameter, it broke the build.

I think for tree-wide changes, the preferred approach is to do one big
patch at once. And since this only affects TRACE_EVENT() macros, it
hopefully would not be too much of a burden (although out of tree users may
suffer from this, but do we care?)

Now one thing I could do is to not remove the parameter, but just add:

WARN_ON_ONCE((src) != __data_offsets->item##_ptr_);

in the __assign_str() macro to make sure that it's still the same that is
assigned. But I'm not sure how useful that is, and still causes burden to
have it. I never really liked the passing of the string in two places to
begin with.

-- Steve



Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-02-23 Thread Steven Rostedt
On Fri, 23 Feb 2024 12:56:34 -0500
Steven Rostedt  wrote:

> Note, the same updates will need to be done for:
> 
>   __assign_str_len()
>   __assign_rel_str()
>   __assign_rel_str_len()

Correction: The below macros do not pass in their source to the entry
macros, so they will not need to be updated.

-- Steve

>   __assign_bitmask()
>   __assign_rel_bitmask()
>   __assign_cpumask()
>   __assign_rel_cpumask()




Re: [v1] trace/hwlat: stop worker if !is_percpu_thread due to hotplug event

2024-02-01 Thread Steven Rostedt
On Thu,  1 Feb 2024 14:24:55 +0800
Andy Chiu  wrote:

> If the task happens to run after cpu hot-plug offline, then it would not
> be running in a percpu_thread. Instead, it would be re-queued into a
> UNBOUND workqueue. This would trigger a warning if we enable kernel
> preemption.
> 
> Signed-off-by: Andy Chiu 
> ---
>  kernel/trace/trace_hwlat.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index b791524a6536..87258ddc2141 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -511,7 +511,16 @@ static int start_cpu_kthread(unsigned int cpu)
>  static void hwlat_hotplug_workfn(struct work_struct *dummy)
>  {
>   struct trace_array *tr = hwlat_trace;
> - unsigned int cpu = smp_processor_id();
> + unsigned int cpu;
> +
> + /*
> +  * If the work is scheduled after CPU hotplug offline being invoked,
> +  * then it would be queued into UNBOUNDED workqueue

Is it due to a CPU going to online and then right back to offline? This
function is called by the online path.

From the queue_work_on() comment:

/**
 * queue_work_on - queue work on specific cpu
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @work: work to queue
 *
 * We queue the work to a specific CPU, the caller must ensure it
 * can't go away.  Callers that fail to ensure that the specified
 * CPU cannot go away will execute on a randomly chosen CPU.
 * But note well that callers specifying a CPU that never has been
 * online will get a splat.
 *
 * Return: %false if @work was already on a queue, %true otherwise.
 */

So the work gets queued on the CPU but then the CPU immediately goes
offline. I wonder what happens if it comes back online?

The above states that if the work is already queued on a CPU it won't queue
it again. Hmm, reading the comments, I'm not sure you can run the same
workqueue on multiple CPUs. It looks to do only one and all the others will
fail.

I think we need to change this to a global work queue and use CPU masks.
Where in the *cpu_init() function, it just sets the current CPU bit in a
bit mask and calls the workqueue, and that workqueue is responsible for all
CPUs.

Then the workqueue function will call cpu_read_lock() and just iterate the
CPU mask starting the threads for each of the bits that are set and clear
the bit (possibly with __test_and_clear_bit).

I don't think the schedule_on_cpu() is doing what we think it should be
doing.


-- Steve


> +  */
> + if (!is_percpu_thread())
> + return;
> +
> + cpu = smp_processor_id();
>  
>   mutex_lock(&trace_types_lock);
>   mutex_lock(&hwlat_data.lock);




Re: [v1] trace/osnoise: prevent osnoise hotplog worker running in UNBOUND workqueue

2024-02-01 Thread Steven Rostedt
On Thu,  1 Feb 2024 14:18:45 +0800
Andy Chiu  wrote:

> smp_processor_id() should be called with migration disabled. This mean
> we may safely call smp_processor_id() in percpu thread. However, this is
> not the case if the work is (re-)queued into unbound workqueue, during
> cpu-hotplog. So, detect and return early if this work happens to run on
> an unbound wq.
> 

Yeah I triggered this too, but never made it a priority to fix.


> Signed-off-by: Andy Chiu 
> ---
>  kernel/trace/trace_osnoise.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index bd0d01d00fb9..cf7f716d3f35 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -2068,7 +2068,12 @@ static int start_per_cpu_kthreads(void)
>  #ifdef CONFIG_HOTPLUG_CPU
>  static void osnoise_hotplug_workfn(struct work_struct *dummy)
>  {
> - unsigned int cpu = smp_processor_id();
> + unsigned int cpu;
> +
> + if (!is_percpu_thread())
> + return;

But doesn't this then fail to register the CPU thread?

I wonder if this has some race with schedule_work_on() and hotplug?

-- Steve



> +
> + cpu = smp_processor_id();
>  
>   mutex_lock(&trace_types_lock);
>  




Re: [PATCH] eventfs: Create dentries and inodes at dir open

2024-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2024 10:53:36 -0800
Linus Torvalds  wrote:

> Let's at least *try* to move towards a better and simpler world, in other
> words.

OK, but I did just finish the below. I'll save it for another time if the
single inode becomes an issue.

I cropped it to just 31 bits, so I'm not sure how much that would still
expose kernel addresses.

But as you prefer a single inode number, I'll submit that instead, and
store this as one of my many git branches that I hoard.

This patch would give me:

~# ls -i /sys/kernel/tracing/events
2024739183 9p            801480813 iocost        281269778 qdisc
 401505785 alarmtimer    641794539 iomap        1206040657 ras
1930290274 avc          1239754069 iommu         275816503 raw_syscalls
1108006611 block         138722947 io_uring     1758073266 rcu
1528091656 bpf_test_run  694421747 ipi          1384703124 regmap
 557364529 bpf_trace       2765888 irq            30507018 regulator
1351362737 cfg80211      369905250 irq_matrix   2078862821 resctrl
 886445085 cgroup       1115008967 irq_vectors   324090967 rpm
 796209754 clk          1448778784 jbd2         1141318882 rseq
 478739129 compaction     99410253 kmem         1274783780 rtc
1714397712 cpuhp        2001779594 ksm          1409072807 sched
 720701943 csd            51728677 kyber         347239934 scsi
1824588347 dev          1507974437 libata       1768671172 sd
1640041299 devfreq      1421146927 lock         1167562598 signal
 121399632 dma_fence     956825721 mac80211        4116590 skb
 975908383 drm           738868061 maple_tree   1435501164 smbus
1227060804 e1000e_trace  969175760 mce          1664441095 sock
1770307058 enable       1225375766 mdio         1634697993 spi
1372107864 error_report  744198394 migrate       556841757 swiotlb
1356665351 exceptions    602669807 mmap          400337480 syscalls
[..]

~# ls -i /sys/kernel/tracing/events/sched
1906992193 enable                            1210369853 sched_process_wait
 592505447 filter                            1443020725 sched_stat_blocked
1742488280 sched_kthread_stop                1213185672 sched_stat_iowait
1674576234 sched_kthread_stop_ret            1908510325 sched_stat_runtime
1999376743 sched_kthread_work_execute_end    1570203600 sched_stat_sleep
1041533096 sched_kthread_work_execute_start   608113040 sched_stat_wait
 166824308 sched_kthread_work_queue_work     2033295833 sched_stick_numa
 492462616 sched_migrate_task                1617500395 sched_swap_numa
1786868196 sched_move_numa                   1865216679 sched_switch
 691687388 sched_pi_setprio                  1465729401 sched_wait_task
1610977146 sched_process_exec                 969941227 sched_wake_idle_without_ipi
 610128037 sched_process_exit                  84398030 sched_wakeup
2139284699 sched_process_fork                 750220489 sched_wakeup_new
 169172804 sched_process_free                1024064201 sched_waking

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a0598eb3e26e..57448bf4acb4 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +33,9 @@
  */
 static DEFINE_MUTEX(eventfs_mutex);
 
+/* Used for making inode numbers */
+static siphash_key_t inode_key;
+
 /*
  * The eventfs_inode (ei) itself is protected by SRCU. It is released from
  * its parent's list and will have is_freed set (under eventfs_mutex).
@@ -57,6 +61,28 @@ static int eventfs_dir_open(struct inode *inode, struct file *file);
 static int eventfs_iterate(struct file *file, struct dir_context *ctx);
 static int eventfs_dir_release(struct inode *inode, struct file *file);
 
+/* Copied from scripts/kconfig/symbol.c */
+static unsigned strhash(const char *s)
+{
+   /* fnv32 hash */
+   unsigned hash = 2166136261U;
+   for (; *s; s++)
+   hash = (hash ^ *s) * 0x01000193;
+   return hash;
+}
+
+/* Just try to make something consistent and unique */
+static int eventfs_inode_ino(struct eventfs_inode *ei, const char *name)
+{
+   unsigned long sip = (unsigned long)ei;
+
+   sip += strhash(name);
+   sip = siphash_1u32((int)sip, &inode_key);
+
+   /* keep it positive */
+   return sip & ((1U << 31) - 1);
+}
+
 static void update_attr(struct eventfs_attr *attr, struct iattr *iattr)
 {
unsigned int ia_valid = iattr->ia_valid;
@@ -491,6 +517,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
	mutex_unlock(&eventfs_mutex);
 
dentry = create_file(name, mode, attr, parent, data, fops);
+   if (!IS_ERR_OR_NULL(dentry))
+   dentry->d_inode->i_ino = eventfs_inode_ino(ei, name);
 
	mutex_lock(&eventfs_mutex);
 
@@ -585,6 +613,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
	mutex_unlock(&eventfs_mutex);
 
dentry = create_dir(ei, parent);
+   if (!IS_ERR_OR_NULL(dentry))
+ 

Re: [PATCH] eventfs: Create dentries and inodes at dir open

2024-01-16 Thread Steven Rostedt
On Tue, 16 Jan 2024 10:21:49 -0800
Linus Torvalds  wrote:

> Here's a clue: just fix your inode numbers.
> 
> I can think of many ways to do it. Here's a couple:
> 
>  - use a fixed inode number for all inodes. It's fine. Really. You might
> confuse some programs that still do getpwd() the legacy way, but hey,
> nobody cares
> 
>  - just put the inode number in the same data structure everything else is
>
> 
>  - make the inode number be a hash of the address of your data structure.
> That's actually the closest to a traditional "real" inode number, which is
> just an index to some on-disk thing
> 
> I'm sure there are other *trivial* solutions.
> 
> None of this is an excuse to misuse sentries.
> 
> Try the first one - one single inode number - first. You shouldn't be doing
> iget() anyway, so why do you care so deeply about a number that makes no
> sense and nobody should care about?

It was me being paranoid that using the same inode number would break user
space. If that is not a concern, then I'm happy to just make it either the
same, or maybe just hash the ei and name that it is associated with.

If I do not fully understand how something is used, I try hard to make it
act the same as it does for other use cases. That is, I did all this to
keep inodes unique and consistent because I did not know if it would break
something if I didn't.

Removing that requirement does make it much easier to implement readdir.

I think I'll do the hashing, just because I'm still paranoid that something
might still break if they are all the same.

-- Steve



Re: Unable to trace nf_nat function using kprobe with latest kernel 6.1.66-1

2024-01-02 Thread Steven Rostedt
On Tue, 2 Jan 2024 11:23:54 +0530
P K  wrote:

> Hi,
> 
> I am unable to trace nf_nat functions using kprobe with the latest kernel.
> Previously in kernel 6.1.55-1 it was working fine.
> Can someone please check if it's broken with the latest commit or  i
> have to use it differently?
> 

Note, attaching to kernel functions is never considered stable API and may
break at any kernel release.

Also, if the compiler decides to inline the function, and makes it no
longer visible in /proc/kallsyms then that too will cause this to break.

-- Steve


> Mentioned below are output:
> Kernel - 6.1.55-1
> / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> Attaching 1 probe...
> cannot attach kprobe, probe entry may not exist
> ERROR: Error attaching probe: 'kprobe:nf_nat_ipv4_manip_pkt'
> 
> 
> 
> Kernel 6.1.55-1
> / # bpftrace -e 'kprobe:nf_nat_ipv4_manip_pkt { printf("func called\n"); }'
> Attaching 1 probe...
> func called
> func called




Re: trace_event names with more than NAME_MAX chars

2023-12-09 Thread Steven Rostedt
On Fri, 8 Dec 2023 18:36:01 +
Beau Belgrave  wrote:

> While developing some unrelated features I happened to create a
> trace_event that was more than NAME_MAX (255) characters. When this
> happened the creation worked, but tracefs would hang any task that tried
> to list the directory of the trace_event or remove it.
> 
> I followed the code down to the reason being eventfs would call
> simple_lookup(), and if it failed, it would still try to create the
> dentry. In this case DCACHE_PAR_LOOKUP would get set and never cleared.
> This caused d_wait_lookup() to loop forever, since that flag is used in
> d_in_lookup().
> 
> Both tracefs and eventfs use simple_lookup() and it fails for
> dentries that exceed NAME_MAX. Should we even allow trace_events to
> be created that exceed this limit? Or should tracefs/eventfs allow
> this but somehow represent these differently?

Yeah I think it's easiest to just prevent dentries from exceeding
NAME_MAX, and yes the patch below should be added. Care to send a real
patch?

Thanks,

-- Steve


> 
> I have a fix that appears to work for myself, but unsure if there are
> other locations (attached at the end of this mail).
> 
> Thanks,
> -Beau
> 
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index f8a594a50ae6..d2c06ba26db4 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -561,6 +561,8 @@ static struct dentry *eventfs_root_lookup(struct inode 
> *dir,
> if (strcmp(ei_child->name, name) != 0)
> continue;
> ret = simple_lookup(dir, dentry, flags);
> +   if (IS_ERR(ret))
> +   goto out;
> create_dir_dentry(ei, ei_child, ei_dentry, true);
> created = true;
> break;
> @@ -583,6 +585,8 @@ static struct dentry *eventfs_root_lookup(struct inode 
> *dir,
> if (r <= 0)
> continue;
> ret = simple_lookup(dir, dentry, flags);
> +   if (IS_ERR(ret))
> +   goto out;
> create_file_dentry(ei, i, ei_dentry, name, mode, 
> cdata,
>fops, true);
> break;




Re: [PATCH net-next] tcp: add tracepoints for data send/recv/acked

2023-12-05 Thread Steven Rostedt
On Tue, 5 Dec 2023 20:39:28 +0100
Eric Dumazet  wrote:

> > So, we do not want to add some tracepoint to do some unknow debug.
> > We have a clear goal. debugging is just an incidental capability.
> >  
> 
> We have powerful mechanisms in the stack already that ordinary (no
> privilege requested) applications can readily use.
>

I'm not arguing for or against this patch set, but tracepoints are
available for other utilities that may have non privilege access. They are
not just for tracers.

-- Steve



Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint

2023-12-02 Thread Steven Rostedt
On Sun, 3 Dec 2023 10:33:32 +0900
Dominique Martinet  wrote:


> > TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> >   (unsigned long)__entry->clnt, 
> > show_9p_op(__entry->type),
> >   __entry->tag, 0, __get_dynamic_array(line), 16,
> >   __get_dynamic_array(line) + 16)  
> 
> This was just printing garbage in the previous version but %16ph with a
> dynamic alloc would be out of range (even the start of the next buffer,
> _get_dynamic_array(line) + 16, can be out of range)
> 
> Also, for custom tracepoints e.g. bpftrace the program needs to know how
> many bytes can be read safely even if it's just for dumping -- unless
> dynamic_array is a "fat pointer" that conveys its own size?
> (Sorry didn't take the time to check)

Yes, there's also a __get_dynamic_array_len(line) that will return the
allocated length of the line. Is that what you need?

-- Steve





Re: [PATCH] 9p: prevent read overrun in protocol dump tracepoint

2023-12-02 Thread Steven Rostedt
On Sat, 02 Dec 2023 14:05:24 +0100
Christian Schoenebeck  wrote:

> > > --- a/include/trace/events/9p.h
> > > +++ b/include/trace/events/9p.h
> > > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> > >   __entry->clnt   =  clnt;
> > >   __entry->type   =  pdu->id;
> > >   __entry->tag=  pdu->tag;
> > > - memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > > + memcpy(__entry->line, pdu->sdata,
> > > + min(pdu->capacity, P9_PROTO_DUMP_SZ));
> > >   ),
> > >   TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> > > (unsigned long)__entry->clnt, show_9p_op(__entry->type),  
> 
> AFAICS __entry is a local variable on stack, and array __entry->line not
> intialized with zeros, i.e. the dump would contain trash at the end. Maybe
> prepending memset() before memcpy()?

__entry is a macro that points into the ring buffer that gets allocated
before this is called. TRACE_EVENT() has a __dynamic_array() field that
can handle variable length arrays. What you can do is turn this into
something like:

TRACE_EVENT(9p_protocol_dump,
TP_PROTO(struct p9_client *clnt, struct p9_fcall *pdu),

TP_ARGS(clnt, pdu),

TP_STRUCT__entry(
__field(void *, clnt
)
__field(__u8,   type
)
__field(__u16,  tag 
)
__dynamic_array(unsigned char,  line, min(pdu->capacity, 
P9_PROTO_DUMP_SZ) )
),

TP_fast_assign(
__entry->clnt   =  clnt;
__entry->type   =  pdu->id;
__entry->tag=  pdu->tag;
memcpy(__get_dynamic_array(line), pdu->sdata,
   min(pdu->capacity, P9_PROTO_DUMP_SZ));
),
TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
  (unsigned long)__entry->clnt, show_9p_op(__entry->type),
  __entry->tag, 0, __get_dynamic_array(line), 16,
  __get_dynamic_array(line) + 16)
 );

-- Steve



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Steven Rostedt
On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:

> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So it is our assumption
> > that there are no gaps between the stack frames is invalid.  
> 
> Thanks for the assistance. The gap between the stack frame depends on 
> the function. Most do not have a gap. Some have 8 (as shown above), some 
> have 12. A single assumption here is not going to work. I'm having a 
> hard time finding out the reasoning for this gap. I tried disabling a 
> bunch of gcc flags as well as -O2 and the gap still exists.

That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

  71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve



[PATCH v4] libtracecmd: Use an rbtree for mapping of cache pages

2023-11-28 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

I was loading a very large trace.dat file into kernelshark when it just
stopped near the end off the load and hung there for a very long time. I
ran it under gdb and hit Ctrl^C when it hit the hang to see what it was
doing and found that it was spinning in:

  libtracecmd trace-input.c: free_zpage

static void free_zpage(struct cpu_data *cpu_data, void *map)
{
struct zchunk_cache *cache;

	list_for_each_entry(cache, &cpu_data->compress.cache, list) {
		if (map <= cache->map && map > (cache->map + cache->chunk->size))
goto found;
}
return;

found:

It seems that there can be over 10 thousand cache pages in that linked
list and that kernelshark is constantly hitting that. So it's doing a
linear search for the mapped page to free it.

I ran:  time kernelshark trace.dat

And exited out when it finished loading and the result was:

real6m14.772s
user6m0.649s
sys 0m12.718s

That's over 6 minutes to load the trace.dat file!!!

I ran perf record on it and it showed 77% of the time was in free_zpage().

I pulled out my old algorithms book and wrote up an rbtree for internal use
of libtracecmd. Then I switched the cache into a binary rbtree to do the
look ups. As the lookups used both where the memory of the compressed page
is mapped as well as the offset depending on how the search was done, I
found that it only used the memory allocation address in one location.
Luckily, the memory allocation mapping lookup also had access to the
offset of the file the memory represented. That allowed me to make all
lookups use the file offset.

After converting the cache to an rbtree lookup, I ran kernelshark again on
opening that file and exited out as soon as it finished loading and the
timings was:

real1m22.356s
user1m10.532s
sys 0m10.901s

Still a bit long, but it dropped from over 6 minutes to under 1 1/2
minutes. Also, free_zpages() was no longer in the perf record output.

Running the same file under trace-cmd produced:

Without this change:

  $ time trace-cmd report trace.dat > /dev/null
 real9m20.390s
 user9m16.391s
 sys 0m3.529s

With this change:

  $ time trace-cmd report trace.dat > /dev/null
 real6m22.935s
 user6m19.537s
 sys 0m3.139s

Not as drastic as the KernelShark speedup, but still brings it down by a
third.

Signed-off-by: Steven Rostedt (Google) 
---
Changes since v3: 
https://lore.kernel.org/all/20231019165205.371e8...@gandalf.local.home/

- Check for node not NULL before checking node->left

Changes since v2: 
https://lore.kernel.org/all/20231017104307.662aa...@gandalf.local.home/

 - Fixed a few more bugs
 - Added a nice little iterator (although not used)
 - Added a "check_tree()" routine in case I need to debug it again
   This is #if out but nice to keep in the code anyway.

Changes since v1: 
https://lore.kernel.org/linux-trace-devel/20231016230058.2f0e9...@gandalf.local.home/

- Removed some leftover debug
- Fix deletion in setting the color of the node that replaces the deleted
  node.


 lib/trace-cmd/Makefile   |   1 +
 lib/trace-cmd/include/private/trace-rbtree.h |  34 ++
 lib/trace-cmd/trace-input.c  |  81 +++-
 lib/trace-cmd/trace-rbtree.c | 440 +++
 4 files changed, 533 insertions(+), 23 deletions(-)
 create mode 100644 lib/trace-cmd/include/private/trace-rbtree.h
 create mode 100644 lib/trace-cmd/trace-rbtree.c

diff --git a/lib/trace-cmd/Makefile b/lib/trace-cmd/Makefile
index e9d26b2bb367..aba6fda5b0f6 100644
--- a/lib/trace-cmd/Makefile
+++ b/lib/trace-cmd/Makefile
@@ -9,6 +9,7 @@ DEFAULT_TARGET = $(LIBTRACECMD_STATIC)
 
 OBJS =
 OBJS += trace-hash.o
+OBJS += trace-rbtree.o
 OBJS += trace-hooks.o
 OBJS += trace-input.o
 OBJS += trace-output.o
diff --git a/lib/trace-cmd/include/private/trace-rbtree.h b/lib/trace-cmd/include/private/trace-rbtree.h
new file mode 100644
index ..822111998e7a
--- /dev/null
+++ b/lib/trace-cmd/include/private/trace-rbtree.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: LGPL-2.1 */
+/*
+ * Copyright (C) 2023 Google, Steven Rostedt 
+ *
+ */
+#ifndef _TRACE_RBTREE_H
+#define _TRACE_RBTREE_H
+
+struct trace_rbtree_node {
+   struct trace_rbtree_node*parent;
+   struct trace_rbtree_node*left;
+   struct trace_rbtree_node*right;
+   int color;
+};
+
+typedef int (*trace_rbtree_cmp_fn)(const struct trace_rbtree_node *A, const struct trace_rbtree_node *B);
+
+typedef int (*trace_rbtree_search_fn)(const struct trace_rbtree_node *n, const void *data);
+
+struct trace_rbtree {
+   struct trace_rbtree_node*node;
+   trace_rbtree_search_fn  search;
+   trace_rbtree_cmp_fn cmp;
+   size_t  nr_nodes;
+};
+
+void trace_rbtree_

Re: [PATCH v5 0/2] Return EADDRNOTAVAIL when func matches several symbols during kprobe creation

2023-10-19 Thread Steven Rostedt
On Thu, 19 Oct 2023 21:18:43 +0900
Masami Hiramatsu (Google)  wrote:

> > So why is this adding stable? (and as Greg's form letter states, that's not
> > how you do that)
> > 
> > I don't see this as a fix but a new feature.  
> 
> I asked him to make this a fix since the current kprobe event' behavior is
> somewhat strange. It puts the probe on only the "first symbol" if user
> specifies a symbol name which has multiple instances. In this case, the
> actual probe address can not be solved by name. User must specify the
> probe address by unique name + offset. Unless, it can put a probe on
> unexpected address, especially if it specifies non-unique symbol + offset,
> the address may NOT be the instruction boundary.
> To avoid this issue, it should check the given symbol is unique.
>

OK, so what is broken is that when you add a probe to a function that has
multiple names, it will attach to the first one and not necessarily the one
you want.

The change log needs to be more explicit in what the "bug" is. It does
state this in a round about way, but it is written in a way that it doesn't
stand out.

Previously to this commit, if func matches several symbols, a kprobe,
being either sysfs or PMU, would only be installed for the first
matching address. This could lead to some misunderstanding when some
BPF code was never called because it was attached to a function which
was indeed not called, because the effectively called one has no
kprobes attached.

So, this commit returns EADDRNOTAVAIL when func matches several
symbols. This way, user needs to use address to remove the ambiguity.


What it should say is:

When a kprobe is attached to a function whose name is not unique (it is
static and shares its name with other functions in the kernel), the
kprobe is attached to the first function it finds. This is a bug, as the
function it attaches to is not necessarily the one that the
user wants to attach to.

Instead of blindly picking a function to attach to when the name is
ambiguous, error out with EADDRNOTAVAIL to let the user know that this
function is not unique, and that they must use a unique symbol with an
address offset to get to the function they want to attach to.

And yes, it should have:

Cc: sta...@vger.kernel.org

which is how to mark something for stable, and

Fixes: ...

To the commit that caused the bug.

-- Steve



Re: [PATCH v5 0/2] Return EADDRNOTAVAIL when func matches several symbols during kprobe creation

2023-10-18 Thread Steven Rostedt
On Wed, 18 Oct 2023 17:40:28 +0300
Francis Laniel  wrote:

> Changes since:
>  v1:
>   * Use EADDRNOTAVAIL instead of adding a new error code.
>   * Correct also this behavior for sysfs kprobe.
>  v2:
>   * Count the number of symbols corresponding to function name and return
>   EADDRNOTAVAIL if higher than 1.
>   * Return ENOENT if above count is 0, as it would be returned later by while
>   registering the kprobe.
>  v3:
>   * Check symbol does not contain ':' before testing its uniqueness.
>   * Add a selftest to check this is not possible to install a kprobe for a non
>   unique symbol.
>  v5:
>   * No changes, just add linux-stable as recipient.

So why is this adding stable? (and as Greg's form letter states, that's not
how you do that)

I don't see this as a fix but a new feature.

-- Steve



Re: [PATCH 3/8] usb: udc: trace: reduce buffer usage of trace event

2023-09-14 Thread Steven Rostedt
On Fri, 15 Sep 2023 09:11:06 +0800
Linyu Yuan  wrote:

> >> +  snprintf(__s, 9, "ep%d%s", te.address, \
> >> +  (te.caps.dir_in && te.caps.dir_out) ? "" : \
> >> +  te.caps.dir_in ? "in" : "out");  
> > Note, there's a temp buffer trace_seq 'p' available for use as well. See
> > both include/trace/events/libata.h and include/trace/events/scsi.h:
> >
> >const char *libata_trace_parse_status(struct trace_seq*, unsigned char);
> >#define __parse_status(s) libata_trace_parse_status(p, s)
> >
> > I think that can be used instead of adding this TP_printk_init().  
> 
> 
> the reason I added TP_printk_init() is that when I first designed some
> macros not related to tracepoints, they used too much stack.
> 

Not sure what you mean about 'uses too much stack'. This is called by
the reading code and not some arbitrary location, and the above macros
are done in the same location as your "init" call, so I'm not sure how
that makes a difference on the stack.

> 
> but I think TP_printk_init() is good as it follows the most common way
> to print.
> 

I really do not want to add more versions of TRACE_EVENT() that I need
to maintain unless there is a really good reason to do so.

And I really don't want to encourage the use of a "TP_printk_init()"
because that just encourages more use cases that will make it hard for
user space to parse the TP_printk().

-- Steve
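The trace_seq temp-buffer pattern mentioned above (formatting at print time rather than storing strings in the ring buffer) can be sketched in userspace C; the scratch type and helper name below are illustrative, not the kernel's trace_seq API:

```c
#include <stdio.h>

/* Stand-in for the per-event temp buffer ('p' in TP_printk). */
struct scratch {
	char buf[32];
};

/*
 * Decode the packed endpoint info into the scratch buffer at print
 * time, returning a pointer usable inside a format string -- the same
 * shape as libata_trace_parse_status()/__parse_status().
 */
static const char *parse_ep(struct scratch *p, int address,
			    int dir_in, int dir_out)
{
	snprintf(p->buf, sizeof(p->buf), "ep%d%s", address,
		 (dir_in && dir_out) ? "" : dir_in ? "in" : "out");
	return p->buf;
}
```

Called as `parse_ep(&s, 1, 1, 0)` this yields "ep1in"; a bidirectional endpoint (both dir flags set) yields just "ep0"-style output with no suffix.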


Re: [PATCH 3/8] usb: udc: trace: reduce buffer usage of trace event

2023-09-14 Thread Steven Rostedt
On Thu, 14 Sep 2023 18:02:57 +0800
Linyu Yuan  wrote:

> Save u32 members into trace event ring buffer and parse it for possible
> bit fields.
> 
> Use new DECLARE_EVENT_CLASS_PRINT_INIT() class macro for output stage.
> 
> Signed-off-by: Linyu Yuan 
> ---
>  drivers/usb/gadget/udc/trace.h | 154 +++--
>  1 file changed, 69 insertions(+), 85 deletions(-)
> 
> diff --git a/drivers/usb/gadget/udc/trace.h b/drivers/usb/gadget/udc/trace.h
> index a5ed26fbc2da..e1754667f1d2 100644
> --- a/drivers/usb/gadget/udc/trace.h
> +++ b/drivers/usb/gadget/udc/trace.h
> @@ -17,7 +17,7 @@
>  #include 
>  #include 
>  
> -DECLARE_EVENT_CLASS(udc_log_gadget,
> +DECLARE_EVENT_CLASS_PRINT_INIT(udc_log_gadget,
>   TP_PROTO(struct usb_gadget *g, int ret),
>   TP_ARGS(g, ret),
>   TP_STRUCT__entry(
> @@ -25,20 +25,7 @@ DECLARE_EVENT_CLASS(udc_log_gadget,
>   __field(enum usb_device_speed, max_speed)
>   __field(enum usb_device_state, state)
>   __field(unsigned, mA)
> - __field(unsigned, sg_supported)
> - __field(unsigned, is_otg)
> - __field(unsigned, is_a_peripheral)
> - __field(unsigned, b_hnp_enable)
> - __field(unsigned, a_hnp_support)
> - __field(unsigned, hnp_polling_support)
> - __field(unsigned, host_request_flag)
> - __field(unsigned, quirk_ep_out_aligned_size)
> - __field(unsigned, quirk_altset_not_supp)
> - __field(unsigned, quirk_stall_not_supp)
> - __field(unsigned, quirk_zlp_not_supp)
> - __field(unsigned, is_selfpowered)
> - __field(unsigned, deactivated)
> - __field(unsigned, connected)
> + __field(u32, gdw1)
>   __field(int, ret)
>   ),
>   TP_fast_assign(
> @@ -46,39 +33,35 @@ DECLARE_EVENT_CLASS(udc_log_gadget,
>   __entry->max_speed = g->max_speed;
>   __entry->state = g->state;
>   __entry->mA = g->mA;
> - __entry->sg_supported = g->sg_supported;
> - __entry->is_otg = g->is_otg;
> - __entry->is_a_peripheral = g->is_a_peripheral;
> - __entry->b_hnp_enable = g->b_hnp_enable;
> - __entry->a_hnp_support = g->a_hnp_support;
> - __entry->hnp_polling_support = g->hnp_polling_support;
> - __entry->host_request_flag = g->host_request_flag;
> - __entry->quirk_ep_out_aligned_size = g->quirk_ep_out_aligned_size;
> - __entry->quirk_altset_not_supp = g->quirk_altset_not_supp;
> - __entry->quirk_stall_not_supp = g->quirk_stall_not_supp;
> - __entry->quirk_zlp_not_supp = g->quirk_zlp_not_supp;
> - __entry->is_selfpowered = g->is_selfpowered;
> - __entry->deactivated = g->deactivated;
> - __entry->connected = g->connected;
> + __entry->gdw1 = g->dw1;
>   __entry->ret = ret;
>   ),
> - TP_printk("speed %d/%d state %d %dmA [%s%s%s%s%s%s%s%s%s%s%s%s%s%s] --> %d",
> + TP_printk("speed %d/%d state %d %dmA [%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s] --> %d",
>   __entry->speed, __entry->max_speed, __entry->state, __entry->mA,
> - __entry->sg_supported ? "sg:" : "",
> - __entry->is_otg ? "OTG:" : "",
> - __entry->is_a_peripheral ? "a_peripheral:" : "",
> - __entry->b_hnp_enable ? "b_hnp:" : "",
> - __entry->a_hnp_support ? "a_hnp:" : "",
> - __entry->hnp_polling_support ? "hnp_poll:" : "",
> - __entry->host_request_flag ? "hostreq:" : "",
> - __entry->quirk_ep_out_aligned_size ? "out_aligned:" : "",
> - __entry->quirk_altset_not_supp ? "no_altset:" : "",
> - __entry->quirk_stall_not_supp ? "no_stall:" : "",
> - __entry->quirk_zlp_not_supp ? "no_zlp" : "",
> - __entry->is_selfpowered ? "self-powered:" : "bus-powered:",
> - __entry->deactivated ? "deactivated:" : "activated:",
> - __entry->connected ? "connected" : "disconnected",
> - __entry->ret)
> + tg.sg_supported ? "sg:" : "",
> + tg.is_otg ? "OTG:" : "",
> + tg.is_a_peripheral ? "a_peripheral:" : "",
> + tg.b_hnp_enable ? "b_hnp:" : "",
> + tg.a_hnp_support ? "a_hnp:" : "",
> + tg.a_alt_hnp_support ? "a_alt_hnp:" : "",
> + tg.hnp_polling_support ? "hnp_poll:" : "",
> + tg.host_request_flag ? "hostreq:" : "",
> + tg.quirk_ep_out_aligned_size ? "out_aligned:" : "",
> + tg.quirk_altset_not_supp ? "no_altset:" : "",
> + tg.quirk_stall_not_supp ? "no_stall:" : "",
> + tg.quirk_zlp_not_supp ? "no_zlp" : "",
> + tg.quirk_avoids_skb_reserve ? "no_skb_reserve" : "",
> + tg.is_selfpowered ? "self-powered:" : 

Re: [PATCH 0/8] usb: gadget: reduce usb gadget trace event buffer usage

2023-09-14 Thread Steven Rostedt
On Thu, 14 Sep 2023 18:02:54 +0800
Linyu Yuan  wrote:

> some trace events use an integer to save bit field info of the gadget,
> and some traces save the endpoint name in string format; all of this can
> be changed to another representation at the trace event store phase.
> 
> bit fields can be replaced with a union integer member which includes
> multiple bit fields.
> 
> the ep name string can be replaced with an integer which contains the
> number and dir info.
> 
> to allow the trace output stage to get the bit info from the saved
> integer, add the DECLARE_EVENT_CLASS_PRINT_INIT() class which allows a
> user-defined operation before print.
> 
> v1: 
> https://lore.kernel.org/linux-usb/20230911042843.2711-1-quic_linyy...@quicinc.com/
> v2: fix two compile issues that COMPILE_TEST not covered
> 
> https://lore.kernel.org/linux-usb/2023092446.1791-1-quic_linyy...@quicinc.com/
> v3: fix reviewer comments, allow bit fields work on both little and big endian
> 
> https://lore.kernel.org/linux-usb/20230912104455.7737-1-quic_linyy...@quicinc.com/
> v4: add DECLARE_EVENT_CLASS_PRINT_INIT() new trace class and use it
> 

All these changes make it useless for user space. :-(

-- Steve

> Linyu Yuan (8):
>   trace: add new DECLARE_EVENT_CLASS_PRINT_INIT class type
>   usb: gadget: add anonymous definition in some struct for trace purpose
>   usb: udc: trace: reduce buffer usage of trace event
>   usb: cdns3: trace: reduce buffer usage of trace event
>   usb: dwc3: trace: reduce buffer usage of trace event
>   usb: cdns2: trace: reduce buffer usage of trace event
>   usb: mtu3: trace: reduce buffer usage of trace event
>   usb: musb: trace: reduce buffer usage of trace event
> 
>  drivers/usb/cdns3/cdns3-trace.h| 201 ++---
>  drivers/usb/cdns3/cdnsp-trace.h| 105 +++
>  drivers/usb/dwc3/trace.h   |  99 ++
>  drivers/usb/gadget/udc/cdns2/cdns2-trace.h | 175 --
>  drivers/usb/gadget/udc/trace.h | 154 +++-
>  drivers/usb/mtu3/mtu3_trace.h  |  76 +---
>  drivers/usb/musb/musb_trace.h  |  20 +-
>  include/linux/tracepoint.h |  22 +++
>  include/linux/usb/gadget.h | 113 +++-
>  include/trace/bpf_probe.h  |   4 +
>  include/trace/perf.h   |  43 +
>  include/trace/stages/stage3_trace_output.h |   3 +
>  include/trace/trace_events.h   | 118 
>  13 files changed, 784 insertions(+), 349 deletions(-)
> 



Re: [PATCH v5] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()

2023-09-12 Thread Steven Rostedt
On Tue, 12 Sep 2023 21:47:52 +0800
Jinjie Ruan  wrote:

> v5:
> - Move the ef below dir to keep the "upside-down x-mas tree" format.

That was quick!

I'll add it to my queue and start testing it.

Thanks Ruan!

-- Steve


Re: [PATCH v4] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()

2023-09-12 Thread Steven Rostedt
On Tue, 12 Sep 2023 10:58:08 +0800
Jinjie Ruan  wrote:

This looks good. But I have a small nit.

> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index ed367d713be0..8e0593d4c6a6 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2296,6 +2296,7 @@ event_subsystem_dir(struct trace_array *tr, const char 
> *name,
>   struct trace_event_file *file, struct dentry *parent)
>  {
>   struct event_subsystem *system, *iter;
> + struct eventfs_file *ef;
>   struct trace_subsystem_dir *dir;
>   int res;
>  

Can you move the ef below dir to keep the "upside-down x-mas tree" format:

struct event_subsystem *system, *iter;
struct trace_subsystem_dir *dir;
struct eventfs_file *ef;
int res;

That's easier to read.

Thanks!

-- Steve


Re: [PATCH v3] eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()

2023-09-11 Thread Steven Rostedt
On Mon, 11 Sep 2023 20:51:25 +0900
Masami Hiramatsu (Google)  wrote:

> Instead,
> 
>   ef = eventfs_add_subsystem_dir(name, parent);
>   if (IS_ERR(ef)) {
>   ...
>   } else
>   dir->ef = ef;


Note, as the error has a goto out_free, it just needs to be:

if (IS_ERR(ef)) {
...
goto out_free;
}

dir->ef = ef;

And for event_create_dir()

ef = eventfs_add_dir(name, ef_subsystem);
if (IS_ERR(ef)) {
...
return -1;
}

file->ef = ef;

-- Steve