Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/9 22:28, Steven Rostedt wrote: On Mon, 9 Oct 2023 18:58:27 +0800 Yajun Deng wrote: C compiler decides to inline or not, depending on various factors. The most efficient (and small) code is generated by this_cpu_inc() version, allowing the compiler to inline it. If you copy/paste this_cpu_inc() twenty times, then the compiler would not inline the function anymore. Yes, if you want something to be visible by ftrace, it must not be inlined (as inlined functions are not function calls by definition). And as Eric stated, the compiler is perfectly allowed to inline something if it believes it will be more efficient. i.e. There may be code around the function call that could be more efficient if it wasn't change to parameters. If you want to make sure a function stays out of line, you must explicitly tell the compiler you want the function not to ever be inlined (hence the "noinline" attribute). Thanks for the details. Got it. Thank you. Great. -- Steve
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On Mon, 9 Oct 2023 18:58:27 +0800 Yajun Deng wrote: > > C compiler decides to inline or not, depending on various factors. > > > > The most efficient (and small) code is generated by this_cpu_inc() > > version, allowing the compiler to inline it. > > > > If you copy/paste this_cpu_inc() twenty times, then the compiler > > would not inline the function anymore. Yes, if you want something to be visible by ftrace, it must not be inlined (as inlined functions are not function calls by definition). And as Eric stated, the compiler is perfectly allowed to inline something if it believes it will be more efficient. i.e. There may be code around the function call that could be more efficient if it wasn't change to parameters. If you want to make sure a function stays out of line, you must explicitly tell the compiler you want the function not to ever be inlined (hence the "noinline" attribute). > > > Got it. Thank you. Great. -- Steve
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/9 18:16, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng wrote: On 2023/10/9 17:30, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng wrote: On 2023/10/9 16:20, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: On 2023/10/9 15:53, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make the trace work well. They all have 'pop' instructions in them. This may be the key to making the trace work well. Hi all, I need your help on percpu and ftrace. I do not think you made sure netdev_core_stats_inc() was never inlined. Adding more code in it is simply changing how the compiler decides to inline or not. Yes, you are right. It needs to add the 'noinline' prefix. The disassembly code will have 'pop' instruction. The function was fine, you do not need anything like push or pop. The only needed stuff was the call __fentry__. The fact that the function was inlined for some invocations was the issue, because the trace point is only planted in the out of line function. But somehow the following code isn't inline? They didn't need to add the 'noinline' prefix. + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); Or + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; I think you are very confused. You only want to trace netdev_core_stats_inc() entry point, not arbitrary pieces of it. Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace + field = (__force unsigned long __percpu *)((__force void *)p + offset); + this_cpu_inc(*field); with + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); Or + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; The netdev_core_stats_inc() entry point will work fine even if it doesn't have 'noinline' prefix. I don't know why this code needs to add 'noinline' prefix. + field = (__force unsigned long __percpu *)((__force void *)p + offset); + this_cpu_inc(*field); C compiler decides to inline or not, depending on various factors. The most efficient (and small) code is generated by this_cpu_inc() version, allowing the compiler to inline it. If you copy/paste this_cpu_inc() twenty times, then the compiler would not inline the function anymore. Got it. Thank you.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng wrote: > > > On 2023/10/9 17:30, Eric Dumazet wrote: > > On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng wrote: > >> > >> On 2023/10/9 16:20, Eric Dumazet wrote: > >>> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: > On 2023/10/9 15:53, Eric Dumazet wrote: > > On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: > > > >> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make > >> the trace work well. > >> > >> They all have 'pop' instructions in them. This may be the key to making > >> the trace work well. > >> > >> Hi all, > >> > >> I need your help on percpu and ftrace. > >> > > I do not think you made sure netdev_core_stats_inc() was never inlined. > > > > Adding more code in it is simply changing how the compiler decides to > > inline or not. > Yes, you are right. It needs to add the 'noinline' prefix. The > disassembly code will have 'pop' > > instruction. > > >>> The function was fine, you do not need anything like push or pop. > >>> > >>> The only needed stuff was the call __fentry__. > >>> > >>> The fact that the function was inlined for some invocations was the > >>> issue, because the trace point > >>> is only planted in the out of line function. > >> > >> But somehow the following code isn't inline? They didn't need to add the > >> 'noinline' prefix. > >> > >> + field = (unsigned long *)((void *)this_cpu_ptr(p) + > >> offset); > >> + WRITE_ONCE(*field, READ_ONCE(*field) + 1); > >> > >> Or > >> + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; > >> > > I think you are very confused. > > > > You only want to trace netdev_core_stats_inc() entry point, not > > arbitrary pieces of it. > > > Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace > > + field = (__force unsigned long > __percpu *)((__force void *)p + offset); > + this_cpu_inc(*field); > > with > > + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); > + WRITE_ONCE(*field, READ_ONCE(*field) + 1); > > Or > + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; > > The netdev_core_stats_inc() entry point will work fine even if it doesn't > have 'noinline' prefix. > > I don't know why this code needs to add 'noinline' prefix. > + field = (__force unsigned long __percpu *)((__force void *)p > + offset); > + this_cpu_inc(*field); > C compiler decides to inline or not, depending on various factors. The most efficient (and small) code is generated by this_cpu_inc() version, allowing the compiler to inline it. If you copy/paste this_cpu_inc() twenty times, then the compiler would not inline the function anymore.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/9 17:30, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng wrote: On 2023/10/9 16:20, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: On 2023/10/9 15:53, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make the trace work well. They all have 'pop' instructions in them. This may be the key to making the trace work well. Hi all, I need your help on percpu and ftrace. I do not think you made sure netdev_core_stats_inc() was never inlined. Adding more code in it is simply changing how the compiler decides to inline or not. Yes, you are right. It needs to add the 'noinline' prefix. The disassembly code will have 'pop' instruction. The function was fine, you do not need anything like push or pop. The only needed stuff was the call __fentry__. The fact that the function was inlined for some invocations was the issue, because the trace point is only planted in the out of line function. But somehow the following code isn't inline? They didn't need to add the 'noinline' prefix. + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); Or + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; I think you are very confused. You only want to trace netdev_core_stats_inc() entry point, not arbitrary pieces of it. Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace + field = (__force unsigned long __percpu *)((__force void *)p + offset); + this_cpu_inc(*field); with + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); Or + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; The netdev_core_stats_inc() entry point will work fine even if it doesn't have 'noinline' prefix. I don't know why this code needs to add 'noinline' prefix. + field = (__force unsigned long __percpu *)((__force void *)p + offset); + this_cpu_inc(*field);
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng wrote: > > > On 2023/10/9 16:20, Eric Dumazet wrote: > > On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: > >> > >> On 2023/10/9 15:53, Eric Dumazet wrote: > >>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: > >>> > 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make > the trace work well. > > They all have 'pop' instructions in them. This may be the key to making > the trace work well. > > Hi all, > > I need your help on percpu and ftrace. > > >>> I do not think you made sure netdev_core_stats_inc() was never inlined. > >>> > >>> Adding more code in it is simply changing how the compiler decides to > >>> inline or not. > >> > >> Yes, you are right. It needs to add the 'noinline' prefix. The > >> disassembly code will have 'pop' > >> > >> instruction. > >> > > The function was fine, you do not need anything like push or pop. > > > > The only needed stuff was the call __fentry__. > > > > The fact that the function was inlined for some invocations was the > > issue, because the trace point > > is only planted in the out of line function. > > > But somehow the following code isn't inline? They didn't need to add the > 'noinline' prefix. > > + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); > + WRITE_ONCE(*field, READ_ONCE(*field) + 1); > > Or > + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; > I think you are very confused. You only want to trace netdev_core_stats_inc() entry point, not arbitrary pieces of it.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/9 16:20, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: On 2023/10/9 15:53, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make the trace work well. They all have 'pop' instructions in them. This may be the key to making the trace work well. Hi all, I need your help on percpu and ftrace. I do not think you made sure netdev_core_stats_inc() was never inlined. Adding more code in it is simply changing how the compiler decides to inline or not. Yes, you are right. It needs to add the 'noinline' prefix. The disassembly code will have 'pop' instruction. The function was fine, you do not need anything like push or pop. The only needed stuff was the call __fentry__. The fact that the function was inlined for some invocations was the issue, because the trace point is only planted in the out of line function. But somehow the following code isn't inline? They didn't need to add the 'noinline' prefix. + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); Or + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng wrote: > > > On 2023/10/9 15:53, Eric Dumazet wrote: > > On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: > > > >> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make > >> the trace work well. > >> > >> They all have 'pop' instructions in them. This may be the key to making > >> the trace work well. > >> > >> Hi all, > >> > >> I need your help on percpu and ftrace. > >> > > I do not think you made sure netdev_core_stats_inc() was never inlined. > > > > Adding more code in it is simply changing how the compiler decides to > > inline or not. > > > Yes, you are right. It needs to add the 'noinline' prefix. The > disassembly code will have 'pop' > > instruction. > The function was fine, you do not need anything like push or pop. The only needed stuff was the call __fentry__. The fact that the function was inlined for some invocations was the issue, because the trace point is only planted in the out of line function.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/9 15:53, Eric Dumazet wrote: On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make the trace work well. They all have 'pop' instructions in them. This may be the key to making the trace work well. Hi all, I need your help on percpu and ftrace. I do not think you made sure netdev_core_stats_inc() was never inlined. Adding more code in it is simply changing how the compiler decides to inline or not. Yes, you are right. It needs to add the 'noinline' prefix. The disassembly code will have 'pop' instruction. Thanks.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng wrote: > 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make > the trace work well. > > They all have 'pop' instructions in them. This may be the key to making > the trace work well. > > Hi all, > > I need your help on percpu and ftrace. > I do not think you made sure netdev_core_stats_inc() was never inlined. Adding more code in it is simply changing how the compiler decides to inline or not.
Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
On 2023/10/8 17:12, Yajun Deng wrote: On 2023/10/8 16:53, Eric Dumazet wrote: On Sun, Oct 8, 2023 at 10:44 AM Yajun Deng wrote: On 2023/10/8 15:18, Eric Dumazet wrote: On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng wrote: On 2023/10/8 14:45, Eric Dumazet wrote: On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng wrote: On 2023/10/7 13:29, Eric Dumazet wrote: On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng wrote: Although there is a kfree_skb_reason() helper function that can be used to find the reason why this skb is dropped, but most callers didn't increase one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped. ... + +void netdev_core_stats_inc(struct net_device *dev, u32 offset) +{ + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); + unsigned long *field; + + if (unlikely(!p)) + p = netdev_core_stats_alloc(dev); + + if (p) { + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset); + WRITE_ONCE(*field, READ_ONCE(*field) + 1); This is broken... As I explained earlier, dev_core_stats_(dev) can be called from many different contexts: 1) process contexts, where preemption and migration are allowed. 2) interrupt contexts. Adding WRITE_ONCE()/READ_ONCE() is not solving potential races. I _think_ I already gave you how to deal with this ? Yes, I replied in v6. https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a31728...@linux.dev/ Please try instead: +void netdev_core_stats_inc(struct net_device *dev, u32 offset) +{ + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); + unsigned long __percpu *field; + + if (unlikely(!p)) { + p = netdev_core_stats_alloc(dev); + if (!p) + return; + } + field = (__force unsigned long __percpu *)((__force void *)p + offset); + this_cpu_inc(*field); +} This wouldn't trace anything even the rx_dropped is in increasing. It needs to add an extra operation, such as: I honestly do not know what you are talking about. Have you even tried to change your patch to use field = (__force unsigned long __percpu *)((__force void *)p + offset); this_cpu_inc(*field); Yes, I tested this code. But the following couldn't show anything even if the rx_dropped is increasing. 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc' Well, I am not sure about this, "bpftrace" worked for me. Make sure your toolchain generates something that looks like what I got: ef20 : ef20: f3 0f 1e fa endbr64 ef24: e8 00 00 00 00 call ef29 ef25: R_X86_64_PLT32 __fentry__-0x4 ef29: 55 push %rbp ef2a: 48 89 e5 mov %rsp,%rbp ef2d: 53 push %rbx ef2e: 89 f3 mov %esi,%ebx ef30: 48 8b 87 f0 01 00 00 mov 0x1f0(%rdi),%rax ef37: 48 85 c0 test %rax,%rax ef3a: 74 0b je ef47 ef3c: 89 d9 mov %ebx,%ecx ef3e: 65 48 ff 04 08 incq %gs:(%rax,%rcx,1) ef43: 5b pop %rbx ef44: 5d pop %rbp ef45: c3 ret ef46: cc int3 ef47: e8 00 00 00 00 call ef4c ef48: R_X86_64_PLT32 .text.unlikely.+0x13c ef4c: 48 85 c0 test %rax,%rax ef4f: 75 eb jne ef3c ef51: eb f0 jmp ef43 ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1) ef5a: 84 00 00 00 00 00 I'll share some I can see it. 1. objdump -D vmlinux 81b2f170 : 81b2f170: e8 8b ea 55 ff callq 8108dc00 <__fentry__> 81b2f175: 55 push %rbp 81b2f176: 48 89 e5 mov %rsp,%rbp 81b2f179: 48 83 ec 08 sub $0x8,%rsp 81b2f17d: 48 8b 87 e8 01 00 00 mov 0x1e8(%rdi),%rax 81b2f184: 48 85 c0 test %rax,%rax 81b2f187: 74 0d je 81b2f196 81b2f189: 89 f6 mov %esi,%esi 81b2f18b: 65 48 ff 04 30 incq %gs:(%rax,%rsi,1) 81b2f190: c9 leaveq 81b2f191: e9 aa 31 6d 00 jmpq 82202340 <__x86_return_thunk> 81b2f196: 89 75 fc mov %esi,-0x4(%rbp) 81b2f199: e8 82 ff ff ff callq 81b2f120 81b2f19e: 8b 75 fc mov -0x4(%rbp),%esi 81b2f1a1: 48 85 c0 test %rax,%rax 81b2f1a4: 75 e3 jne