Re: [PATCH] uprobes: reduce contention on uprobes_tree access

2024-03-29 Thread Andrii Nakryiko
On Fri, Mar 29, 2024 at 5:36 PM Masami Hiramatsu  wrote:
>
> On Fri, 29 Mar 2024 10:33:57 -0700
> Andrii Nakryiko  wrote:
>
> > On Wed, Mar 27, 2024 at 5:45 PM Andrii Nakryiko
> >  wrote:
> > >
> > > On Wed, Mar 27, 2024 at 5:18 PM Masami Hiramatsu  
> > > wrote:
> > > >
> > > > On Wed, 27 Mar 2024 17:06:01 +
> > > > Jonathan Haslam  wrote:
> > > >
> > > > > > > Masami,
> > > > > > >
> > > > > > > Given the discussion around per-cpu rw semaphore and need for
> > > > > > > (internal) batched attachment API for uprobes, do you think you 
> > > > > > > can
> > > > > > > apply this patch as is for now? We can then gain initial 
> > > > > > > improvements
> > > > > > > in scalability that are also easy to backport, and Jonathan will 
> > > > > > > work
> > > > > > > on a more complete solution based on per-cpu RW semaphore, as
> > > > > > > suggested by Ingo.
> > > > > >
> > > > > > Yeah, it is interesting to use per-cpu rw semaphore on uprobe.
> > > > > > I would like to wait for the next version.
> > > > >
> > > > > My initial tests show a nice improvement over the RW spinlocks but a
> > > > > significant regression in acquiring a write lock. I've got a few days
> > > > > vacation over Easter but I'll aim to get some more formalised results 
> > > > > out
> > > > > to the thread toward the end of next week.
> > > >
> > > > As long as the write lock is only on the cold path, I think you can
> > > > choose
> > > > per-cpu RW semaphore. Since it does not do busy wait, the total system
> > > > performance impact will be small.
> > >
> > > No, Masami, unfortunately it's not as simple. In BPF we have BPF
> > > multi-uprobe, which can be used to attach to thousands of user
> > > functions. It currently creates one uprobe at a time, as we don't
> > > really have a batched API. If each such uprobe registration now takes
> > > a (relatively) long time, then multiplied by the number of attach-to
> > > user functions, it will be a horrible regression in terms of
> > > attachment/detachment performance.
>
> Ah, got it. So attachment/detachment performance needs to be taken into account.
>
> > >
> > > So when we switch to per-CPU rw semaphore, we'll need to provide an
> > > internal batch uprobe attach/detach API to make sure that attaching to
> > > multiple uprobes is still fast.
>
> Yeah, we need an interface like register_uprobes(...).
>
> > >
> > > Which is why I was asking to land this patch as is, as it relieves the
> > > scalability pains in production and is easy to backport to old
> > > kernels. And then we can work on batched APIs and switch to per-CPU rw
> > > semaphore.
>
> OK, then I'll push this to for-next for now.

Great, thanks a lot!

> Please share if you have a good idea for the batch interface which can be
> backported. I guess it will involve userspace changes too.
>

Yep, we'll investigate the best way to provide a batch interface for
uprobes and will send patches.

> Thank you!
>
> > >
> > > So I hope you can reconsider and accept improvements in this patch,
> > > while Jonathan will keep working on even better final solution.
> > > Thanks!
> > >
> > > > I look forward to your formalized results :)
> > > >
> >
> > BTW, as part of BPF selftests, we have a multi-attach test for uprobes
> > and USDTs, reporting attach/detach timings:
> > $ sudo ./test_progs -v -t uprobe_multi_test/bench
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_bench_attach_uprobe:PASS:uprobe_multi_bench__open_and_load 0 nsec
> > test_bench_attach_uprobe:PASS:uprobe_multi_bench__attach 0 nsec
> > test_bench_attach_uprobe:PASS:uprobes_count 0 nsec
> > test_bench_attach_uprobe: attached in   0.120s
> > test_bench_attach_uprobe: detached in   0.092s
> > #400/5   uprobe_multi_test/bench_uprobe:OK
> > test_bench_attach_usdt:PASS:uprobe_multi__open 0 nsec
> > test_bench_attach_usdt:PASS:bpf_program__attach_usdt 0 nsec
> > test_bench_attach_usdt:PASS:usdt_count 0 nsec
> > test_bench_attach_usdt: attached in   0.124s
> > test_bench_attach_usdt: detached in   0.064s
> > #400/6   uprobe_multi_test/bench_usdt:OK
> > #400 uprobe_multi_test:OK
> > Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> > So it should be easy for Jonathan to validate his changes with this.
> >
> > > > Thank you,
> > > >
> > > > >
> > > > > Jon.
> > > > >
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > BTW, how did you measure the overhead? I think spinlock overhead
> > > > > > > > will depend on how much lock contention happens.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > [0] https://docs.kernel.org/locking/spinlocks.html
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jonathan Haslam 
> > > > > > > > > ---
> > > > > > > > >  kernel/events/uprobes.c | 22 +++---
> > > > > > > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > 

Re: [PATCH] uprobes: reduce contention on uprobes_tree access

2024-03-29 Thread Google
On Fri, 29 Mar 2024 10:33:57 -0700
Andrii Nakryiko  wrote:

> On Wed, Mar 27, 2024 at 5:45 PM Andrii Nakryiko
>  wrote:
> >
> > On Wed, Mar 27, 2024 at 5:18 PM Masami Hiramatsu  
> > wrote:
> > >
> > > On Wed, 27 Mar 2024 17:06:01 +
> > > Jonathan Haslam  wrote:
> > >
> > > > > > Masami,
> > > > > >
> > > > > > Given the discussion around per-cpu rw semaphore and need for
> > > > > > (internal) batched attachment API for uprobes, do you think you can
> > > > > > apply this patch as is for now? We can then gain initial 
> > > > > > improvements
> > > > > > in scalability that are also easy to backport, and Jonathan will 
> > > > > > work
> > > > > > on a more complete solution based on per-cpu RW semaphore, as
> > > > > > suggested by Ingo.
> > > > >
> > > > > Yeah, it is interesting to use per-cpu rw semaphore on uprobe.
> > > > > I would like to wait for the next version.
> > > >
> > > > My initial tests show a nice improvement over the RW spinlocks but a
> > > > significant regression in acquiring a write lock. I've got a few days
> > > > vacation over Easter but I'll aim to get some more formalised results 
> > > > out
> > > > to the thread toward the end of next week.
> > >
> > > As long as the write lock is only on the cold path, I think you can choose
> > > per-cpu RW semaphore. Since it does not do busy wait, the total system
> > > performance impact will be small.
> >
> > No, Masami, unfortunately it's not as simple. In BPF we have BPF
> > multi-uprobe, which can be used to attach to thousands of user
> > functions. It currently creates one uprobe at a time, as we don't
> > really have a batched API. If each such uprobe registration now takes
> > a (relatively) long time, then multiplied by the number of attach-to
> > user functions, it will be a horrible regression in terms of
> > attachment/detachment performance.

Ah, got it. So attachment/detachment performance needs to be taken into account.

> >
> > So when we switch to per-CPU rw semaphore, we'll need to provide an
> > internal batch uprobe attach/detach API to make sure that attaching to
> > multiple uprobes is still fast.

Yeah, we need an interface like register_uprobes(...).
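
A plausible shape for such an interface (an illustrative sketch, not an
existing kernel API) would take the whole set of consumers so that the
writer side of the lock is taken once per batch rather than once per
uprobe:

	/*
	 * Hypothetical batched registration: insert all 'cnt' uprobes
	 * into uprobes_tree under a single writer-side acquisition
	 * instead of one lock round-trip per uprobe.
	 */
	int register_uprobes(struct inode *inode, loff_t *offsets,
			     struct uprobe_consumer **ucs, int cnt);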

> >
> > Which is why I was asking to land this patch as is, as it relieves the
> > scalability pains in production and is easy to backport to old
> > kernels. And then we can work on batched APIs and switch to per-CPU rw
> > semaphore.

OK, then I'll push this to for-next for now.
Please share if you have a good idea for the batch interface which can be
backported. I guess it will involve userspace changes too.

Thank you!

> >
> > So I hope you can reconsider and accept improvements in this patch,
> > while Jonathan will keep working on even better final solution.
> > Thanks!
> >
> > > I look forward to your formalized results :)
> > >
> 
> BTW, as part of BPF selftests, we have a multi-attach test for uprobes
> and USDTs, reporting attach/detach timings:
> $ sudo ./test_progs -v -t uprobe_multi_test/bench
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_bench_attach_uprobe:PASS:uprobe_multi_bench__open_and_load 0 nsec
> test_bench_attach_uprobe:PASS:uprobe_multi_bench__attach 0 nsec
> test_bench_attach_uprobe:PASS:uprobes_count 0 nsec
> test_bench_attach_uprobe: attached in   0.120s
> test_bench_attach_uprobe: detached in   0.092s
> #400/5   uprobe_multi_test/bench_uprobe:OK
> test_bench_attach_usdt:PASS:uprobe_multi__open 0 nsec
> test_bench_attach_usdt:PASS:bpf_program__attach_usdt 0 nsec
> test_bench_attach_usdt:PASS:usdt_count 0 nsec
> test_bench_attach_usdt: attached in   0.124s
> test_bench_attach_usdt: detached in   0.064s
> #400/6   uprobe_multi_test/bench_usdt:OK
> #400 uprobe_multi_test:OK
> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
> Successfully unloaded bpf_testmod.ko.
> 
> So it should be easy for Jonathan to validate his changes with this.
> 
> > > Thank you,
> > >
> > > >
> > > > Jon.
> > > >
> > > > >
> > > > > Thank you,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > BTW, how did you measure the overhead? I think spinlock overhead
> > > > > > > will depend on how much lock contention happens.
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > >
> > > > > > > > [0] https://docs.kernel.org/locking/spinlocks.html
> > > > > > > >
> > > > > > > > Signed-off-by: Jonathan Haslam 
> > > > > > > > ---
> > > > > > > >  kernel/events/uprobes.c | 22 +++---
> > > > > > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > > > > > > index 929e98c62965..42bf9b6e8bc0 100644
> > > > > > > > --- a/kernel/events/uprobes.c
> > > > > > > > +++ b/kernel/events/uprobes.c
> > > > > > > > @@ -39,7 +39,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
> > > > > > > >   */
> > > > > > > > >  #define no_uprobe_events()   RB_EMPTY_ROOT(&uprobes_tree)
> > > > > > > >
> > > > 

Re: [PATCH v19 RESEND 3/5] tracing: Allow user-space mapping of the ring-buffer

2024-03-29 Thread Steven Rostedt
On Tue, 26 Mar 2024 10:08:28 +
Vincent Donnefort  wrote:

> Currently, user-space extracts data from the ring-buffer via splice,
> which is handy for storage or network sharing. However, due to splice
> limitations, it is impossible to do real-time analysis without a copy.
> 
> A solution for that problem is to let the user-space map the ring-buffer
> directly.
> 
> The mapping is exposed via the per-CPU file trace_pipe_raw. The first
> element of the mapping is the meta-page. It is followed by each
> subbuffer constituting the ring-buffer, ordered by their unique page ID:
> 
>   * Meta-page -- see include/uapi/linux/trace_mmap.h for a description
>   * Subbuf ID 0
>   * Subbuf ID 1
>  ...
> 
> It is therefore easy to translate a subbuf ID into an offset in the
> mapping:
> 
>   reader_id = meta->reader->id;
>   reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
> 
> When new data is available, the mapper must call a newly introduced ioctl:
> TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
> point to the next reader containing unread data.
> 
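
A minimal user-space sketch of that flow (illustrative only; the path and
struct fields are assumed from the patch below, error handling omitted):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <linux/trace_mmap.h>

	int main(void)
	{
		int fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw",
			      O_RDONLY | O_NONBLOCK);
		struct trace_buffer_meta *meta =
			mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);

		/* ask the kernel for the next sub-buffer with unread data */
		ioctl(fd, TRACE_MMAP_IOCTL_GET_READER);

		unsigned int reader_id = meta->reader.id;
		unsigned long reader_offset = meta->meta_page_size +
					      reader_id * meta->subbuf_size;
		/* the reader sub-buffer now lives at mapping + reader_offset */
		return 0;
	}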

Thanks for the update Vincent!

> Mapping will prevent snapshot and buffer size modifications.
> 
> CC: 
> Signed-off-by: Vincent Donnefort 
> 
> diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> index ffcd8dfcaa4f..d25b9d504a7c 100644
> --- a/include/uapi/linux/trace_mmap.h
> +++ b/include/uapi/linux/trace_mmap.h
> @@ -43,4 +43,6 @@ struct trace_buffer_meta {
>   __u64   Reserved2;
>  };
>  
> +#define TRACE_MMAP_IOCTL_GET_READER  _IO('T', 0x1)
> +
>  #endif /* _TRACE_MMAP_H_ */
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 233d1af39fff..0f37aa9860fd 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1191,6 +1191,12 @@ static void tracing_snapshot_instance_cond(struct 
> trace_array *tr,
>   return;
>   }
>  
> + if (tr->mapped) {
> + trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
> + trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
> + return;
> + }
> +
>   local_irq_save(flags);
>   update_max_tr(tr, current, smp_processor_id(), cond_data);
>   local_irq_restore(flags);
> @@ -1323,7 +1329,7 @@ static int tracing_arm_snapshot_locked(struct 
> trace_array *tr)
>   lockdep_assert_held(&trace_types_lock);
>  
> spin_lock(&tr->snapshot_trigger_lock);
> - if (tr->snapshot == UINT_MAX) {
> + if (tr->snapshot == UINT_MAX || tr->mapped) {
>   spin_unlock(&tr->snapshot_trigger_lock);
>   return -EBUSY;
>   }
> @@ -6068,7 +6074,7 @@ static void tracing_set_nop(struct trace_array *tr)
>  {
>   if (tr->current_trace == &nop_trace)
>   return;
> - 
> +
>   tr->current_trace->enabled--;
>  
>   if (tr->current_trace->reset)
> @@ -8194,15 +8200,32 @@ tracing_buffers_splice_read(struct file *file, loff_t 
> *ppos,
>   return ret;
>  }
>  
> -/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters 
> */
>  static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, 
> unsigned long arg)
>  {
>   struct ftrace_buffer_info *info = file->private_data;
>   struct trace_iterator *iter = &info->iter;
> + int err;
> +
> + if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
> + if (!(file->f_flags & O_NONBLOCK)) {
> + err = ring_buffer_wait(iter->array_buffer->buffer,
> +iter->cpu_file,
> +iter->tr->buffer_percent,
> +NULL, NULL);
> + if (err)
> + return err;
> + }
>  
> - if (cmd)
> - return -ENOIOCTLCMD;
> + return ring_buffer_map_get_reader(iter->array_buffer->buffer,
> +   iter->cpu_file);
> + } else if (cmd) {
> + return -ENOTTY;
> + }
>  
> + /*
> +  * An ioctl call with cmd 0 to the ring buffer file will wake up all
> +  * waiters
> +  */
>   mutex_lock(&trace_types_lock);
>  
>   /* Make sure the waiters see the new wait_index */
> @@ -8214,6 +8237,94 @@ static long tracing_buffers_ioctl(struct file *file, 
> unsigned int cmd, unsigned
>   return 0;
>  }
>  
> +static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}

If this is all it does, I don't believe it's needed.

> +
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +static int get_snapshot_map(struct trace_array *tr)
> +{
> + int err = 0;
> +
> + /*
> +  * Called with mmap_lock held. lockdep would be unhappy if we would now
> +  * take trace_types_lock. Instead use the specific
> +  * snapshot_trigger_lock.
> +  */
> + spin_lock(&tr->snapshot_trigger_lock);
> +
> + if (tr->snapshot || tr->mapped == UINT_MAX)
> +  

[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Naik, Avadhut



On 3/29/2024 11:50, Luck, Tony wrote:
>>> - While at it, don't forget to:
>>>
>>>s/ADDR/MISC/SYND
>>> /addr/misc/synd
>>>
>> These are actually acronyms for Address, Miscellaneous and Syndrome 
>> registers.
> 
> They look like abbreviations, not acronyms to me.
> 
> -Tony

Yes, they are actually abbreviations. Wrong choice of words on my part.
Was under the impression that Boris' recommendation also applied to
abbreviations. Will change them though and resubmit.

-- 
Thanks,
Avadhut Naik



Re: [PATCH] uprobes: reduce contention on uprobes_tree access

2024-03-29 Thread Andrii Nakryiko
On Wed, Mar 27, 2024 at 5:45 PM Andrii Nakryiko
 wrote:
>
> On Wed, Mar 27, 2024 at 5:18 PM Masami Hiramatsu  wrote:
> >
> > On Wed, 27 Mar 2024 17:06:01 +
> > Jonathan Haslam  wrote:
> >
> > > > > Masami,
> > > > >
> > > > > Given the discussion around per-cpu rw semaphore and need for
> > > > > (internal) batched attachment API for uprobes, do you think you can
> > > > > apply this patch as is for now? We can then gain initial improvements
> > > > > in scalability that are also easy to backport, and Jonathan will work
> > > > > on a more complete solution based on per-cpu RW semaphore, as
> > > > > suggested by Ingo.
> > > >
> > > > Yeah, it is interesting to use per-cpu rw semaphore on uprobe.
> > > > I would like to wait for the next version.
> > >
> > > My initial tests show a nice improvement over the RW spinlocks but a
> > > significant regression in acquiring a write lock. I've got a few days
> > > vacation over Easter but I'll aim to get some more formalised results out
> > > to the thread toward the end of next week.
> >
> > As long as the write lock is only on the cold path, I think you can choose
> > per-cpu RW semaphore. Since it does not do busy wait, the total system
> > performance impact will be small.
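
For context, the primitive being discussed is the kernel's per-CPU rw
semaphore (linux/percpu-rwsem.h); a sketch of how uprobes could use it:

	DEFINE_STATIC_PERCPU_RWSEM(uprobes_rwsem);

	/* hot path: per-CPU counter, no shared-cacheline bouncing */
	percpu_down_read(&uprobes_rwsem);
	/* ... look up uprobe in uprobes_tree ... */
	percpu_up_read(&uprobes_rwsem);

	/* cold path: waits for an RCU grace period, hence slow writers */
	percpu_down_write(&uprobes_rwsem);
	/* ... insert/erase in uprobes_tree ... */
	percpu_up_write(&uprobes_rwsem);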
>
> No, Masami, unfortunately it's not as simple. In BPF we have BPF
> multi-uprobe, which can be used to attach to thousands of user
> functions. It currently creates one uprobe at a time, as we don't
> really have a batched API. If each such uprobe registration now takes
> a (relatively) long time, then multiplied by the number of attach-to
> user functions, it will be a horrible regression in terms of
> attachment/detachment performance.
>
> So when we switch to per-CPU rw semaphore, we'll need to provide an
> internal batch uprobe attach/detach API to make sure that attaching to
> multiple uprobes is still fast.
>
> Which is why I was asking to land this patch as is, as it relieves the
> scalability pains in production and is easy to backport to old
> kernels. And then we can work on batched APIs and switch to per-CPU rw
> semaphore.
>
> So I hope you can reconsider and accept improvements in this patch,
> while Jonathan will keep working on even better final solution.
> Thanks!
>
> > I look forward to your formalized results :)
> >

BTW, as part of BPF selftests, we have a multi-attach test for uprobes
and USDTs, reporting attach/detach timings:
$ sudo ./test_progs -v -t uprobe_multi_test/bench
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_bench_attach_uprobe:PASS:uprobe_multi_bench__open_and_load 0 nsec
test_bench_attach_uprobe:PASS:uprobe_multi_bench__attach 0 nsec
test_bench_attach_uprobe:PASS:uprobes_count 0 nsec
test_bench_attach_uprobe: attached in   0.120s
test_bench_attach_uprobe: detached in   0.092s
#400/5   uprobe_multi_test/bench_uprobe:OK
test_bench_attach_usdt:PASS:uprobe_multi__open 0 nsec
test_bench_attach_usdt:PASS:bpf_program__attach_usdt 0 nsec
test_bench_attach_usdt:PASS:usdt_count 0 nsec
test_bench_attach_usdt: attached in   0.124s
test_bench_attach_usdt: detached in   0.064s
#400/6   uprobe_multi_test/bench_usdt:OK
#400 uprobe_multi_test:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Successfully unloaded bpf_testmod.ko.

So it should be easy for Jonathan to validate his changes with this.

> > Thank you,
> >
> > >
> > > Jon.
> > >
> > > >
> > > > Thank you,
> > > >
> > > > >
> > > > > >
> > > > > > BTW, how did you measure the overhead? I think spinlock overhead
> > > > > > will depend on how much lock contention happens.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > >
> > > > > > > [0] https://docs.kernel.org/locking/spinlocks.html
> > > > > > >
> > > > > > > Signed-off-by: Jonathan Haslam 
> > > > > > > ---
> > > > > > >  kernel/events/uprobes.c | 22 +++---
> > > > > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > > > > > index 929e98c62965..42bf9b6e8bc0 100644
> > > > > > > --- a/kernel/events/uprobes.c
> > > > > > > +++ b/kernel/events/uprobes.c
> > > > > > > @@ -39,7 +39,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
> > > > > > >   */
> > > > > > >  #define no_uprobe_events()   RB_EMPTY_ROOT(&uprobes_tree)
> > > > > > >
> > > > > > > -static DEFINE_SPINLOCK(uprobes_treelock);/* serialize rbtree 
> > > > > > > access */
> > > > > > > +static DEFINE_RWLOCK(uprobes_treelock);  /* serialize rbtree 
> > > > > > > access */
> > > > > > >
> > > > > > >  #define UPROBES_HASH_SZ  13
> > > > > > >  /* serialize uprobe->pending_list */
> > > > > > > @@ -669,9 +669,9 @@ static struct uprobe *find_uprobe(struct 
> > > > > > > inode *inode, loff_t offset)
> > > > > > >  {
> > > > > > >   struct uprobe *uprobe;
> > > > > > >
> > > > > > > - spin_lock(&uprobes_treelock);
> > > > > > > + read_lock(&uprobes_treelock);
> > > > > > >   uprobe = __find_uprobe(inode, offset);
> > 

[PATCH net v3] virtio_net: Do not send RSS key if it is not supported

2024-03-29 Thread Breno Leitao
There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.

Running the following command in any QEMU virtual machine with virtio-net
will reproduce this problem:

# ethtool -X eth0 hfunc toeplitz

This is how the problem happens:

1) ethtool_set_rxfh() calls virtnet_set_rxfh()

2) virtnet_set_rxfh() calls virtnet_commit_rss_command()

3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather

4) Since the command above does not have a key, the last
scatter-gather entry will be zeroed, since rss_key_size == 0:
sg_buf_size = vi->rss_key_size;

5) This buffer is passed to QEMU, but QEMU is not happy with a buffer
with zero length, and does the following in virtqueue_map_desc() (QEMU
function):

  if (!sz) {
  virtio_error(vdev, "virtio: zero sized buffers are not allowed");

6) virtio_error() (also a QEMU function) sets the device as broken

vdev->broken = true;

7) QEMU bails out and does not respond to the kernel.

8) The kernel is waiting for the response to come back (function
virtnet_send_command())

9) The kernel waits, doing the following:

  while (!virtqueue_get_buf(vi->cvq, &tmp) &&
 !virtqueue_is_broken(vi->cvq))
  cpu_relax();

10) Neither of the conditions above becomes true, so the kernel
loops here forever. Keep in mind that virtqueue_is_broken() does
not look at QEMU's `vdev->broken`, so it never realizes that the
virtio device is broken on the QEMU side.

Fix it by not sending RSS commands if the feature is not available in
the device.
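
With the fix, the same command should fail gracefully instead of
wedging the machine; the expected outcome is along these lines
(illustrative; exact wording depends on the ethtool version):

	# ethtool -X eth0 hfunc toeplitz
	Cannot set RX flow hash configuration: Operation not supported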

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: sta...@vger.kernel.org
Cc: qemu-de...@nongnu.org
Signed-off-by: Breno Leitao 
---
Changelog:

V2:
  * Moved away from creating a valid packet; reject the request
completely instead
V3:
  * Got some good feedback from Xuan Zhuo and Heng Qi, and reworked
the rejection path.

---
 drivers/net/virtio_net.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..c4a21ec51adf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3807,6 +3807,7 @@ static int virtnet_set_rxfh(struct net_device *dev,
struct netlink_ext_ack *extack)
 {
struct virtnet_info *vi = netdev_priv(dev);
+   bool update = false;
int i;
 
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
@@ -3814,13 +3815,24 @@ static int virtnet_set_rxfh(struct net_device *dev,
return -EOPNOTSUPP;
 
if (rxfh->indir) {
+   if (!vi->has_rss)
+   return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = rxfh->indir[i];
+   update = true;
}
-   if (rxfh->key)
+
+   if (rxfh->key) {
+   if (!vi->has_rss && !vi->has_rss_hash_report)
+   return -EOPNOTSUPP;
+
memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
+   update = true;
+   }
 
-   virtnet_commit_rss_command(vi);
+   if (update)
+   virtnet_commit_rss_command(vi);
 
return 0;
 }
@@ -4729,13 +4741,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
 
-   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+   }
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, 
rss_max_key_size));
 
-- 
2.43.0




RE: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Luck, Tony
>> - While at it, don't forget to:
>> 
>>s/ADDR/MISC/SYND
>> /addr/misc/synd
>>
> These are actually acronyms for Address, Miscellaneous and Syndrome registers.

They look like abbreviations, not acronyms to me.

-Tony


Re: [PATCH] ftrace: make extra rcu_is_watching() validation check optional

2024-03-29 Thread Andrii Nakryiko
On Tue, Mar 26, 2024 at 11:58 AM Steven Rostedt  wrote:
>
> On Tue, 26 Mar 2024 09:16:33 -0700
> Andrii Nakryiko  wrote:
>
> > > It's no different than lockdep. Test boxes should have it enabled, but
> > > there's no reason to have this enabled in a production system.
> > >
> >
> > I tend to agree with Steven here (which is why I sent this patch as it
> > is), but I'm happy to do it as an opt-out, if Masami insists. Please
> > do let me know if I need to send v2 or this one is actually the one
> > we'll end up using. Thanks!
>
> Masami,
>
> Are you OK with just keeping it set to N?
>
> We could have other options, like PROVE_LOCKING, enable it.
>

So what's the conclusion, Masami? Should I send another version where
this config is opt-out, or are you ok with keeping it as opt-in as
proposed in this revision?

> -- Steve



[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Naik, Avadhut



On 3/29/2024 02:11, Ingo Molnar wrote:
> 
> Please split out the other (capitalization) changes to the output into 
> a separate patch.
> 
Okay. Will put the capitalization changes into a separate patch.

> - While at it, don't forget to:
> 
>s/ADDR/MISC/SYND
> /addr/misc/synd
>
These are actually acronyms for Address, Miscellaneous and Syndrome registers.

It was recommended to keep the acronyms in ALL CAPS. Hence, didn't change them.

https://lore.kernel.org/linux-edac/20240327205435.3667588-1-avadhut.n...@amd.com/T/#m0c04f1c0deaa0347af66653a5950aad5f6b320e7
 
> - Also, it's a bit weird that we have 'CPU' but also 'processor' 
>   fields, why isn't it 'vendor' and 'CPUID'?
> 
Think it has been this way since the tracepoint was created back
in 2009. Will modify the field per your suggestion.

> - Finally, why are some fields 'merged' as per ADDR/MISC/SYND, while 
>   others are listed separately? All that have separate names should be 
>   listed separately.
> 
Will separate the fields so that each is listed individually.
> Ie. something like the patch below?
> 
> Thanks,
> 
>   Ingo
> 
> >
> From: Ingo Molnar 
> Date: Fri, 29 Mar 2024 08:09:23 +0100
> Subject: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record 
> tracepoint
> 
>  - Only capitalize entries where that makes sense
>  - Print separate values separately
>  - Rename 'PROCESSOR' to vendor & CPUID
> 
> Signed-off-by: Ingo Molnar 
> ---
>  include/trace/events/mce.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
> index 1391ada0da3b..c5b0523f25ee 100644
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -55,15 +55,18 @@ TRACE_EVENT(mce_record,
>   __entry->cpuvendor  = m->cpuvendor;
>   ),
>  
> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> addr: %016Lx, misc: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x",
>   __entry->cpu,
>   __entry->mcgcap, __entry->mcgstatus,
>   __entry->bank, __entry->status,
>   __entry->ipid,
> - __entry->addr, __entry->misc, __entry->synd,
> + __entry->addr,
> + __entry->misc,
> + __entry->synd,
>   __entry->cs, __entry->ip,
>   __entry->tsc,
> - __entry->cpuvendor, __entry->cpuid,
> + __entry->cpuvendor,
> + __entry->cpuid,
>   __entry->walltime,
>   __entry->socketid,
>   __entry->apicid)

-- 
Thanks,
Avadhut Naik



Re: [PATCH net v2] vsock/virtio: fix packet delivery to tap device

2024-03-29 Thread Stefano Garzarella

On Fri, Mar 29, 2024 at 05:12:59PM +0100, Marco Pinna wrote:

Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed in queueing it in the virtqueue.

Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.

Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: sta...@vger.kernel.org
Signed-off-by: Marco Pinna 
---
net/vmw_vsock/virtio_transport.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 1748268e0694..ee5d306a96d0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -120,7 +120,6 @@ virtio_transport_send_pkt_work(struct work_struct *work)
if (!skb)
break;

-   virtio_transport_deliver_tap_pkt(skb);
reply = virtio_vsock_skb_reply(skb);
sgs = vsock->out_sgs;
sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb),
@@ -170,6 +169,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
break;
}

+   virtio_transport_deliver_tap_pkt(skb);
+
if (reply) {
struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
int val;
--
2.44.0






[PATCH net v2] vsock/virtio: fix packet delivery to tap device

2024-03-29 Thread Marco Pinna
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed in queueing it in the virtqueue.

Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.

Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: sta...@vger.kernel.org
Signed-off-by: Marco Pinna 
---
 net/vmw_vsock/virtio_transport.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 1748268e0694..ee5d306a96d0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -120,7 +120,6 @@ virtio_transport_send_pkt_work(struct work_struct *work)
if (!skb)
break;
 
-   virtio_transport_deliver_tap_pkt(skb);
reply = virtio_vsock_skb_reply(skb);
sgs = vsock->out_sgs;
sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb),
@@ -170,6 +169,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
break;
}
 
+   virtio_transport_deliver_tap_pkt(skb);
+
if (reply) {
struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
int val;
-- 
2.44.0




[RESEND PATCH v2] tracing: Improve performance by using do_div()

2024-03-29 Thread Thorsten Blum
Partially revert commit d6cb38e10810 ("tracing: Use div64_u64() instead
of do_div()") and use do_div() again to utilize its faster 64-by-32
division compared to the 64-by-64 division done by div64_u64().

Explicitly cast the divisor bm_cnt to u32 to prevent a Coccinelle
warning reported by do_div.cocci. The warning was removed with commit
d6cb38e10810 ("tracing: Use div64_u64() instead of do_div()").

Using the faster 64-by-32 division and casting bm_cnt to u32 is safe
because we return early from trace_do_benchmark() if bm_cnt > UINT_MAX.
This approach is already used twice in trace_do_benchmark() when
calculating the standard deviation:

do_div(stddev, (u32)bm_cnt);
do_div(stddev, (u32)bm_cnt - 1);
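
For reference, a sketch of the do_div() contract (a kernel-only macro
from asm/div64.h; this snippet is illustrative, not part of the patch):
it divides a u64 lvalue by a u32 divisor in place and returns the
remainder:

	u64 avg = bm_total;
	u32 rem;

	/* avg = bm_total / bm_cnt (64-by-32); rem = bm_total % bm_cnt */
	rem = do_div(avg, (u32)bm_cnt);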

Signed-off-by: Thorsten Blum 
---
Changes in v2:
- Update patch with latest changes from master
- Update patch title and description
---
 kernel/trace/trace_benchmark.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index 811b08439406..e19c32f2a938 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -104,7 +104,7 @@ static void trace_do_benchmark(void)
stddev = 0;
 
delta = bm_total;
-   delta = div64_u64(delta, bm_cnt);
+   do_div(delta, (u32)bm_cnt);
avg = delta;
 
if (stddev > 0) {
-- 
2.44.0




[PATCH] perf-ftrace: sync funcgraph options with perf-ftrace interface

2024-03-29 Thread piecuch
From: Krzysztof Piecuch 

Add overrun, retval, retval-hex and tail options to enable
off-by-default funcgraph features.

Add nocpu, nooverhead, noduration options to turn off funcgraph
features which are on by default.

Signed-off-by: Krzysztof Piecuch 
---
 Documentation/trace/ftrace.rst   |   5 +-
 tools/perf/Documentation/perf-ftrace.txt |  16 +++
 tools/perf/builtin-ftrace.c  | 118 ++-
 tools/perf/util/ftrace.h |   7 ++
 4 files changed, 142 insertions(+), 4 deletions(-)
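
Assuming the option plumbing below, the new flags would be passed the
same way as the existing graph options, e.g. (illustrative invocation):

	$ perf ftrace trace --graph-opts retval,tail,nocpu -- sleep 1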

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 7e7b8ec17934..9a608a59dce6 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1384,9 +1384,8 @@ Options for function_graph tracer:
of each process displayed at every line.
 
   funcgraph-duration
-   At the end of each function (the return)
-   the duration of the amount of time in the
-   function is displayed in microseconds.
+   At the end (return) of each function the duration
+   of the function is displayed in microseconds.
 
   funcgraph-abstime
When set, the timestamp is displayed at each line.
diff --git a/tools/perf/Documentation/perf-ftrace.txt 
b/tools/perf/Documentation/perf-ftrace.txt
index d780b93fcf87..0cf7b3301478 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -121,6 +121,22 @@ OPTIONS for 'perf ftrace trace'
List of options allowed to set:
 
  - nosleep-time - Measure on-CPU time only for function_graph tracer.
+ - overrun  - Show number of lost events due to overwriting when
+   the buffer was full.
+ - nocpu- Don't show the CPU which the process was running on.
+ - nooverhead   - Do not print the appropriate delay sign next to
+   long-running functions (' ' for sub-10us, '+' for sub-100us, '!' for
+   sub-1ms, '#' for sub-10ms, '*' for sub-100ms, '@' for sub-1s and '$'
+   for over-1s runtime).
+ - noduration   - Don't show the duration of each function, which is
+   normally displayed in microseconds at the end (return) of each
+   function.
+ - retval   - Print function's return value on exit. Error codes
+   are printed as signed decimals, other return values are printed as
+   hexadecimals.
+ - retval-hex   - Print all return values in hexadecimal format.
+ - tail - Print the returning name of function next to
+   closing brackets.
  - noirqs   - Ignore functions that happen inside interrupt.
  - verbose  - Show process names, PIDs, timestamps, etc.
  - thresh=   - Setup trace duration threshold in microseconds.
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index eb30c8eca488..1c58fa744758 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -225,9 +225,16 @@ static void reset_tracing_options(struct perf_ftrace 
*ftrace __maybe_unused)
write_tracing_option_file("function-fork", "0");
write_tracing_option_file("func_stack_trace", "0");
write_tracing_option_file("sleep-time", "1");
+   write_tracing_option_file("funcgraph-overrun", "0");
+   write_tracing_option_file("funcgraph-cpu", "1");
+   write_tracing_option_file("funcgraph-overhead", "1");
+   write_tracing_option_file("funcgraph-duration", "1");
write_tracing_option_file("funcgraph-irqs", "1");
write_tracing_option_file("funcgraph-proc", "0");
write_tracing_option_file("funcgraph-abstime", "0");
+   write_tracing_option_file("funcgraph-retval", "0");
+   write_tracing_option_file("funcgraph-retval-hex", "0");
+   write_tracing_option_file("funcgraph-tail", "0");
write_tracing_option_file("latency-format", "0");
write_tracing_option_file("irq-info", "0");
 }
@@ -425,6 +432,42 @@ static int set_tracing_trace_inherit(struct perf_ftrace 
*ftrace)
return 0;
 }
 
+static int set_tracing_funcgraph_overrun(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_overrun == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-overrun", "1") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_nocpu(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_nocpu == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-cpu", "0") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_nooverhead(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_nooverhead == 0)
+   return 0;
+   if (write_tracing_option_file("funcgraph-overhead", "0") < 0)
+   return -1;
+   return 0;
+}
+
+static int set_tracing_funcgraph_noduration(struct perf_ftrace *ftrace)
+{
+   if (ftrace->graph_noduration == 0)
+   return 0;
+   if 

[PATCH AUTOSEL 5.10 31/31] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment

2024-03-29 Thread Sasha Levin
From: linke li 

[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. This may cause the time_stamp read
on the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.
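
A minimal sketch of the hazard (illustrative C, not the kernel code
itself): without READ_ONCE() the compiler may legally reload the shared
pointer between the two statements below, so they could observe
different pages if a writer races in between:

	commit_page = READ_ONCE(cpu_buffer->commit_page); /* one load */
	commit_ts   = commit_page->page->time_stamp;      /* same page */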

Link: 
https://lore.kernel.org/linux-trace-kernel/tencent_dff7d3561a0686b5e8fc079150a025051...@qq.com

Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Signed-off-by: linke li 
Signed-off-by: Steven Rostedt (Google) 
Signed-off-by: Sasha Levin 
---
 kernel/trace/ring_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4a43b8846b49f..70b6cb6bfb56e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4184,7 +4184,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
-   commit_page = cpu_buffer->commit_page;
+   commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
 
/*
-- 
2.43.0




[PATCH AUTOSEL 5.15 34/34] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment

2024-03-29 Thread Sasha Levin
From: linke li 

[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. This may cause the time_stamp read
on the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.

Link: 
https://lore.kernel.org/linux-trace-kernel/tencent_dff7d3561a0686b5e8fc079150a025051...@qq.com

Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Signed-off-by: linke li 
Signed-off-by: Steven Rostedt (Google) 
Signed-off-by: Sasha Levin 
---
 kernel/trace/ring_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d9bed77f96c1f..2b46c66fb132d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4343,7 +4343,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
-   commit_page = cpu_buffer->commit_page;
+   commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
 
/*
-- 
2.43.0




[PATCH AUTOSEL 6.1 52/52] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment

2024-03-29 Thread Sasha Levin
From: linke li 

[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. This may cause the time_stamp read
on the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.

Link: 
https://lore.kernel.org/linux-trace-kernel/tencent_dff7d3561a0686b5e8fc079150a025051...@qq.com

Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Signed-off-by: linke li 
Signed-off-by: Steven Rostedt (Google) 
Signed-off-by: Sasha Levin 
---
 kernel/trace/ring_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e019a9278794f..7ed92f311dc9b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4384,7 +4384,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
-   commit_page = cpu_buffer->commit_page;
+   commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
 
/*
-- 
2.43.0




[PATCH AUTOSEL 6.6 75/75] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment

2024-03-29 Thread Sasha Levin
From: linke li 

[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. This may cause the time_stamp read
on the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.

Link: 
https://lore.kernel.org/linux-trace-kernel/tencent_dff7d3561a0686b5e8fc079150a025051...@qq.com

Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Signed-off-by: linke li 
Signed-off-by: Steven Rostedt (Google) 
Signed-off-by: Sasha Levin 
---
 kernel/trace/ring_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1ac6637895a44..0d98e847fd6c2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4389,7 +4389,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
-   commit_page = cpu_buffer->commit_page;
+   commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
 
/*
-- 
2.43.0




[PATCH AUTOSEL 6.8 98/98] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment

2024-03-29 Thread Sasha Levin
From: linke li 

[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer->commit_page is read
while other threads may change it. This may cause the time_stamp read
on the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.

Link: 
https://lore.kernel.org/linux-trace-kernel/tencent_dff7d3561a0686b5e8fc079150a025051...@qq.com

Cc: Masami Hiramatsu 
Cc: Mathieu Desnoyers 
Signed-off-by: linke li 
Signed-off-by: Steven Rostedt (Google) 
Signed-off-by: Sasha Levin 
---
 kernel/trace/ring_buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b1..54410c8cacbe8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4350,7 +4350,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
cpu_buffer = iter->cpu_buffer;
reader = cpu_buffer->reader_page;
head_page = cpu_buffer->head_page;
-   commit_page = cpu_buffer->commit_page;
+   commit_page = READ_ONCE(cpu_buffer->commit_page);
commit_ts = commit_page->page->time_stamp;
 
/*
-- 
2.43.0




[PATCH v2 3/3] arm64: dts: qcom: sm7225-fairphone-fp4: Enable USB role switching

2024-03-29 Thread Luca Weiss
Configure the Type-C and VBUS regulator on PM7250B and wire it up to the
USB PHY, so that USB role and orientation switching works.

For now USB Power Delivery properties are skipped / disabled, so that
the (presumably) bootloader-configured charger doesn't get messed with
and we can charge the phone with at least some amount of power.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/sm6350.dtsi  | 47 ++
 arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts | 60 ++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sm6350.dtsi 
b/arch/arm64/boot/dts/qcom/sm6350.dtsi
index b663c1b18f61..2e135989de8c 100644
--- a/arch/arm64/boot/dts/qcom/sm6350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6350.dtsi
@@ -1717,6 +1717,33 @@ usb_1_qmpphy: phy@88e8000 {
#phy-cells = <1>;
 
status = "disabled";
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+
+   usb_1_qmpphy_out: endpoint {
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+
+   usb_1_qmpphy_usb_ss_in: endpoint {
+   remote-endpoint = 
<&usb_1_dwc3_ss_out>;
+   };
+   };
+
+   port@2 {
+   reg = <2>;
+
+   usb_1_qmpphy_dp_in: endpoint {
+   };
+   };
+   };
};
 
dc_noc: interconnect@916 {
@@ -1892,6 +1919,26 @@ usb_1_dwc3: usb@a60 {
snps,hird-threshold = /bits/ 8 <0x10>;
phys = <&usb_1_hsphy>, <&usb_1_qmpphy 
QMP_USB43DP_USB3_PHY>;
phy-names = "usb2-phy", "usb3-phy";
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+
+   usb_1_dwc3_hs_out: endpoint {
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+
+   usb_1_dwc3_ss_out: endpoint {
+   remote-endpoint = 
<&usb_1_qmpphy_usb_ss_in>;
+   };
+   };
+   };
};
};
 
diff --git a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts 
b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
index bc67e8c1fe4d..5d7778c48413 100644
--- a/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
+++ b/arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "sm7225.dtsi"
 #include "pm6150l.dtsi"
 #include "pm6350.dtsi"
@@ -543,6 +544,53 @@ conn-therm@1 {
};
 };
 
+&pm7250b_typec {
+   vdd-pdphy-supply = <&vreg_l3a>;
+
+   status = "okay";
+
+   connector {
+   compatible = "usb-c-connector";
+
+   power-role = "dual";
+   data-role = "dual";
+   self-powered;
+
+   /*
+* Disable USB Power Delivery for now, seems to need extra work
+* to support role switching while also letting the battery
+* charge still - without charger driver
+*/
+   typec-power-opmode = "default";
+   pd-disable;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+   pm7250b_hs_in: endpoint {
+   remote-endpoint = <&usb_1_dwc3_hs_out>;
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+   pm7250b_ss_in: endpoint {
+   remote-endpoint = <&usb_1_qmpphy_out>;
+   };
+   };
+   };
+   };
+};
+
+&pm7250b_vbus {
+   regulator-min-microamp = <50>;
+   regulator-max-microamp = 

[PATCH v2 2/3] arm64: dts: qcom: pm7250b: Add a TCPM description

2024-03-29 Thread Luca Weiss
Type-C port management functionality lives inside of the PMIC block on
pm7250b.

The Type-C port management logic controls orientation detection,
vbus/vconn sense and to send/receive Type-C Power Domain messages.

Reviewed-by: Bryan O'Donoghue 
Reviewed-by: Konrad Dybcio 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/pm7250b.dtsi | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm7250b.dtsi 
b/arch/arm64/boot/dts/qcom/pm7250b.dtsi
index 4faed25a787f..0205c2669093 100644
--- a/arch/arm64/boot/dts/qcom/pm7250b.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm7250b.dtsi
@@ -51,6 +51,45 @@ pm7250b_vbus: usb-vbus-regulator@1100 {
status = "disabled";
};
 
+   pm7250b_typec: typec@1500 {
+   compatible = "qcom,pm7250b-typec", "qcom,pm8150b-typec";
+   reg = <0x1500>,
+ <0x1700>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   interrupt-names = "or-rid-detect-change",
+ "vpd-detect",
+ "cc-state-change",
+ "vconn-oc",
+ "vbus-change",
+ "attach-detach",
+ "legacy-cable-detect",
+ "try-snk-src-detect",
+ "sig-tx",
+ "sig-rx",
+ "msg-tx",
+ "msg-rx",
+ "msg-tx-failed",
+ "msg-tx-discarded",
+ "msg-rx-discarded",
+ "fr-swap";
+   vdd-vbus-supply = <&pm7250b_vbus>;
+   };
+
pm7250b_temp: temp-alarm@2400 {
compatible = "qcom,spmi-temp-alarm";
reg = <0x2400>;

-- 
2.44.0




[PATCH v2 1/3] arm64: dts: qcom: pm7250b: Add node for PMIC VBUS booster

2024-03-29 Thread Luca Weiss
Add the required DTS node for the USB VBUS output regulator, which is
available on PM7250B. This will provide the VBUS source to connected
peripherals.

Reviewed-by: Bryan O'Donoghue 
Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/pm7250b.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm7250b.dtsi 
b/arch/arm64/boot/dts/qcom/pm7250b.dtsi
index 3bf7cf5d1700..4faed25a787f 100644
--- a/arch/arm64/boot/dts/qcom/pm7250b.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm7250b.dtsi
@@ -45,6 +45,12 @@ pmic@PM7250B_SID {
#address-cells = <1>;
#size-cells = <0>;
 
+   pm7250b_vbus: usb-vbus-regulator@1100 {
+   compatible = "qcom,pm7250b-vbus-reg", 
"qcom,pm8150b-vbus-reg";
+   reg = <0x1100>;
+   status = "disabled";
+   };
+
pm7250b_temp: temp-alarm@2400 {
compatible = "qcom,spmi-temp-alarm";
reg = <0x2400>;

-- 
2.44.0




[PATCH v2 0/3] Add TCPM support for PM7250B and Fairphone 4

2024-03-29 Thread Luca Weiss
This series adds support for Type-C Port Management on the Fairphone 4
which enables USB role switching and orientation switching.

This enables a user for example to plug in a USB stick or a USB keyboard
to the Type-C port.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Rob Herring 
To: Krzysztof Kozlowski 
To: Conor Dooley 
Cc: ~postmarketos/upstream...@lists.sr.ht
Cc: phone-de...@vger.kernel.org
Cc: linux-arm-...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luca Weiss 

Changes in v2:
- Move disabled as last property for pm7250b_vbus
- Update USB graph to newer version, connect both HS and SS signals
- Update FP4 Type-C properties, try to keep phone charging intact by
  disabling USB PD for now
- Pick up tags
- Drop patches that landed in linux-next already
- Link to v1: 
https://lore.kernel.org/r/20240322-fp4-tcpm-v1-0-c5644099d...@fairphone.com

---
Luca Weiss (3):
  arm64: dts: qcom: pm7250b: Add node for PMIC VBUS booster
  arm64: dts: qcom: pm7250b: Add a TCPM description
  arm64: dts: qcom: sm7225-fairphone-fp4: Enable USB role switching

 arch/arm64/boot/dts/qcom/pm7250b.dtsi | 45 +
 arch/arm64/boot/dts/qcom/sm6350.dtsi  | 47 ++
 arch/arm64/boot/dts/qcom/sm7225-fairphone-fp4.dts | 60 ++-
 3 files changed, 151 insertions(+), 1 deletion(-)
---
base-commit: f3583a292140e0a2a2ca0ae0019108401b4c9158
change-id: 20240322-fp4-tcpm-2ad68ef55346

Best regards,
-- 
Luca Weiss 




Re: [PATCH v4 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-03-29 Thread Arnaud POULIQUEN



On 3/27/24 18:14, Mathieu Poirier wrote:
> On Tue, Mar 26, 2024 at 08:31:33PM +0100, Arnaud POULIQUEN wrote:
>>
>>
>> On 3/25/24 17:51, Mathieu Poirier wrote:
>>> On Fri, Mar 08, 2024 at 03:47:08PM +0100, Arnaud Pouliquen wrote:
 The new TEE remoteproc device is used to manage remote firmware in a
 secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
 introduced to delegate the loading of the firmware to the trusted
 execution context. In such cases, the firmware should be signed and
 adhere to the image format defined by the TEE.

 Signed-off-by: Arnaud Pouliquen 
 ---
 Updates from V3:
 - remove support of the attach use case. Will be addressed in a separate
   thread,
 - add st_rproc_tee_ops::parse_fw ops,
 - inverse call of devm_rproc_alloc()and tee_rproc_register() to manage 
 cross
   reference between the rproc struct and the tee_rproc struct in 
 tee_rproc.c.
 ---
  drivers/remoteproc/stm32_rproc.c | 60 +---
  1 file changed, 56 insertions(+), 4 deletions(-)

 diff --git a/drivers/remoteproc/stm32_rproc.c 
 b/drivers/remoteproc/stm32_rproc.c
 index 8cd838df4e92..13df33c78aa2 100644
 --- a/drivers/remoteproc/stm32_rproc.c
 +++ b/drivers/remoteproc/stm32_rproc.c
 @@ -20,6 +20,7 @@
  #include 
  #include 
  #include 
 +#include 
  #include 
  
  #include "remoteproc_internal.h"
 @@ -49,6 +50,9 @@
  #define M4_STATE_STANDBY  4
  #define M4_STATE_CRASH    5
  
 +/* Remote processor unique identifier aligned with the Trusted Execution 
 Environment definitions */
>>>
>>> Why is this the case?  At least from the kernel side it is possible to call
>>> tee_rproc_register() with any kind of value, why is there a need to be any
>>> kind of alignment with the TEE?
>>
>>
>> The use of the proc_id is to identify a processor in case of multi 
>> co-processors.
>>
> 
> That is well understood.
> 
>> For instance we can have a system with A DSP and a modem. We would use the 
>> same
>> TEE service, but
> 
> That too.
> 
>> the TEE driver will probably be different, same for the signature key.
> 
> What TEE driver are we talking about here?

In OP-TEE, the remoteproc framework is divided into 2 or 3 layers:

- the remoteproc Trusted Application (TA) [1] which is platform agnostic
- The remoteproc Pseudo Trusted Application (PTA) [2] which is platform
dependent and can rely on the proc ID to retrieve the context.
- the remoteproc driver (optional for some platforms) [3], which is in charge
 of DT parsing and platform configuration.

Here, "TEE driver" can be interpreted as the remoteproc PTA and/or the platform driver.

[1]
https://elixir.bootlin.com/op-tee/latest/source/ta/remoteproc/src/remoteproc_core.c
[2]
https://elixir.bootlin.com/op-tee/latest/source/core/pta/stm32mp/remoteproc_pta.c
[3]
https://elixir.bootlin.com/op-tee/latest/source/core/drivers/remoteproc/stm32_remoteproc.c

> 
>> In such case the proc ID allows to identify the the processor you want to 
>> address.
>>
> 
> That too is well understood, but there is no alignment needed with the TEE, 
> i.e
> the TEE application is not expecting a value of '0'.  We could set
> STM32_MP1_M4_PROC_ID to 0xDEADBEEF and things would work.  This driver won't 
> go
> anywhere for as long as it is not the case.


Here I suppose that you do not challenge the rproc_ID use in general, but only
for the stm32mp1 platform, as we have only one remote processor. Am I right?

In OP-TEE the check is done here:
https://elixir.bootlin.com/op-tee/latest/source/core/drivers/remoteproc/stm32_remoteproc.c#L64

If the driver does not register the proc ID, an error is returned indicating
that the feature is not supported.

In the case of stm32mp1, yes, we could consider it useless as we have only one
remote proc.

Nevertheless, I cannot guarantee that a customer will not add an external
companion processor that uses OP-TEE to authenticate the associated firmware. As
the Trusted Application is the unique entry point, they will need the proc_id to
identify the target at the PTA level.

So from my point of view, having a proc ID on stm32mp1 (and on stm32mp2, which
will reuse the same driver) aligned between Linux and OP-TEE is useful.

That said, using a TEE_REMOTEPROC_DEFAULT_ID is something that could be
more generic (for Linux and OP-TEE). This ID could be reused in the stm32mp
driver and platform drivers with a unique internal remote processor.

Would that solution be OK for you?

Regards,
Arnaud


> 
>>
>>>
 +#define STM32_MP1_M4_PROC_ID    0
 +
  struct stm32_syscon {
struct regmap *map;
u32 reg;
 @@ -257,6 +261,19 @@ static int stm32_rproc_release(struct rproc *rproc)
return 0;
  }
  
 +static int stm32_rproc_tee_stop(struct rproc *rproc)
 +{
 +  int err;
 +
 +  stm32_rproc_request_shutdown(rproc);
 +
 +  err = tee_rproc_stop(rproc);
 +  if 

Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 11:23 AM Jason Xing  wrote:
>
> On Fri, Mar 29, 2024 at 5:07 PM Eric Dumazet  wrote:
> >
> > On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  
> > wrote:
> > >
> > > From: Jason Xing 
> > >
> > > Prior to this patch, what we can see by enabling trace_tcp_send_reset
> > > only happens under two circumstances:
> > > 1) active rst mode
> > > 2) non-active rst mode and based on the full socket
> > >
> > > That means an inconsistency occurs if we use tcpdump and trace
> > > simultaneously to see how rst happens.
> > >
> > > It's necessary that we take other cases into consideration as well,
> > > say:
> > > 1) time-wait socket
> > > 2) no socket
> > > ...
> > >
> > > Only by parsing the incoming skb and reversing its 4-tuple can
> > > we know the exact 'flow', which might not exist.
> > >
> > > Samples after applying this patch:
> > > 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> > > state=TCP_ESTABLISHED
> > > 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> > > state=UNKNOWN
> > > Note:
> > > 1) UNKNOWN means we cannot extract the right information from skb.
> > > 2) skbaddr/skaddr could be 0
> > >
> > > Signed-off-by: Jason Xing 
> > > ---
> > >  include/trace/events/tcp.h | 39 --
> > >  net/ipv4/tcp_ipv4.c|  4 ++--
> > >  net/ipv6/tcp_ipv6.c|  3 ++-
> > >  3 files changed, 41 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > > index 194425f69642..289438c54227 100644
> > > --- a/include/trace/events/tcp.h
> > > +++ b/include/trace/events/tcp.h
> > > @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
> > >   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
> > >   * active reset, skb should be NULL
> > >   */
> > > -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> > > +TRACE_EVENT(tcp_send_reset,
> > >
> > > TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> > >
> > > -   TP_ARGS(sk, skb)
> > > +   TP_ARGS(sk, skb),
> > > +
> > > +   TP_STRUCT__entry(
> > > +   __field(const void *, skbaddr)
> > > +   __field(const void *, skaddr)
> > > +   __field(int, state)
> > > +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> > > +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> > > +   ),
> > > +
> > > +   TP_fast_assign(
> > > +   __entry->skbaddr = skb;
> > > +   __entry->skaddr = sk;
> > > +   /* Zero means unknown state. */
> > > +   __entry->state = sk ? sk->sk_state : 0;
> > > +
> > > +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > > +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > > +
> > > +   if (sk && sk_fullsock(sk)) {
> > > +   const struct inet_sock *inet = inet_sk(sk);
> > > +
> > > +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> > > +   } else {
> >
> > To be on the safe side, I would test if (skb) here.
> > We have one caller with skb == NULL, we might have more in the future.
>
> Thanks for the review.
>
> How about changing '} else {' to '} else if (skb) {', then if we go
> into this else-if branch, we will print nothing, right? I'll test it
> in this case.

Right, the fields are cleared before this else:

+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));

>
> >
> > > +   /*
> > > +* We should reverse the 4-tuple of skb, so later
> > > +* it can print the right flow direction of rst.
> > > +*/
> > > +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> > > entry->saddr);
> > > +   }
> > > +   ),
> > > +
> > > +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> > > + __entry->skbaddr, __entry->skaddr,
> > > + __entry->saddr, __entry->daddr,
> > > + __entry->state ? show_tcp_state_name(__entry->state) : 
> > > "UNKNOWN")
> > >  );
> > >
> > >  /*
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index a22ee5838751..d5c4a969c066 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock 
> > > *sk, struct sk_buff *skb)
> > >  */
> > > if (sk) {
> > > arg.bound_dev_if = sk->sk_bound_dev_if;
> > > -   if (sk_fullsock(sk))
> > > -   trace_tcp_send_reset(sk, skb);
> > > }
> >
> > Remove the { } ?
>
> Yes, I forgot to remove them.

No problem.
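
Put together, a sketch of the branch as discussed above (illustrative, not
the final patch):

	if (sk && sk_fullsock(sk)) {
		const struct inet_sock *inet = inet_sk(sk);

		TP_STORE_ADDR_PORTS(__entry, inet, sk);
	} else if (skb) {
		/* Reverse the 4-tuple of the skb so the trace prints the
		 * right flow direction of the RST. */
		TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, entry->saddr);
	}
	/* With both sk and skb NULL, the earlier memsets leave the
	 * addresses zeroed, so nothing stale is printed. */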



Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-03-29 Thread Michael S. Tsirkin
On Fri, Mar 29, 2024 at 06:39:33PM +0800, Jason Wang wrote:
> On Fri, Mar 29, 2024 at 5:13 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Mar 27, 2024 at 05:08:57PM +0800, Jason Wang wrote:
> > > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > > From: Rong Wang 
> > > > >
> > > > > Once an iommu domain is enabled for a device, the MSI
> > > > > translation tables have to be there for software-managed MSI.
> > > > > Otherwise, a platform with software-managed MSI but without an
> > > > > irq bypass function cannot get a correct memory write event
> > > > > from PCIe and will not get irqs.
> > > > > The solution is to obtain the MSI phy base address from the
> > > > > iommu reserved region, set it as the iommu MSI cookie, and
> > > > > then the translation tables will be created while requesting the irq.
> > > > >
> > > > > Change log
> > > > > --
> > > > >
> > > > > v1->v2:
> > > > > - add resv iotlb to avoid overlap mapping.
> > > > > v2->v3:
> > > > > - there is no need to export the iommu symbol anymore.
> > > > >
> > > > > Signed-off-by: Rong Wang 
> > > >
> > > > There's no interest in extending vhost iotlb further -
> > > > we should just switch over to iommufd which supports
> > > > this already.
> > >
> > > IOMMUFD is good but VFIO supported this before IOMMUFD.
> >
> > You mean VFIO migrated to IOMMUFD but of course they keep supporting
> > their old UAPI?
> 
> I meant VFIO supported software-managed MSI before IOMMUFD.

And then they switched over and stopped adding new IOMMU
related features. And so should vdpa?


> > OK and point being?
> >
> > > This patch
> > > makes vDPA run without backporting the full IOMMUFD in production
> > > environments. I think it's worth it.
> >
> > Where do we stop? Saying no to features is the only tool maintainers
> > have to make cleanups happen, otherwise people will just keep piling
> > stuff up.
> 
> I think we should not have more features than VFIO without IOMMUFD.
> 
> Thanks
> 
> >
> > > If you worry about the extension, we can just use the vhost iotlb
> > > existing facility to do this.
> > >
> > > Thanks
> > >
> > > >
> > > > > ---
> > > > >  drivers/vhost/vdpa.c | 59 
> > > > > +---
> > > > >  1 file changed, 56 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > index ba52d128aeb7..28b56b10372b 100644
> > > > > --- a/drivers/vhost/vdpa.c
> > > > > +++ b/drivers/vhost/vdpa.c
> > > > > @@ -49,6 +49,7 @@ struct vhost_vdpa {
> > > > >   struct completion completion;
> > > > >   struct vdpa_device *vdpa;
> > > > >   struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
> > > > > + struct vhost_iotlb resv_iotlb;
> > > > >   struct device dev;
> > > > >   struct cdev cdev;
> > > > >   atomic_t opened;
> > > > > @@ -247,6 +248,7 @@ static int _compat_vdpa_reset(struct vhost_vdpa 
> > > > > *v)
> > > > >  static int vhost_vdpa_reset(struct vhost_vdpa *v)
> > > > >  {
> > > > >   v->in_batch = 0;
> > > > > + vhost_iotlb_reset(&v->resv_iotlb);
> > > > >   return _compat_vdpa_reset(v);
> > > > >  }
> > > > >
> > > > > @@ -1219,10 +1221,15 @@ static int 
> > > > > vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > > > >   msg->iova + msg->size - 1 > v->range.last)
> > > > >   return -EINVAL;
> > > > >
> > > > > + if (vhost_iotlb_itree_first(&v->resv_iotlb, msg->iova,
> > > > > + msg->iova + msg->size - 1))
> > > > > + return -EINVAL;
> > > > > +
> > > > >   if (vhost_iotlb_itree_first(iotlb, msg->iova,
> > > > >   msg->iova + msg->size - 1))
> > > > >   return -EEXIST;
> > > > >
> > > > > +
> > > > >   if (vdpa->use_va)
> > > > >   return vhost_vdpa_va_map(v, iotlb, msg->iova, msg->size,
> > > > >msg->uaddr, msg->perm);
> > > > > @@ -1307,6 +1314,45 @@ static ssize_t 
> > > > > vhost_vdpa_chr_write_iter(struct kiocb *iocb,
> > > > >   return vhost_chr_write_iter(dev, from);
> > > > >  }
> > > > >
> > > > > +static int vhost_vdpa_resv_iommu_region(struct iommu_domain *domain, 
> > > > > struct device *dma_dev,
> > > > > + struct vhost_iotlb *resv_iotlb)
> > > > > +{
> > > > > + struct list_head dev_resv_regions;
> > > > > + phys_addr_t resv_msi_base = 0;
> > > > > + struct iommu_resv_region *region;
> > > > > + int ret = 0;
> > > > > + bool with_sw_msi = false;
> > > > > + bool with_hw_msi = false;
> > > > > +
> > > > > + INIT_LIST_HEAD(&dev_resv_regions);
> > > > > + iommu_get_resv_regions(dma_dev, &dev_resv_regions);
> > > > > +
> > > > > + list_for_each_entry(region, &dev_resv_regions, list) {
> > > > > + ret = vhost_iotlb_add_range_ctx(resv_iotlb, 
> > > > > region->start,
> > > > > + region->start + 

Re: [PATCH net-next v3 3/3] tcp: add location into reset trace process

2024-03-29 Thread Jason Xing
On Fri, Mar 29, 2024 at 5:13 PM Eric Dumazet  wrote:
>
> On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
> >
> > From: Jason Xing 
> >
> > In addition to knowing the 4-tuple of the flow which generates the RST,
> > the reason why it does so is very important, because there are some
> > cases where an RST should be sent and we have no clue which one
> > exactly it was.
> >
> > Adding the location of the reset process can help us more, like what
> > trace_kfree_skb does.
>
> Well, I would prefer a drop_reason here, even if there is no 'dropped' packet.

Good idea really. Then we can accurately diagnose which kind of reason
exactly causes the RST behavior.

I'm not sure if we can reuse the drop_reason here, like adding/using
some reasons in enum skb_drop_reason {}? The name is a little bit
strange.

Oh, I can just print the string of reason directly instead of really
using enum skb_drop_reason {}...

>
> This would be more stable than something based on function names that
> could be changed.
>
> tracepoints do not have to get ugly, we can easily get stack traces if needed.
>
> perf record -a -g  -e tcp:tcp_send_reset ...

Ah, yes, I blindly mimicked what trace_kfree_skb() and
trace_consume_skb() do. Introducing some RST reasons is more
reasonable and easier to detect, since it's not hard to add only four or
five reasons.

Thanks,
Jason
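
For reference, a rough sketch of the reason-based direction discussed above.
These are fragments of a TRACE_EVENT() definition; the reason plumbing and
the particular skb_drop_reason values chosen for the RST cases are
assumptions, not the merged interface:

	/* Extend the prototype with a reason argument. */
	TP_PROTO(const struct sock *sk, const struct sk_buff *skb,
		 enum skb_drop_reason reason),
	TP_ARGS(sk, skb, reason),

	/* Record it in TP_STRUCT__entry()/TP_fast_assign(). */
	__field(int, reason)
	__entry->reason = reason;

	/* In TP_printk(), emit a stable symbolic name instead of relying
	 * on function names that may be renamed later. */
	__print_symbolic(__entry->reason,
			 { SKB_DROP_REASON_NOT_SPECIFIED, "NOT_SPECIFIED" },
			 { SKB_DROP_REASON_NO_SOCKET,     "NO_SOCKET" })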



Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-03-29 Thread Jason Wang
On Fri, Mar 29, 2024 at 5:13 PM Michael S. Tsirkin  wrote:
>
> On Fri, Mar 29, 2024 at 11:55:50AM +0800, Jason Wang wrote:
> > On Wed, Mar 27, 2024 at 5:08 PM Jason Wang  wrote:
> > >
> > > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > > From: Rong Wang 
> > > > >
> > > > > Once an iommu domain is enabled for a device, the MSI
> > > > > translation tables have to be there for software-managed MSI.
> > > > > Otherwise, a platform with software-managed MSI but without an
> > > > > irq bypass function cannot get a correct memory write event
> > > > > from PCIe and will not get irqs.
> > > > > The solution is to obtain the MSI phy base address from the
> > > > > iommu reserved region, set it as the iommu MSI cookie, and
> > > > > then the translation tables will be created while requesting the irq.
> > > > >
> > > > > Change log
> > > > > --
> > > > >
> > > > > v1->v2:
> > > > > - add resv iotlb to avoid overlap mapping.
> > > > > v2->v3:
> > > > > - there is no need to export the iommu symbol anymore.
> > > > >
> > > > > Signed-off-by: Rong Wang 
> > > >
> > > > There's no interest in extending vhost iotlb further -
> > > > we should just switch over to iommufd which supports
> > > > this already.
> > >
> > > IOMMUFD is good but VFIO supported this before IOMMUFD. This patch
> > > makes vDPA run without backporting the full IOMMUFD in production
> > > environments. I think it's worth it.
> > >
> > > If you worry about the extension, we can just use the vhost iotlb
> > > existing facility to do this.
> > >
> > > Thanks
> >
> > Btw, Wang Rong,
> >
> > It looks that Cindy does have the bandwidth in working for IOMMUFD support.
>
> I think you mean she does not.

Yes, you are right.

Thanks

>
> > Do you have the will to do that?
> >
> > Thanks
>




Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-03-29 Thread Jason Wang
On Fri, Mar 29, 2024 at 5:13 PM Michael S. Tsirkin  wrote:
>
> On Wed, Mar 27, 2024 at 05:08:57PM +0800, Jason Wang wrote:
> > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > From: Rong Wang 
> > > >
> > > > Once an iommu domain is enabled for a device, the MSI
> > > > translation tables have to be there for software-managed MSI.
> > > > Otherwise, a platform with software-managed MSI but without an
> > > > irq bypass function cannot get a correct memory write event
> > > > from PCIe and will not get irqs.
> > > > The solution is to obtain the MSI phy base address from the
> > > > iommu reserved region, set it as the iommu MSI cookie, and
> > > > then the translation tables will be created while requesting the irq.
> > > >
> > > > Change log
> > > > --
> > > >
> > > > v1->v2:
> > > > - add resv iotlb to avoid overlap mapping.
> > > > v2->v3:
> > > > - there is no need to export the iommu symbol anymore.
> > > >
> > > > Signed-off-by: Rong Wang 
> > >
> > > There's no interest in extending vhost iotlb further -
> > > we should just switch over to iommufd which supports
> > > this already.
> >
> > IOMMUFD is good but VFIO supported this before IOMMUFD.
>
> You mean VFIO migrated to IOMMUFD but of course they keep supporting
> their old UAPI?

I meant VFIO supported software-managed MSI before IOMMUFD.

> OK and point being?
>
> > This patch
> > makes vDPA run without backporting the full IOMMUFD in production
> > environments. I think it's worth it.
>
> Where do we stop? Saying no to features is the only tool maintainers
> have to make cleanups happen, otherwise people will just keep piling
> stuff up.

I think we should not have more features than VFIO without IOMMUFD.

Thanks

>
> > If you worry about the extension, we can just use the vhost iotlb
> > existing facility to do this.
> >
> > Thanks
> >
> > >
> > > > ---
> > > >  drivers/vhost/vdpa.c | 59 +---
> > > >  1 file changed, 56 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > index ba52d128aeb7..28b56b10372b 100644
> > > > --- a/drivers/vhost/vdpa.c
> > > > +++ b/drivers/vhost/vdpa.c
> > > > @@ -49,6 +49,7 @@ struct vhost_vdpa {
> > > >   struct completion completion;
> > > >   struct vdpa_device *vdpa;
> > > >   struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
> > > > + struct vhost_iotlb resv_iotlb;
> > > >   struct device dev;
> > > >   struct cdev cdev;
> > > >   atomic_t opened;
> > > > @@ -247,6 +248,7 @@ static int _compat_vdpa_reset(struct vhost_vdpa *v)
> > > >  static int vhost_vdpa_reset(struct vhost_vdpa *v)
> > > >  {
> > > >   v->in_batch = 0;
> > > > + vhost_iotlb_reset(&v->resv_iotlb);
> > > >   return _compat_vdpa_reset(v);
> > > >  }
> > > >
> > > > @@ -1219,10 +1221,15 @@ static int 
> > > > vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > > >   msg->iova + msg->size - 1 > v->range.last)
> > > >   return -EINVAL;
> > > >
> > > > + if (vhost_iotlb_itree_first(&v->resv_iotlb, msg->iova,
> > > > + msg->iova + msg->size - 1))
> > > > + return -EINVAL;
> > > > +
> > > >   if (vhost_iotlb_itree_first(iotlb, msg->iova,
> > > >   msg->iova + msg->size - 1))
> > > >   return -EEXIST;
> > > >
> > > > +
> > > >   if (vdpa->use_va)
> > > >   return vhost_vdpa_va_map(v, iotlb, msg->iova, msg->size,
> > > >msg->uaddr, msg->perm);
> > > > @@ -1307,6 +1314,45 @@ static ssize_t vhost_vdpa_chr_write_iter(struct 
> > > > kiocb *iocb,
> > > >   return vhost_chr_write_iter(dev, from);
> > > >  }
> > > >
> > > > +static int vhost_vdpa_resv_iommu_region(struct iommu_domain *domain, 
> > > > struct device *dma_dev,
> > > > + struct vhost_iotlb *resv_iotlb)
> > > > +{
> > > > + struct list_head dev_resv_regions;
> > > > + phys_addr_t resv_msi_base = 0;
> > > > + struct iommu_resv_region *region;
> > > > + int ret = 0;
> > > > + bool with_sw_msi = false;
> > > > + bool with_hw_msi = false;
> > > > +
> > > > + INIT_LIST_HEAD(&dev_resv_regions);
> > > > + iommu_get_resv_regions(dma_dev, &dev_resv_regions);
> > > > +
> > > > + list_for_each_entry(region, &dev_resv_regions, list) {
> > > > + ret = vhost_iotlb_add_range_ctx(resv_iotlb, region->start,
> > > > + region->start + region->length - 1,
> > > > + 0, 0, NULL);
> > > > + if (ret) {
> > > > + vhost_iotlb_reset(resv_iotlb);
> > > > + break;
> > > > + }
> > > > +
> > > > + if (region->type == IOMMU_RESV_MSI)
> > > > + with_hw_msi = true;
> > > > +
> > > > + if (region->type == IOMMU_RESV_SW_MSI) {
> > > > + 

Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Jason Xing
On Fri, Mar 29, 2024 at 5:07 PM Eric Dumazet  wrote:
>
> On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
> >
> > From: Jason Xing 
> >
> > Prior to this patch, what we can see by enabling trace_tcp_send_reset
> > only happens under two circumstances:
> > 1) active rst mode
> > 2) non-active rst mode and based on the full socket
> >
> > That means an inconsistency occurs if we use tcpdump and trace
> > simultaneously to see how rst happens.
> >
> > It's necessary that we take other cases into consideration as well,
> > say:
> > 1) time-wait socket
> > 2) no socket
> > ...
> >
> > Only by parsing the incoming skb and reversing its 4-tuple can
> > we know the exact 'flow', which might not exist.
> >
> > Samples after applying this patch:
> > 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> > state=TCP_ESTABLISHED
> > 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> > state=UNKNOWN
> > Note:
> > 1) UNKNOWN means we cannot extract the right information from skb.
> > 2) skbaddr/skaddr could be 0
> >
> > Signed-off-by: Jason Xing 
> > ---
> >  include/trace/events/tcp.h | 39 --
> >  net/ipv4/tcp_ipv4.c|  4 ++--
> >  net/ipv6/tcp_ipv6.c|  3 ++-
> >  3 files changed, 41 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> > index 194425f69642..289438c54227 100644
> > --- a/include/trace/events/tcp.h
> > +++ b/include/trace/events/tcp.h
> > @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
> >   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
> >   * active reset, skb should be NULL
> >   */
> > -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> > +TRACE_EVENT(tcp_send_reset,
> >
> > TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
> >
> > -   TP_ARGS(sk, skb)
> > +   TP_ARGS(sk, skb),
> > +
> > +   TP_STRUCT__entry(
> > +   __field(const void *, skbaddr)
> > +   __field(const void *, skaddr)
> > +   __field(int, state)
> > +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> > +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __entry->skbaddr = skb;
> > +   __entry->skaddr = sk;
> > +   /* Zero means unknown state. */
> > +   __entry->state = sk ? sk->sk_state : 0;
> > +
> > +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > +
> > +   if (sk && sk_fullsock(sk)) {
> > +   const struct inet_sock *inet = inet_sk(sk);
> > +
> > +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> > +   } else {
>
> To be on the safe side, I would test if (skb) here.
> We have one caller with skb == NULL, we might have more in the future.

Thanks for the review.

How about changing '} else {' to '} else if (skb) {', then if we go
into this else-if branch, we will print nothing, right? I'll test it
in this case.

>
> > +   /*
> > +* We should reverse the 4-tuple of skb, so later
> > +* it can print the right flow direction of rst.
> > +*/
> > +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> > entry->saddr);
> > +   }
> > +   ),
> > +
> > +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> > + __entry->skbaddr, __entry->skaddr,
> > + __entry->saddr, __entry->daddr,
> > + __entry->state ? show_tcp_state_name(__entry->state) : 
> > "UNKNOWN")
> >  );
> >
> >  /*
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index a22ee5838751..d5c4a969c066 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock *sk, 
> > struct sk_buff *skb)
> >  */
> > if (sk) {
> > arg.bound_dev_if = sk->sk_bound_dev_if;
> > -   if (sk_fullsock(sk))
> > -   trace_tcp_send_reset(sk, skb);
> > }
>
> Remove the { } ?

Yes, I forgot to remove them.

Thanks,
Jason

>
>
> >
> > +   trace_tcp_send_reset(sk, skb);
> > +
> > BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
> >  offsetof(struct inet_timewait_sock, tw_bound_dev_if));
> >
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index 3f4cba49e9ee..8e9c59b6c00c 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -1113,7 +1113,6 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> > struct sk_buff *skb)
> > if (sk) {
> > oif = sk->sk_bound_dev_if;
> > if (sk_fullsock(sk)) {
> > -

Re: [PATCH v2] Documentation: Add reconnect process for VDUSE

2024-03-29 Thread Michael S. Tsirkin
On Fri, Mar 29, 2024 at 05:38:25PM +0800, Cindy Lu wrote:
> Add a document explaining the reconnect process, including what the
> Userspace App needs to do and how it works with the kernel.
> 
> Signed-off-by: Cindy Lu 
> ---
>  Documentation/userspace-api/vduse.rst | 41 +++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/Documentation/userspace-api/vduse.rst 
> b/Documentation/userspace-api/vduse.rst
> index bdb880e01132..f903aed714d1 100644
> --- a/Documentation/userspace-api/vduse.rst
> +++ b/Documentation/userspace-api/vduse.rst
> @@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
> after the used ring is filled.
>  
>  For more details on the uAPI, please see include/uapi/linux/vduse.h.
> +
> +HOW VDUSE devices reconnectoin works

typo

> +
> +1. What is reconnection?
> +
> +   When the userspace application loads, it should establish a connection
> +   to the vduse kernel device. Sometimes, the userspace application exits,
> +   and we want to support restarting it and connecting to the kernel device again.
> +
> +2. How can I support reconnection in a userspace application?
> +
> +2.1 During initialization, the userspace application should first verify the
> +existence of the device "/dev/vduse/vduse_name".
> +If it doesn't exist, it means this is the first-time connection; go to step 2.2.
> +If it exists, it means this is a reconnection, and we should go to step 2.3.
> +
> +2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
> +/dev/vduse/control.
> +When ioctl(VDUSE_CREATE_DEV) is called, the kernel allocates memory for
> +the reconnect information. The total memory size is PAGE_SIZE*vq_number.
> +
> +2.3 Check if the information is suitable for reconnecting
> +If this is a reconnection:
> +Before attempting to reconnect, the userspace application needs to use the
> +ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, VDUSE_DEV_GET_FEATURES...)
> +to get the information from the kernel.
> +Please review the information and confirm whether it is suitable for reconnecting.
> +
> +2.4 The userspace application needs to mmap the memory to userspace
> +The userspace application requires mapping one page for every vq. These pages
> +should be used to save vq-related information while the system is running.
> +Additionally, the application must define its own structure to store
> +information for reconnection.
> +
> +2.5 Complete the initialization and run the application.
> +While the application is running, it is important to store relevant
> +information about reconnections in the mapped pages. When calling the ioctl
> +VDUSE_VQ_GET_INFO to get vq information, it's necessary to check whether it's
> +a reconnection. If it is a reconnection, the vq-related information must be
> +retrieved from the mapped pages.
> +


I don't get it. So this is just a way for the application to allocate
memory? Why do we need this new way to do it?
Why not just mmap a file anywhere at all?


> +2.6 When the userspace application exits, it is necessary to unmap all the
> +pages used for reconnection.
> -- 
> 2.43.0




[PATCH v2] Documentation: Add reconnect process for VDUSE

2024-03-29 Thread Cindy Lu
Add a document explaining the reconnect process, including what the
Userspace App needs to do and how it works with the kernel.

Signed-off-by: Cindy Lu 
---
 Documentation/userspace-api/vduse.rst | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/userspace-api/vduse.rst 
b/Documentation/userspace-api/vduse.rst
index bdb880e01132..f903aed714d1 100644
--- a/Documentation/userspace-api/vduse.rst
+++ b/Documentation/userspace-api/vduse.rst
@@ -231,3 +231,44 @@ able to start the dataplane processing as follows:
after the used ring is filled.
 
 For more details on the uAPI, please see include/uapi/linux/vduse.h.
+
+HOW VDUSE devices reconnectoin works
+
+1. What is reconnection?
+
+   When the userspace application loads, it should establish a connection
+   to the vduse kernel device. Sometimes, the userspace application exits,
+   and we want to support restarting it and connecting to the kernel device again.
+
+2. How can I support reconnection in a userspace application?
+
+2.1 During initialization, the userspace application should first verify the
+existence of the device "/dev/vduse/vduse_name".
+If it doesn't exist, it means this is the first-time connection; go to step 2.2.
+If it exists, it means this is a reconnection, and we should go to step 2.3.
+
+2.2 Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
+/dev/vduse/control.
+When ioctl(VDUSE_CREATE_DEV) is called, the kernel allocates memory for
+the reconnect information. The total memory size is PAGE_SIZE*vq_number.
+
+2.3 Check if the information is suitable for reconnecting
+If this is a reconnection:
+Before attempting to reconnect, the userspace application needs to use the
+ioctl(VDUSE_DEV_GET_CONFIG, VDUSE_DEV_GET_STATUS, VDUSE_DEV_GET_FEATURES...)
+to get the information from the kernel.
+Please review the information and confirm whether it is suitable for reconnecting.
+
+2.4 The userspace application needs to mmap the memory to userspace
+The userspace application requires mapping one page for every vq. These pages
+should be used to save vq-related information while the system is running.
+Additionally, the application must define its own structure to store
+information for reconnection.
+
+2.5 Complete the initialization and run the application.
+While the application is running, it is important to store relevant
+information about reconnections in the mapped pages. When calling the ioctl
+VDUSE_VQ_GET_INFO to get vq information, it's necessary to check whether it's
+a reconnection. If it is a reconnection, the vq-related information must be
+retrieved from the mapped pages.
+
+2.6 When the userspace application exits, it is necessary to unmap all the
+pages used for reconnection.
-- 
2.43.0
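
To illustrate steps 2.1-2.4 of the document above, a rough userspace sketch.
VDUSE_CREATE_DEV and VDUSE_DEV_GET_FEATURES are existing uAPI from
include/uapi/linux/vduse.h; the device name, the per-vq reconnect page mmap
offset layout and the minimal error handling are assumptions based on this
proposal:

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vduse.h>

#define DEV_NAME "my_vduse_dev"	/* hypothetical device name */
#define NUM_VQS  2

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	/* Step 2.1: does /dev/vduse/<name> already exist? */
	bool reconnect = access("/dev/vduse/" DEV_NAME, F_OK) == 0;
	int fd, i;

	if (!reconnect) {
		/* Step 2.2: first connection, create the device. Most of
		 * struct vduse_dev_config (features, config space) is
		 * device specific and left out of this sketch. */
		int ctrl = open("/dev/vduse/control", O_RDWR);
		struct vduse_dev_config cfg = { .vq_num = NUM_VQS };

		strncpy(cfg.name, DEV_NAME, sizeof(cfg.name) - 1);
		if (ctrl >= 0) {
			ioctl(ctrl, VDUSE_CREATE_DEV, &cfg);
			close(ctrl);
		}
	}

	fd = open("/dev/vduse/" DEV_NAME, O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (reconnect) {
		/* Step 2.3: re-read device state and decide whether it is
		 * suitable to resume from. */
		__u64 features = 0;

		ioctl(fd, VDUSE_DEV_GET_FEATURES, &features);
	}

	/* Step 2.4: map one bookkeeping page per vq (offset layout
	 * assumed here: page i belongs to vq i). */
	for (i = 0; i < NUM_VQS; i++) {
		void *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, (off_t)i * psz);

		if (p == MAP_FAILED)
			perror("mmap");
	}

	close(fd);
	return 0;
}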




Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-03-29 Thread Michael S. Tsirkin
On Fri, Mar 29, 2024 at 11:55:50AM +0800, Jason Wang wrote:
> On Wed, Mar 27, 2024 at 5:08 PM Jason Wang  wrote:
> >
> > On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > > From: Rong Wang 
> > > >
> > > > Once an iommu domain is enabled for a device, the MSI
> > > > translation tables have to be there for software-managed MSI.
> > > > Otherwise, a platform with software-managed MSI but without an
> > > > irq bypass function cannot get a correct memory write event
> > > > from PCIe and will not get irqs.
> > > > The solution is to obtain the MSI phy base address from the
> > > > iommu reserved region, set it as the iommu MSI cookie, and
> > > > then the translation tables will be created while requesting the irq.
> > > >
> > > > Change log
> > > > --
> > > >
> > > > v1->v2:
> > > > - add resv iotlb to avoid overlap mapping.
> > > > v2->v3:
> > > > - there is no need to export the iommu symbol anymore.
> > > >
> > > > Signed-off-by: Rong Wang 
> > >
> > > There's no interest in extending vhost iotlb further -
> > > we should just switch over to iommufd which supports
> > > this already.
> >
> > IOMMUFD is good but VFIO supported this before IOMMUFD. This patch
> > makes vDPA run without backporting the full IOMMUFD in production
> > environments. I think it's worth it.
> >
> > If you worry about the extension, we can just use the vhost iotlb
> > existing facility to do this.
> >
> > Thanks
> 
> Btw, Wang Rong,
> 
> It looks that Cindy does have the bandwidth in working for IOMMUFD support.

I think you mean she does not.

> Do you have the will to do that?
> 
> Thanks




Re: [PATCH v3] vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI

2024-03-29 Thread Michael S. Tsirkin
On Wed, Mar 27, 2024 at 05:08:57PM +0800, Jason Wang wrote:
> On Thu, Mar 21, 2024 at 3:00 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Mar 20, 2024 at 06:19:12PM +0800, Wang Rong wrote:
> > > From: Rong Wang 
> > >
> > > Once an iommu domain is enabled for a device, the MSI
> > > translation tables have to be there for software-managed MSI.
> > > Otherwise, a platform with software-managed MSI but without an
> > > irq bypass function cannot get a correct memory write event
> > > from PCIe and will not get irqs.
> > > The solution is to obtain the MSI phy base address from the
> > > iommu reserved region, set it as the iommu MSI cookie, and
> > > then the translation tables will be created while requesting the irq.
> > >
> > > Change log
> > > --
> > >
> > > v1->v2:
> > > - add resv iotlb to avoid overlap mapping.
> > > v2->v3:
> > > - there is no need to export the iommu symbol anymore.
> > >
> > > Signed-off-by: Rong Wang 
> >
> > There's no interest in extending vhost iotlb further -
> > we should just switch over to iommufd which supports
> > this already.
> 
> IOMMUFD is good but VFIO supported this before IOMMUFD.

You mean VFIO migrated to IOMMUFD but of course they keep supporting
their old UAPI? OK and point being?

> This patch
> makes vDPA run without backporting the full IOMMUFD in production
> environments. I think it's worth it.

Where do we stop? Saying no to features is the only tool maintainers
have to make cleanups happen, otherwise people will just keep piling
stuff up.

> If you worry about the extension, we can just use the vhost iotlb
> existing facility to do this.
> 
> Thanks
> 
> >
> > > ---
> > >  drivers/vhost/vdpa.c | 59 +---
> > >  1 file changed, 56 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index ba52d128aeb7..28b56b10372b 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -49,6 +49,7 @@ struct vhost_vdpa {
> > >   struct completion completion;
> > >   struct vdpa_device *vdpa;
> > >   struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
> > > + struct vhost_iotlb resv_iotlb;
> > >   struct device dev;
> > >   struct cdev cdev;
> > >   atomic_t opened;
> > > @@ -247,6 +248,7 @@ static int _compat_vdpa_reset(struct vhost_vdpa *v)
> > >  static int vhost_vdpa_reset(struct vhost_vdpa *v)
> > >  {
> > >   v->in_batch = 0;
> > > + vhost_iotlb_reset(&v->resv_iotlb);
> > >   return _compat_vdpa_reset(v);
> > >  }
> > >
> > > @@ -1219,10 +1221,15 @@ static int vhost_vdpa_process_iotlb_update(struct 
> > > vhost_vdpa *v,
> > >   msg->iova + msg->size - 1 > v->range.last)
> > >   return -EINVAL;
> > >
> > > + if (vhost_iotlb_itree_first(&v->resv_iotlb, msg->iova,
> > > + msg->iova + msg->size - 1))
> > > + return -EINVAL;
> > > +
> > >   if (vhost_iotlb_itree_first(iotlb, msg->iova,
> > >   msg->iova + msg->size - 1))
> > >   return -EEXIST;
> > >
> > > +
> > >   if (vdpa->use_va)
> > >   return vhost_vdpa_va_map(v, iotlb, msg->iova, msg->size,
> > >msg->uaddr, msg->perm);
> > > @@ -1307,6 +1314,45 @@ static ssize_t vhost_vdpa_chr_write_iter(struct 
> > > kiocb *iocb,
> > >   return vhost_chr_write_iter(dev, from);
> > >  }
> > >
> > > +static int vhost_vdpa_resv_iommu_region(struct iommu_domain *domain, 
> > > struct device *dma_dev,
> > > + struct vhost_iotlb *resv_iotlb)
> > > +{
> > > + struct list_head dev_resv_regions;
> > > + phys_addr_t resv_msi_base = 0;
> > > + struct iommu_resv_region *region;
> > > + int ret = 0;
> > > + bool with_sw_msi = false;
> > > + bool with_hw_msi = false;
> > > +
> > > + INIT_LIST_HEAD(&dev_resv_regions);
> > > + iommu_get_resv_regions(dma_dev, &dev_resv_regions);
> > > +
> > > + list_for_each_entry(region, &dev_resv_regions, list) {
> > > + ret = vhost_iotlb_add_range_ctx(resv_iotlb, region->start,
> > > + region->start + region->length - 1,
> > > + 0, 0, NULL);
> > > + if (ret) {
> > > + vhost_iotlb_reset(resv_iotlb);
> > > + break;
> > > + }
> > > +
> > > + if (region->type == IOMMU_RESV_MSI)
> > > + with_hw_msi = true;
> > > +
> > > + if (region->type == IOMMU_RESV_SW_MSI) {
> > > + resv_msi_base = region->start;
> > > + with_sw_msi = true;
> > > + }
> > > + }
> > > +
> > > + if (!ret && !with_hw_msi && with_sw_msi)
> > > + ret = iommu_get_msi_cookie(domain, resv_msi_base);
> > > +
> > > + iommu_put_resv_regions(dma_dev, &dev_resv_regions);
> > > +
> > > + return ret;
> > > +}
> > > +
> > >  static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
> > >  {
> 

Re: [PATCH net-next v3 3/3] tcp: add location into reset trace process

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> In addition to knowing the 4-tuple of the flow which generates the RST,
> the reason why it does so is very important, because there are some
> cases where an RST should be sent and we have no clue which one
> exactly it was.
>
> Adding the location of the reset process can help us more, like what
> trace_kfree_skb does.

Well, I would prefer a drop_reason here, even if there is no 'dropped' packet.

This would be more stable than something based on function names that
could be changed.

tracepoints do not have to get ugly, we can easily get stack traces if needed.

perf record -a -g  -e tcp:tcp_send_reset ...



Re: [PATCH net-next v3 2/3] trace: tcp: fully support trace_tcp_send_reset

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Prior to this patch, what we can see by enabling trace_tcp_send_reset
> only happens under two circumstances:
> 1) active rst mode
> 2) non-active rst mode and based on the full socket
>
> That means an inconsistency occurs if we use tcpdump and trace
> simultaneously to see how rst happens.
>
> It's necessary that we take other cases into consideration as well,
> say:
> 1) time-wait socket
> 2) no socket
> ...
>
> Only by parsing the incoming skb and reversing its 4-tuple can
> we know the exact 'flow', which might not exist.
>
> Samples after applying this patch:
> 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
> state=TCP_ESTABLISHED
> 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
> state=UNKNOWN
> Note:
> 1) UNKNOWN means we cannot extract the right information from skb.
> 2) skbaddr/skaddr could be 0
>
> Signed-off-by: Jason Xing 
> ---
>  include/trace/events/tcp.h | 39 --
>  net/ipv4/tcp_ipv4.c|  4 ++--
>  net/ipv6/tcp_ipv6.c|  3 ++-
>  3 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 194425f69642..289438c54227 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -78,11 +78,46 @@ DEFINE_EVENT(tcp_event_sk_skb, tcp_retransmit_skb,
>   * skb of trace_tcp_send_reset is the skb that caused RST. In case of
>   * active reset, skb should be NULL
>   */
> -DEFINE_EVENT(tcp_event_sk_skb, tcp_send_reset,
> +TRACE_EVENT(tcp_send_reset,
>
> TP_PROTO(const struct sock *sk, const struct sk_buff *skb),
>
> -   TP_ARGS(sk, skb)
> +   TP_ARGS(sk, skb),
> +
> +   TP_STRUCT__entry(
> +   __field(const void *, skbaddr)
> +   __field(const void *, skaddr)
> +   __field(int, state)
> +   __array(__u8, saddr, sizeof(struct sockaddr_in6))
> +   __array(__u8, daddr, sizeof(struct sockaddr_in6))
> +   ),
> +
> +   TP_fast_assign(
> +   __entry->skbaddr = skb;
> +   __entry->skaddr = sk;
> +   /* Zero means unknown state. */
> +   __entry->state = sk ? sk->sk_state : 0;
> +
> +   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> +   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> +
> +   if (sk && sk_fullsock(sk)) {
> +   const struct inet_sock *inet = inet_sk(sk);
> +
> +   TP_STORE_ADDR_PORTS(__entry, inet, sk);
> +   } else {

To be on the safe side, I would test if (skb) here.
We have one caller with skb == NULL, we might have more in the future.

> +   /*
> +* We should reverse the 4-tuple of skb, so later
> +* it can print the right flow direction of rst.
> +*/
> +   TP_STORE_ADDR_PORTS_SKB(skb, entry->daddr, 
> entry->saddr);
> +   }
> +   ),
> +
> +   TP_printk("skbaddr=%p skaddr=%p src=%pISpc dest=%pISpc state=%s",
> + __entry->skbaddr, __entry->skaddr,
> + __entry->saddr, __entry->daddr,
> + __entry->state ? show_tcp_state_name(__entry->state) : 
> "UNKNOWN")
>  );
>
>  /*
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a22ee5838751..d5c4a969c066 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -868,10 +868,10 @@ static void tcp_v4_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
>  */
> if (sk) {
> arg.bound_dev_if = sk->sk_bound_dev_if;
> -   if (sk_fullsock(sk))
> -   trace_tcp_send_reset(sk, skb);
> }

Remove the { } ?


>
> +   trace_tcp_send_reset(sk, skb);
> +
> BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
>  offsetof(struct inet_timewait_sock, tw_bound_dev_if));
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 3f4cba49e9ee..8e9c59b6c00c 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1113,7 +1113,6 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
> if (sk) {
> oif = sk->sk_bound_dev_if;
> if (sk_fullsock(sk)) {
> -   trace_tcp_send_reset(sk, skb);
> if (inet6_test_bit(REPFLOW, sk))
> label = ip6_flowlabel(ipv6h);
> priority = READ_ONCE(sk->sk_priority);
> @@ -1129,6 +1128,8 @@ static void tcp_v6_send_reset(const struct sock *sk, 
> struct sk_buff *skb)
> label = ip6_flowlabel(ipv6h);
> }
>
> +   trace_tcp_send_reset(sk, skb);
> +
> tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, 1,
>

Re: [PATCH v4 1/4] remoteproc: Add TEE support

2024-03-29 Thread Arnaud POULIQUEN
Hello Mathieu,

On 3/27/24 18:07, Mathieu Poirier wrote:
> On Tue, Mar 26, 2024 at 08:18:23PM +0100, Arnaud POULIQUEN wrote:
>> Hello Mathieu,
>>
>> On 3/25/24 17:46, Mathieu Poirier wrote:
>>> On Fri, Mar 08, 2024 at 03:47:05PM +0100, Arnaud Pouliquen wrote:
 Add a remoteproc TEE (Trusted Execution Environment) driver
 that will be probed by the TEE bus. If the associated Trusted
 application is supported on secure part this device offers a client
>>>
>>> Device or driver?  I thought I touched on that before.
>>
>> Right, I changed the first instance and missed this one
>>
>>>
 interface to load a firmware in the secure part.
 This firmware could be authenticated by the secure trusted application.

 Signed-off-by: Arnaud Pouliquen 
 ---
 Updates from V3:
 - rework TEE_REMOTEPROC description in Kconfig
 - fix some namings
 - add tee_rproc_parse_fw  to support rproc_ops::parse_fw
 - add proc::tee_interface;
 - add rproc struct as parameter of the tee_rproc_register() function
 ---
  drivers/remoteproc/Kconfig  |  10 +
  drivers/remoteproc/Makefile |   1 +
  drivers/remoteproc/tee_remoteproc.c | 434 
  include/linux/remoteproc.h  |   4 +
  include/linux/tee_remoteproc.h  | 112 +++
  5 files changed, 561 insertions(+)
  create mode 100644 drivers/remoteproc/tee_remoteproc.c
  create mode 100644 include/linux/tee_remoteproc.h

 diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
 index 48845dc8fa85..2cf1431b2b59 100644
 --- a/drivers/remoteproc/Kconfig
 +++ b/drivers/remoteproc/Kconfig
 @@ -365,6 +365,16 @@ config XLNX_R5_REMOTEPROC
  
  It's safe to say N if not interested in using RPU r5f cores.
  
 +
 +config TEE_REMOTEPROC
 +  tristate "remoteproc support by a TEE application"
>>>
>>> s/remoteproc/Remoteproc
>>>
 +  depends on OPTEE
 +  help
 +Support a remote processor with a TEE application. The Trusted
 +Execution Context is responsible for loading the trusted firmware
 +image and managing the remote processor's lifecycle.
 +This can be either built-in or a loadable module.
 +
  endif # REMOTEPROC
  
  endmenu
 diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
 index 91314a9b43ce..fa8daebce277 100644
 --- a/drivers/remoteproc/Makefile
 +++ b/drivers/remoteproc/Makefile
 @@ -36,6 +36,7 @@ obj-$(CONFIG_RCAR_REMOTEPROC)+= rcar_rproc.o
  obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
  obj-$(CONFIG_ST_SLIM_REMOTEPROC)  += st_slim_rproc.o
  obj-$(CONFIG_STM32_RPROC) += stm32_rproc.o
 +obj-$(CONFIG_TEE_REMOTEPROC)  += tee_remoteproc.o
  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)+= ti_k3_dsp_remoteproc.o
  obj-$(CONFIG_TI_K3_R5_REMOTEPROC) += ti_k3_r5_remoteproc.o
  obj-$(CONFIG_XLNX_R5_REMOTEPROC)  += xlnx_r5_remoteproc.o
 diff --git a/drivers/remoteproc/tee_remoteproc.c 
 b/drivers/remoteproc/tee_remoteproc.c
 new file mode 100644
 index ..c855210e52e3
 --- /dev/null
 +++ b/drivers/remoteproc/tee_remoteproc.c
 @@ -0,0 +1,434 @@
 +// SPDX-License-Identifier: GPL-2.0-or-later
 +/*
 + * Copyright (C) STMicroelectronics 2024 - All Rights Reserved
 + * Author: Arnaud Pouliquen 
 + */
 +
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +
 +#include "remoteproc_internal.h"
 +
 +#define MAX_TEE_PARAM_ARRY_MEMBER 4
 +
 +/*
 + * Authentication of the firmware and load in the remote processor memory
 + *
 + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
 processor
 + * [in]  params[1].memref:  buffer containing the image of the 
 buffer
 + */
 +#define TA_RPROC_FW_CMD_LOAD_FW   1
 +
 +/*
 + * Start the remote processor
 + *
 + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
 processor
 + */
 +#define TA_RPROC_FW_CMD_START_FW  2
 +
 +/*
 + * Stop the remote processor
 + *
 + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
 processor
 + */
 +#define TA_RPROC_FW_CMD_STOP_FW   3
 +
 +/*
 + * Return the address of the resource table, or 0 if not found
 + * No check is done to verify that the address returned is accessible by
 + * the non secure context. If the resource table is loaded in a protected
 + * memory the access by the non secure context will lead to a data abort.
 + *
 + * [in]  params[0].value.a:   unique 32bit identifier of the remote 
 processor
 + * [out]  params[1].value.a:  32bit LSB resource table memory address
 + * [out]  

Re: [PATCH net-next v3 1/3] trace: adjust TP_STORE_ADDR_PORTS_SKB() parameters

2024-03-29 Thread Eric Dumazet
On Fri, Mar 29, 2024 at 4:43 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Introducing entry_saddr and entry_daddr parameters in this macro
> for later use can help us record the reverse 4-tuple by analyzing
> the 4-tuple of the incoming skb when receiving.
>
> Signed-off-by: Jason Xing 

Reviewed-by: Eric Dumazet 



[PATCH] x86/mce: Clean up TP_printk() output line of the mce_record tracepoint

2024-03-29 Thread Ingo Molnar


* Avadhut Naik  wrote:

>  
> +/*
> + * MCE Event Record.
> + *
> + * Only very relevant and transient information which cannot be
> + * gathered from a system by any other means or which can only be
> + * acquired arduously should be added to this record.
> + */
> +
>  TRACE_EVENT(mce_record,
>  
>   TP_PROTO(struct mce *m),
> @@ -25,6 +33,7 @@ TRACE_EVENT(mce_record,
>   __field(u64,ipid)
>   __field(u64,ip  )
>   __field(u64,tsc )
> + __field(u64,ppin)
>   __field(u64,walltime)
>   __field(u32,cpu )
>   __field(u32,cpuid   )
> @@ -45,6 +54,7 @@ TRACE_EVENT(mce_record,
>   __entry->ipid   = m->ipid;
>   __entry->ip = m->ip;
>   __entry->tsc= m->tsc;
> + __entry->ppin   = m->ppin;
>   __entry->walltime   = m->time;
>   __entry->cpu= m->extcpu;
>   __entry->cpuid  = m->cpuid;
> @@ -55,7 +65,7 @@ TRACE_EVENT(mce_record,
>   __entry->cpuvendor  = m->cpuvendor;
>   ),
>  
> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: 
> %llx, processor: %u:%x, time: %llu, socket: %u, APIC: %x",

Please split out the other (capitalization) changes to the output into 
a separate patch.

- While at it, don't forget to:

   s/ADDR/MISC/SYND
/addr/misc/synd

- Also, it's a bit weird that we have 'CPU' but also 'processor' 
  fields; why isn't it 'vendor' and 'CPUID'?

- Finally, why are some fields 'merged' as per ADDR/MISC/SYND, while 
  others are listed separately? All that have separate names should be 
  listed separately.

Ie. something like the patch below?

Thanks,

Ingo

>
From: Ingo Molnar 
Date: Fri, 29 Mar 2024 08:09:23 +0100
Subject: [PATCH] x86/mce: Clean up TP_printk() output line of the mce_record 
tracepoint

 - Only capitalize entries where that makes sense
 - Print separate values separately
 - Rename 'PROCESSOR' to vendor & CPUID

Signed-off-by: Ingo Molnar 
---
 include/trace/events/mce.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 1391ada0da3b..c5b0523f25ee 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -55,15 +55,18 @@ TRACE_EVENT(mce_record,
__entry->cpuvendor  = m->cpuvendor;
),
 
-   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: 
%u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+   TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
addr: %016Lx, misc: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x",
__entry->cpu,
__entry->mcgcap, __entry->mcgstatus,
__entry->bank, __entry->status,
__entry->ipid,
-   __entry->addr, __entry->misc, __entry->synd,
+   __entry->addr,
+   __entry->misc,
+   __entry->synd,
__entry->cs, __entry->ip,
__entry->tsc,
-   __entry->cpuvendor, __entry->cpuid,
+   __entry->cpuvendor,
+   __entry->cpuid,
__entry->walltime,
__entry->socketid,
__entry->apicid)