[PATCH] modules: Drop the .export_symbol section from the final modules
From: Wang Yao Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") forget drop the .export_symbol section from the final modules. Signed-off-by: Wang Yao --- scripts/module.lds.S | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/module.lds.S b/scripts/module.lds.S index bf5bcf2836d8..89ff01a22634 100644 --- a/scripts/module.lds.S +++ b/scripts/module.lds.S @@ -13,6 +13,7 @@ SECTIONS { /DISCARD/ : { *(.discard) *(.discard.*) + *(.export_symbol) } __ksymtab 0 : { *(SORT(___ksymtab+*)) } -- 2.27.0
Re: [PATCH v20 0/5] Introducing trace buffer mapping by user-space
(added linux-mm) On Wed, Apr 10, 2024 at 01:56:12PM -0400, Steven Rostedt wrote: > > Hi Andrew, et.al. > > Linus said it's a hard requirement that this code gets an Acked-by (or > Reviewed-by) from the memory sub-maintainers before he will accept it. > He was upset that we faulted in pages one at a time instead of mapping it > in one go: > > > https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w...@mail.gmail.com/ > > Could you take a look at patches 1-3 to make sure they look sane from a > memory management point of view? > > I really want this applied in the next merge window. > > Thanks! > > -- Steve > > > On Sat, 6 Apr 2024 18:36:44 +0100 > Vincent Donnefort wrote: > > > The tracing ring-buffers can be stored on disk or sent to network > > without any copy via splice. However the later doesn't allow real time > > processing of the traces. A solution is to give userspace direct access > > to the ring-buffer pages via a mapping. An application can now become a > > consumer of the ring-buffer, in a similar fashion to what trace_pipe > > offers. > > > > Support for this new feature can already be found in libtracefs from > > version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE. > > > > Vincent > > > > v19 -> v20: > > * Fix typos in documentation. > > * Remove useless mmap open and fault callbacks. > > * add mm.h include for vm_insert_pages > > > > v18 -> v19: > > * Use VM_PFNMAP and vm_insert_pages > > * Allocate ring-buffer subbufs with __GFP_COMP > > * Pad the meta-page with the zero-page to align on the subbuf_order > > * Extend the ring-buffer test with mmap() dedicated suite > > > > v17 -> v18: > > * Fix lockdep_assert_held > > * Fix spin_lock_init typo > > * Fix CONFIG_TRACER_MAX_TRACE typo > > > > v16 -> v17: > > * Documentation and comments improvements. > > * Create get/put_snapshot_map() for clearer code. > > * Replace kzalloc with kcalloc. > > * Fix -ENOMEM handling in rb_alloc_meta_page(). > > * Move flush(cpu_buffer->reader_page) behind the reader lock. > > * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer > > mutex. (removes READ_ONCE/WRITE_ONCE accesses). > > > > v15 -> v16: > > * Add comment for the dcache flush. > > * Remove now unnecessary WRITE_ONCE for the meta-page. > > > > v14 -> v15: > > * Add meta-page and reader-page flush. Intends to fix the mapping > > for VIVT and aliasing-VIPT data caches. > > * -EPERM on VM_EXEC. > > * Fix build warning !CONFIG_TRACER_MAX_TRACE. > > > > v13 -> v14: > > * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu) > > * on unmap, sync meta-page teardown with the reader_lock instead of > > the synchronize_rcu. > > * Add a dedicated spinlock for trace_array ->snapshot and ->mapped. > > (intends to fix a lockdep issue) > > * Add kerneldoc for flags and Reserved fields. > > * Add kselftest for snapshot/map mutual exclusion. > > > > v12 -> v13: > > * Swap subbufs_{touched,lost} for Reserved fields. > > * Add a flag field in the meta-page. > > * Fix CONFIG_TRACER_MAX_TRACE. > > * Rebase on top of trace/urgent. > > * Add a comment for try_unregister_trigger() > > > > v11 -> v12: > > * Fix code sample mmap bug. > > * Add logging in sample code. > > * Reset tracer in selftest. > > * Add a refcount for the snapshot users. > > * Prevent mapping when there are snapshot users and vice versa. > > * Refine the meta-page. > > * Fix types in the meta-page. > > * Collect Reviewed-by. > > > > v10 -> v11: > > * Add Documentation and code sample. > > * Add a selftest. > > * Move all the update to the meta-page into a single > > rb_update_meta_page(). > > * rb_update_meta_page() is now called from > > ring_buffer_map_get_reader() to fix NOBLOCK callers. > > * kerneldoc for struct trace_meta_page. > > * Add a patch to zero all the ring-buffer allocations. > > > > v9 -> v10: > > * Refactor rb_update_meta_page() > > * In-loop declaration for foreach_subbuf_page() > > * Check for cpu_buffer->mapped overflow > > > > v8 -> v9: > > * Fix the unlock path in ring_buffer_map() > > * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer > > * Rebase on linux-trace/for-next > > (3cb3091138ca0921c4569bcf7ffa062519639b6a) > > > > v7 -> v8: > > * Drop the subbufs renaming into bpages > > * Use subbuf as a name when relevant > > > > v6 -> v7: > > * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/ > > * Support for subbufs > > * Rename subbufs into bpages > > > > v5 -> v6: > > * Rebase on next-20230802. > > * (unsigned long) -> (void *) cast for virt_to_page(). > > * Add a wait for the GET_READER_PAGE ioctl. > > * Move writer fields update (overrun/pages_lost/entries/pages_touched) > > in the irq_work. > > * Rearrange id in struct buffer_page. > > * Rearrange the meta-page. > > * ring_buffer_meta_page ->
[PATCH v3] ftrace: Fix possible use-after-free issue in ftrace_location()
KASAN reports a bug: BUG: KASAN: use-after-free in ftrace_location+0x90/0x120 Read of size 8 at addr 888141d40010 by task insmod/424 CPU: 8 PID: 424 Comm: insmod Tainted: GW 6.9.0-rc2+ [...] Call Trace: dump_stack_lvl+0x68/0xa0 print_report+0xcf/0x610 kasan_report+0xb5/0xe0 ftrace_location+0x90/0x120 register_kprobe+0x14b/0xa40 kprobe_init+0x2d/0xff0 [kprobe_example] do_one_initcall+0x8f/0x2d0 do_init_module+0x13a/0x3c0 load_module+0x3082/0x33d0 init_module_from_file+0xd2/0x130 __x64_sys_finit_module+0x306/0x440 do_syscall_64+0x68/0x140 entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause is that, in lookup_rec(), ftrace record of some address is being searched in ftrace pages of some module, but those ftrace pages at the same time is being freed in ftrace_release_mod() as the corresponding module is being deleted: CPU1 | CPU2 register_kprobes() {| delete_module() { check_kprobe_address_safe() { | arch_check_ftrace_location() { | ftrace_location() { | lookup_rec() // USE!| ftrace_release_mod() // Free! To fix this issue: 1. Hold rcu lock as accessing ftrace pages in ftrace_location_range(); 2. Use ftrace_location_range() instead of lookup_rec() in ftrace_location(); 3. Call synchronize_rcu() before freeing any ftrace pages both in ftrace_process_locs()/ftrace_release_mod()/ftrace_free_mem(). Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization") Suggested-by: Steven Rostedt Signed-off-by: Zheng Yejian --- kernel/trace/ftrace.c | 46 --- 1 file changed, 30 insertions(+), 16 deletions(-) v3: - Complete the commit description and add Suggested-by tag - Add comments around where synchronize_rcu() is called v2: - Link: https://lore.kernel.org/all/20240416112459.1444612-1-zhengyeji...@huawei.com/ - Use RCU lock instead of holding ftrace_lock as suggested by Steve. Link: https://lore.kernel.org/all/20240410112823.1d084...@gandalf.local.home/ v1: - Link: https://lore.kernel.org/all/20240401125543.1282845-1-zhengyeji...@huawei.com/ diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index da1710499698..e05d3e3dc06a 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1581,7 +1581,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end) } /** - * ftrace_location_range - return the first address of a traced location + * ftrace_location_range_rcu - return the first address of a traced location * if it touches the given ip range * @start: start of range to search. * @end: end of range to search (inclusive). @end points to the last byte @@ -1592,7 +1592,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end) * that is either a NOP or call to the function tracer. It checks the ftrace * internal tables to determine if the address belongs or not. */ -unsigned long ftrace_location_range(unsigned long start, unsigned long end) +static unsigned long ftrace_location_range_rcu(unsigned long start, unsigned long end) { struct dyn_ftrace *rec; @@ -1603,6 +1603,16 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) return 0; } +unsigned long ftrace_location_range(unsigned long start, unsigned long end) +{ + unsigned long loc; + + rcu_read_lock(); + loc = ftrace_location_range_rcu(start, end); + rcu_read_unlock(); + return loc; +} + /** * ftrace_location - return the ftrace location * @ip: the instruction pointer to check @@ -1614,25 +1624,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) */ unsigned long ftrace_location(unsigned long ip) { - struct dyn_ftrace *rec; + unsigned long loc; unsigned long offset; unsigned long size; - rec = lookup_rec(ip, ip); - if (!rec) { + loc = ftrace_location_range(ip, ip); + if (!loc) { if (!kallsyms_lookup_size_offset(ip, , )) goto out; /* map sym+0 to __fentry__ */ if (!offset) - rec = lookup_rec(ip, ip + size - 1); + loc = ftrace_location_range(ip, ip + size - 1); } - if (rec) - return rec->ip; - out: - return 0; + return loc; } /** @@ -6596,6 +6603,8 @@ static int ftrace_process_locs(struct module *mod, /* We should have used all pages unless we skipped some */ if (pg_unuse) { WARN_ON(!skipped); + /* Need to synchronize with ftrace_location_range() */ + synchronize_rcu(); ftrace_free_pages(pg_unuse); } return ret; @@ -6809,6 +6818,9 @@ void ftrace_release_mod(struct module *mod) out_unlock:
Re: [PATCH net-next v8] virtio_net: Support RX hash XDP hint
On Tue, Apr 16, 2024 at 3:35 PM Heng Qi wrote: > > > > 在 2024/4/16 下午2:19, Liang Chen 写道: > > The RSS hash report is a feature that's part of the virtio specification. > > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost > > (still a work in progress as per [1]) support this feature. While the > > capability to obtain the RSS hash has been enabled in the normal path, > > it's currently missing in the XDP path. Therefore, we are introducing > > XDP hints through kfuncs to allow XDP programs to access the RSS hash. > > > > 1. > > https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r > > > > Signed-off-by: Liang Chen > > --- > >Changes from v7: > > - use table lookup for rss hash type > >Changes from v6: > > - fix a coding style issue > >Changes from v5: > > - Preservation of the hash value has been dropped, following the conclusion > >from discussions in V3 reviews. The virtio_net driver doesn't > >accessing/using the virtio_net_hdr after the XDP program execution, so > >nothing tragic should happen. As to the xdp program, if it smashes the > >entry in virtio header, it is likely buggy anyways. Additionally, looking > >up the Intel IGC driver, it also does not bother with this particular > >aspect. > > --- > > drivers/net/virtio_net.c| 42 + > > include/uapi/linux/virtio_net.h | 1 + > > 2 files changed, 43 insertions(+) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c22d1118a133..1d750009f615 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -4621,6 +4621,47 @@ static void virtnet_set_big_packets(struct > > virtnet_info *vi, const int mtu) > > } > > } > > > > +static enum xdp_rss_hash_type > > +virtnet_xdp_rss_type[VIRTIO_NET_HASH_REPORT_MAX_TABLE] = { > > + [VIRTIO_NET_HASH_REPORT_NONE] = XDP_RSS_TYPE_NONE, > > + [VIRTIO_NET_HASH_REPORT_IPv4] = XDP_RSS_TYPE_L3_IPV4, > > + [VIRTIO_NET_HASH_REPORT_TCPv4] = XDP_RSS_TYPE_L4_IPV4_TCP, > > + [VIRTIO_NET_HASH_REPORT_UDPv4] = XDP_RSS_TYPE_L4_IPV4_UDP, > > + [VIRTIO_NET_HASH_REPORT_IPv6] = XDP_RSS_TYPE_L3_IPV6, > > + [VIRTIO_NET_HASH_REPORT_TCPv6] = XDP_RSS_TYPE_L4_IPV6_TCP, > > + [VIRTIO_NET_HASH_REPORT_UDPv6] = XDP_RSS_TYPE_L4_IPV6_UDP, > > + [VIRTIO_NET_HASH_REPORT_IPv6_EX] = XDP_RSS_TYPE_L3_IPV6_EX, > > + [VIRTIO_NET_HASH_REPORT_TCPv6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX, > > + [VIRTIO_NET_HASH_REPORT_UDPv6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX > > +}; > > + > > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, > > +enum xdp_rss_hash_type *rss_type) > > +{ > > + const struct xdp_buff *xdp = (void *)_ctx; > > + struct virtio_net_hdr_v1_hash *hdr_hash; > > + struct virtnet_info *vi; > > + u16 hash_report; > > + > > + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) > > + return -ENODATA; > > + > > + vi = netdev_priv(xdp->rxq->dev); > > + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len); > > + hash_report = __le16_to_cpu(hdr_hash->hash_report); > > + > > + if (hash_report >= VIRTIO_NET_HASH_REPORT_MAX_TABLE) > > + hash_report = VIRTIO_NET_HASH_REPORT_NONE; > > When this happens, it may mean an error or user modification of the > header occurred. > Should the following *hash* value be cleared to 0? > Referring to the igc and mlx5 drivers, the hash value is not cleared to 0 in such cases either. This is because if the rss type is XDP_RSS_TYPE_NONE, the hash value is considered to be not relevant. Thanks! > Thanks, > Heng > > > + > > + *rss_type = virtnet_xdp_rss_type[hash_report]; > > + *hash = __le32_to_cpu(hdr_hash->hash_value); > > + return 0; > > +} > > + > > +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { > > + .xmo_rx_hash= virtnet_xdp_rx_hash, > > +}; > > + > > static int virtnet_probe(struct virtio_device *vdev) > > { > > int i, err = -ENOMEM; > > @@ -4747,6 +4788,7 @@ static int virtnet_probe(struct virtio_device *vdev) > > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); > > > > dev->hw_features |= NETIF_F_RXHASH; > > + dev->xdp_metadata_ops = _xdp_metadata_ops; > > } > > > > if (vi->has_rss_hash_report) > > diff --git a/include/uapi/linux/virtio_net.h > > b/include/uapi/linux/virtio_net.h > > index cc65ef0f3c3e..3ee695450096 100644 > > --- a/include/uapi/linux/virtio_net.h > > +++ b/include/uapi/linux/virtio_net.h > > @@ -176,6 +176,7 @@ struct virtio_net_hdr_v1_hash { > > #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 > > #define VIRTIO_NET_HASH_REPORT_TCPv6_EX8 > > #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 > > +#define VIRTIO_NET_HASH_REPORT_MAX_TABLE 10 > > __le16 hash_report; > > __le16 padding; > > }; >
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue, 16 Apr 2024 17:21:24 -0500, Haitao Huang wrote: On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang wrote: On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen wrote: On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: I did declare the configs in the config file but I missed it in my patch as stated earlier. IIUC, that would not cause this error though. Maybe I should exit with the skip code if no CGROUP_MISC (no more CGROUP_SGX_EPC) is configured? OK, so I wanted to do a distro kernel test here, and used the default OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. tools/testing/selftests$ find . -name README ./futex/README ./tc-testing/README ./net/forwarding/README ./powerpc/nx-gzip/README ./ftrace/README ./arm64/signal/README ./arm64/fp/README ./arm64/README ./zram/README ./livepatch/README ./resctrl/README So is there a README because of override timeout parameter? Maybe it should be just set to a high enough value? BR, Jarkko From the docs, I think we are supposed to use override. See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests Thanks Haitao Maybe you are suggesting we add settings file? I can do that. README also explains what the tests do though. Do you still think they should not exist? I was mostly following resctrl as example. Thanks Haitao With the settings I shortened the README quite bit. Now I also lean towards removing it. Let me know your preference. You can check the latest at my branch for reference: https://github.com/haitaohuang/linux/tree/sgx_cg_upstream_v12_plus Thanks Haitao
Re: [PATCH net-next v8] virtio_net: Support RX hash XDP hint
On Tue, Apr 16, 2024 at 3:20 PM Jason Wang wrote: > > On Tue, Apr 16, 2024 at 2:20 PM Liang Chen wrote: > > > > The RSS hash report is a feature that's part of the virtio specification. > > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost > > (still a work in progress as per [1]) support this feature. While the > > capability to obtain the RSS hash has been enabled in the normal path, > > it's currently missing in the XDP path. Therefore, we are introducing > > XDP hints through kfuncs to allow XDP programs to access the RSS hash. > > > > 1. > > https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r > > > > Signed-off-by: Liang Chen > > --- > > Changes from v7: > > - use table lookup for rss hash type > > Changes from v6: > > - fix a coding style issue > > Changes from v5: > > - Preservation of the hash value has been dropped, following the conclusion > > from discussions in V3 reviews. The virtio_net driver doesn't > > accessing/using the virtio_net_hdr after the XDP program execution, so > > nothing tragic should happen. As to the xdp program, if it smashes the > > entry in virtio header, it is likely buggy anyways. Additionally, looking > > up the Intel IGC driver, it also does not bother with this particular > > aspect. > > --- > > drivers/net/virtio_net.c| 42 + > > include/uapi/linux/virtio_net.h | 1 + > > 2 files changed, 43 insertions(+) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c22d1118a133..1d750009f615 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -4621,6 +4621,47 @@ static void virtnet_set_big_packets(struct > > virtnet_info *vi, const int mtu) > > } > > } > > > > +static enum xdp_rss_hash_type > > +virtnet_xdp_rss_type[VIRTIO_NET_HASH_REPORT_MAX_TABLE] = { > > + [VIRTIO_NET_HASH_REPORT_NONE] = XDP_RSS_TYPE_NONE, > > + [VIRTIO_NET_HASH_REPORT_IPv4] = XDP_RSS_TYPE_L3_IPV4, > > + [VIRTIO_NET_HASH_REPORT_TCPv4] = XDP_RSS_TYPE_L4_IPV4_TCP, > > + [VIRTIO_NET_HASH_REPORT_UDPv4] = XDP_RSS_TYPE_L4_IPV4_UDP, > > + [VIRTIO_NET_HASH_REPORT_IPv6] = XDP_RSS_TYPE_L3_IPV6, > > + [VIRTIO_NET_HASH_REPORT_TCPv6] = XDP_RSS_TYPE_L4_IPV6_TCP, > > + [VIRTIO_NET_HASH_REPORT_UDPv6] = XDP_RSS_TYPE_L4_IPV6_UDP, > > + [VIRTIO_NET_HASH_REPORT_IPv6_EX] = XDP_RSS_TYPE_L3_IPV6_EX, > > + [VIRTIO_NET_HASH_REPORT_TCPv6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX, > > + [VIRTIO_NET_HASH_REPORT_UDPv6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX > > +}; > > + > > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, > > + enum xdp_rss_hash_type *rss_type) > > +{ > > + const struct xdp_buff *xdp = (void *)_ctx; > > + struct virtio_net_hdr_v1_hash *hdr_hash; > > + struct virtnet_info *vi; > > + u16 hash_report; > > + > > + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) > > + return -ENODATA; > > + > > + vi = netdev_priv(xdp->rxq->dev); > > + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - > > vi->hdr_len); > > + hash_report = __le16_to_cpu(hdr_hash->hash_report); > > + > > + if (hash_report >= VIRTIO_NET_HASH_REPORT_MAX_TABLE) > > + hash_report = VIRTIO_NET_HASH_REPORT_NONE; > > + > > + *rss_type = virtnet_xdp_rss_type[hash_report]; > > + *hash = __le32_to_cpu(hdr_hash->hash_value); > > + return 0; > > +} > > + > > +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { > > + .xmo_rx_hash= virtnet_xdp_rx_hash, > > +}; > > + > > static int virtnet_probe(struct virtio_device *vdev) > > { > > int i, err = -ENOMEM; > > @@ -4747,6 +4788,7 @@ static int virtnet_probe(struct virtio_device *vdev) > > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); > > > > dev->hw_features |= NETIF_F_RXHASH; > > + dev->xdp_metadata_ops = _xdp_metadata_ops; > > } > > > > if (vi->has_rss_hash_report) > > diff --git a/include/uapi/linux/virtio_net.h > > b/include/uapi/linux/virtio_net.h > > index cc65ef0f3c3e..3ee695450096 100644 > > --- a/include/uapi/linux/virtio_net.h > > +++ b/include/uapi/linux/virtio_net.h > > @@ -176,6 +176,7 @@ struct virtio_net_hdr_v1_hash { > > #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 > > #define VIRTIO_NET_HASH_REPORT_TCPv6_EX8 > > #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 > > +#define VIRTIO_NET_HASH_REPORT_MAX_TABLE 10 > > This should not be part of uAPI. It may confuse the userspace. > Sure. I will just move it to virtio_net.c right above the table definition. Thanks! > Others look good. > > Thanks > > > __le16 hash_report; > > __le16 padding; > > }; > > -- > > 2.40.1 > > >
Re: [PATCH] ASoC: tracing: Export SND_SOC_DAPM_DIR_OUT to its value
On Tue, 16 Apr 2024 00:03:03 -0400, Steven Rostedt wrote: > The string SND_SOC_DAPM_DIR_OUT is printed in the snd_soc_dapm_path trace > event instead of its value: > >(((REC->path_dir) == SND_SOC_DAPM_DIR_OUT) ? "->" : "<-") > > User space cannot parse this, as it has no idea what SND_SOC_DAPM_DIR_OUT > is. Use TRACE_DEFINE_ENUM() to convert it to its value: > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: tracing: Export SND_SOC_DAPM_DIR_OUT to its value commit: 58300f8d6a48e58d1843199be743f819e2791ea3 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH][next] dax: remove redundant assignment to variable rc
On Mon, 2024-04-15 at 11:19 +0100, Colin Ian King wrote: > The variable rc is being assigned an value and then is being re-assigned > a new value in the next statement. The assignment is redundant and can > be removed. > > Cleans up clang scan build warning: > drivers/dax/bus.c:1207:2: warning: Value stored to 'rc' is never > read [deadcode.DeadStores] > > Signed-off-by: Colin Ian King Reviewed-by: Vishal Verma > --- > drivers/dax/bus.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > index 797e1ebff299..f758afbf8f09 100644 > --- a/drivers/dax/bus.c > +++ b/drivers/dax/bus.c > @@ -1204,7 +1204,6 @@ static ssize_t mapping_store(struct device *dev, struct > device_attribute *attr, > if (rc) > return rc; > > - rc = -ENXIO; > rc = down_write_killable(_region_rwsem); > if (rc) > return rc;
Re: [PATCH v12 07/14] x86/sgx: Abstract tracking reclaimable pages in LRU
On Tue, 16 Apr 2024 09:07:33 -0500, Huang, Kai wrote: On Mon, 2024-04-15 at 20:20 -0700, Haitao Huang wrote: From: Kristen Carlson Accardi The functions, sgx_{mark,unmark}_page_reclaimable(), manage the tracking of reclaimable EPC pages: sgx_mark_page_reclaimable() adds a newly allocated page into the global LRU list while sgx_unmark_page_reclaimable() does the opposite. Abstract the hard coded global LRU references in these functions to make them reusable when pages are tracked in per-cgroup LRUs. Create a helper, sgx_lru_list(), that returns the LRU that tracks a given EPC page. It simply returns the global LRU now, and will later return the LRU of the cgroup within which the EPC page was allocated. Replace the hard coded global LRU with a call to this helper. Next patches will first get the cgroup reclamation flow ready while keeping pages tracked in the global LRU and reclaimed by ksgxd before we make the switch in the end for sgx_lru_list() to return per-cgroup LRU. I found the first paragraph hard to read. Provide my version below for your reference: " The SGX driver tracks reclaimable EPC pages via sgx_mark_page_reclaimable(), which adds the newly allocated page into the global LRU list. sgx_unmark_page_reclaimable() does the opposite. To support SGX EPC cgroup, the SGX driver will need to maintain an LRU list for each cgroup, and the new allocated EPC page will need to be added to the LRU of associated cgroup, but not always the global LRU list. When sgx_mark_page_reclaimable() is called, the cgroup that the new allocated EPC page belongs to is already known, i.e., it has been set to the 'struct sgx_epc_page'. Add a helper, sgx_lru_list(), to return the LRU that the EPC page should be/is added to for the given EPC page. Currently it just returns the global LRU. Change sgx_{mark|unmark}_page_reclaimable() to use the helper function to get the LRU from the EPC page instead of referring to the global LRU directly. This allows EPC page being able to be tracked in "per-cgroup" LRU when that becomes ready. " Thanks. Will use Nit: That being said, is sgx_epc_page_lru() better than sgx_lru_list()? Sure Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Reviewed-by: Jarkko Sakkinen Tested-by: Jarkko Sakkinen --- Feel free to add: Reviewed-by: Kai Huang Thanks Haitao
[PATCH 2/2] selftests/user_events: Add non-spacing separator check
The ABI documentation indicates that field separators do not need a space between them, only a ';'. When no spacing is used, the register must work. Any subsequent register, with or without spaces, must match and not return -EADDRINUSE. Add a non-spacing separator case to our self-test register case to ensure it works going forward. Signed-off-by: Beau Belgrave --- tools/testing/selftests/user_events/ftrace_test.c | 8 1 file changed, 8 insertions(+) diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/testing/selftests/user_events/ftrace_test.c index dcd7509fe2e0..0bb46793dcd4 100644 --- a/tools/testing/selftests/user_events/ftrace_test.c +++ b/tools/testing/selftests/user_events/ftrace_test.c @@ -261,6 +261,12 @@ TEST_F(user, register_events) { ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, )); ASSERT_EQ(0, reg.write_index); + /* Register without separator spacing should still match */ + reg.enable_bit = 29; + reg.name_args = (__u64)"__test_event u32 field1;u32 field2"; + ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, )); + ASSERT_EQ(0, reg.write_index); + /* Multiple registers to same name but different args should fail */ reg.enable_bit = 29; reg.name_args = (__u64)"__test_event u32 field1;"; @@ -288,6 +294,8 @@ TEST_F(user, register_events) { ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, )); unreg.disable_bit = 30; ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, )); + unreg.disable_bit = 29; + ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, )); /* Delete should have been auto-done after close and unregister */ close(self->data_fd); -- 2.34.1
[PATCH 1/2] tracing/user_events: Fix non-spaced field matching
When the ABI was updated to prevent same name w/different args, it missed an important corner case when fields don't end with a space. Typically, space is used for fields to help separate them, like "u8 field1; u8 field2". If no spaces are used, like "u8 field1;u8 field2", then the parsing works for the first time. However, the match check fails on a subsequent register, leading to confusion. This is because the match check uses argv_split() and assumes that all fields will be split upon the space. When spaces are used, we get back { "u8", "field1;" }, without spaces we get back { "u8", "field1;u8" }. This causes a mismatch, and the user program gets back -EADDRINUSE. Add a method to detect this case before calling argv_split(). If found force a space after the field separator character ';'. This ensures all cases work properly for matching. With this fix, the following are all treated as matching: u8 field1;u8 field2 u8 field1; u8 field2 u8 field1;\tu8 field2 u8 field1;\nu8 field2 Fixes: ba470eebc2f6 ("tracing/user_events: Prevent same name but different args event") Signed-off-by: Beau Belgrave --- kernel/trace/trace_events_user.c | 88 +++- 1 file changed, 87 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 70d428c394b6..9184d3962b2a 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -1989,6 +1989,92 @@ static int user_event_set_tp_name(struct user_event *user) return 0; } +/* + * Counts how many ';' without a trailing space are in the args. + */ +static int count_semis_no_space(char *args) +{ + int count = 0; + + while ((args = strchr(args, ';'))) { + args++; + + if (!isspace(*args)) + count++; + } + + return count; +} + +/* + * Copies the arguments while ensuring all ';' have a trailing space. + */ +static char *fix_semis_no_space(char *args, int count) +{ + char *fixed, *pos; + char c, last; + int len; + + len = strlen(args) + count; + fixed = kmalloc(len + 1, GFP_KERNEL); + + if (!fixed) + return NULL; + + pos = fixed; + last = '\0'; + + while (len > 0) { + c = *args++; + + if (last == ';' && !isspace(c)) { + *pos++ = ' '; + len--; + } + + if (len > 0) { + *pos++ = c; + len--; + } + + last = c; + } + + /* +* len is the length of the copy excluding the null. +* This ensures we always have room for a null. +*/ + *pos = '\0'; + + return fixed; +} + +static char **user_event_argv_split(char *args, int *argc) +{ + /* Count how many ';' without a trailing space */ + int count = count_semis_no_space(args); + + if (count) { + /* We must fixup 'field;field' to 'field; field' */ + char *fixed = fix_semis_no_space(args, count); + char **split; + + if (!fixed) + return NULL; + + /* We do a normal split afterwards */ + split = argv_split(GFP_KERNEL, fixed, argc); + + /* We can free since argv_split makes a copy */ + kfree(fixed); + + return split; + } + + /* No fixup is required */ + return argv_split(GFP_KERNEL, args, argc); +} + /* * Parses the event name, arguments and flags then registers if successful. * The name buffer lifetime is owned by this method for success cases only. @@ -2012,7 +2098,7 @@ static int user_event_parse(struct user_event_group *group, char *name, return -EPERM; if (args) { - argv = argv_split(GFP_KERNEL, args, ); + argv = user_event_argv_split(args, ); if (!argv) return -ENOMEM; -- 2.34.1
[PATCH 0/2] tracing/user_events: Fix non-spaced field matching
When the ABI was updated to prevent same name w/different args, it missed an important corner case when fields don't end with a space. Typically, space is used for fields to help separate them, like "u8 field1; u8 field2". If no spaces are used, like "u8 field1;u8 field2", then the parsing works for the first time. However, the match check fails on a subsequent register, leading to confusion. This is because the match check uses argv_split() and assumes that all fields will be split upon the space. When spaces are used, we get back { "u8", "field1;" }, without spaces we get back { "u8", "field1;u8" }. This causes a mismatch, and the user program gets back -EADDRINUSE. Add a method to detect this case before calling argv_split(). If found force a space after the field separator character ';'. This ensures all cases work properly for matching. I could not find an existing function to accomplish this, so I had to hand code a copy with this logic. If there is a better way to achieve this, I'm all ears. This series also adds a selftest to ensure this doesn't break again. With this fix, the following are all treated as matching: u8 field1;u8 field2 u8 field1; u8 field2 u8 field1;\tu8 field2 u8 field1;\nu8 field2 Beau Belgrave (2): tracing/user_events: Fix non-spaced field matching selftests/user_events: Add non-spacing separator check kernel/trace/trace_events_user.c | 88 ++- .../selftests/user_events/ftrace_test.c | 8 ++ 2 files changed, 95 insertions(+), 1 deletion(-) base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680 -- 2.34.1
Re: [PATCH v12 05/14] x86/sgx: Implement basic EPC misc cgroup functionality
On Mon, 15 Apr 2024 22:20:02 -0500, Haitao Huang wrote: diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..400baa7cfb69 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -1,6 +1,7 @@ obj-y += \ driver.o \ encl.o \ + epc_cgroup.o \ It should be: +obj-$(CONFIG_CGROUP_MISC) += epc_cgroup.o Haitao
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue, 16 Apr 2024 17:04:21 -0500, Haitao Huang wrote: On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen wrote: On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: I did declare the configs in the config file but I missed it in my patch as stated earlier. IIUC, that would not cause this error though. Maybe I should exit with the skip code if no CGROUP_MISC (no more CGROUP_SGX_EPC) is configured? OK, so I wanted to do a distro kernel test here, and used the default OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. tools/testing/selftests$ find . -name README ./futex/README ./tc-testing/README ./net/forwarding/README ./powerpc/nx-gzip/README ./ftrace/README ./arm64/signal/README ./arm64/fp/README ./arm64/README ./zram/README ./livepatch/README ./resctrl/README So is there a README because of override timeout parameter? Maybe it should be just set to a high enough value? BR, Jarkko From the docs, I think we are supposed to use override. See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests Thanks Haitao Maybe you are suggesting we add settings file? I can do that. README also explains what the tests do though. Do you still think they should not exist? I was mostly following resctrl as example. Thanks Haitao
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue, 16 Apr 2024 11:08:11 -0500, Jarkko Sakkinen wrote: On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: I did declare the configs in the config file but I missed it in my patch as stated earlier. IIUC, that would not cause this error though. Maybe I should exit with the skip code if no CGROUP_MISC (no more CGROUP_SGX_EPC) is configured? OK, so I wanted to do a distro kernel test here, and used the default OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. tools/testing/selftests$ find . -name README ./futex/README ./tc-testing/README ./net/forwarding/README ./powerpc/nx-gzip/README ./ftrace/README ./arm64/signal/README ./arm64/fp/README ./arm64/README ./zram/README ./livepatch/README ./resctrl/README So is there a README because of override timeout parameter? Maybe it should be just set to a high enough value? BR, Jarkko From the docs, I think we are supposed to use override. See: https://docs.kernel.org/dev-tools/kselftest.html#timeout-for-selftests Thanks Haitao
[PATCH v2 3/4] dax/bus.c: Don't use down_write_killable for non-user processes
Change an instance of down_write_killable() to a simple down_write() where there is no user process that might want to interrupt the operation. Fixes: c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem") Reported-by: Dan Williams Signed-off-by: Vishal Verma --- drivers/dax/bus.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 4e04b228b080..db183eb5ce3a 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -1542,12 +1542,8 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) { struct dev_dax *dev_dax; - int rc; - - rc = down_write_killable(_region_rwsem); - if (rc) - return ERR_PTR(rc); + down_write(_region_rwsem); dev_dax = __devm_create_dev_dax(data); up_write(_region_rwsem); -- 2.44.0
[PATCH v2 4/4] dax/bus.c: Use the right locking mode (read vs write) in size_show
In size_show(), the dax_dev_rwsem only needs a read lock, but was acquiring a write lock. Change it to down_read_interruptible() so it doesn't unnecessarily hold a write lock. Cc: Dan Williams Fixes: c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem") Signed-off-by: Vishal Verma --- drivers/dax/bus.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index db183eb5ce3a..66095e60a279 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -939,11 +939,11 @@ static ssize_t size_show(struct device *dev, unsigned long long size; int rc; - rc = down_write_killable(_dev_rwsem); + rc = down_read_interruptible(_dev_rwsem); if (rc) return rc; size = dev_dax_size(dev_dax); - up_write(_dev_rwsem); + up_read(_dev_rwsem); return sysfs_emit(buf, "%llu\n", size); } -- 2.44.0
[PATCH v2 2/4] dax/bus.c: fix locking for unregister_dax_dev / unregister_dax_mapping paths
Commit c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem") was a bit overzealous in eliminating device_lock() usage, and ended up removing a couple of lock acquisitions which were needed, and as a result, fix some of the conditional locking missteps that the above commit introduced in unregister_dax_dev() and unregister_dax_mapping(). Fixes: c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem") Reported-by: Dan Williams Signed-off-by: Vishal Verma --- drivers/dax/bus.c | 44 ++-- 1 file changed, 10 insertions(+), 34 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 7924dd542a13..4e04b228b080 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -465,26 +465,17 @@ static void free_dev_dax_ranges(struct dev_dax *dev_dax) trim_dev_dax_range(dev_dax); } -static void __unregister_dev_dax(void *dev) +static void unregister_dev_dax(void *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); dev_dbg(dev, "%s\n", __func__); + down_write(_region_rwsem); kill_dev_dax(dev_dax); device_del(dev); free_dev_dax_ranges(dev_dax); put_device(dev); -} - -static void unregister_dev_dax(void *dev) -{ - if (rwsem_is_locked(_region_rwsem)) - return __unregister_dev_dax(dev); - - if (WARN_ON_ONCE(down_write_killable(_region_rwsem) != 0)) - return; - __unregister_dev_dax(dev); up_write(_region_rwsem); } @@ -560,15 +551,12 @@ static ssize_t delete_store(struct device *dev, struct device_attribute *attr, if (!victim) return -ENXIO; - rc = down_write_killable(_region_rwsem); - if (rc) - return rc; - rc = down_write_killable(_dev_rwsem); - if (rc) { - up_write(_region_rwsem); - return rc; - } + device_lock(dev); + device_lock(victim); dev_dax = to_dev_dax(victim); + rc = down_write_killable(_dev_rwsem); + if (rc) + return rc; if (victim->driver || dev_dax_size(dev_dax)) rc = -EBUSY; else { @@ -589,11 +577,12 @@ static ssize_t delete_store(struct device *dev, struct device_attribute *attr, rc = -EBUSY; } up_write(_dev_rwsem); + device_unlock(victim); /* won the race to invalidate the device, clean it up */ if (do_del) devm_release_action(dev, unregister_dev_dax, victim); - up_write(_region_rwsem); + device_unlock(dev); put_device(victim); return rc; @@ -705,7 +694,7 @@ static void dax_mapping_release(struct device *dev) put_device(parent); } -static void __unregister_dax_mapping(void *data) +static void unregister_dax_mapping(void *data) { struct device *dev = data; struct dax_mapping *mapping = to_dax_mapping(dev); @@ -713,25 +702,12 @@ static void __unregister_dax_mapping(void *data) dev_dbg(dev, "%s\n", __func__); - lockdep_assert_held_write(_region_rwsem); - dev_dax->ranges[mapping->range_id].mapping = NULL; mapping->range_id = -1; device_unregister(dev); } -static void unregister_dax_mapping(void *data) -{ - if (rwsem_is_locked(_region_rwsem)) - return __unregister_dax_mapping(data); - - if (WARN_ON_ONCE(down_write_killable(_region_rwsem) != 0)) - return; - __unregister_dax_mapping(data); - up_write(_region_rwsem); -} - static struct dev_dax_range *get_dax_range(struct device *dev) { struct dax_mapping *mapping = to_dax_mapping(dev); -- 2.44.0
[PATCH v2 1/4] dax/bus.c: replace WARN_ON_ONCE() with lockdep asserts
In [1], Dan points out that all of the WARN_ON_ONCE() usage in the referenced patch should be replaced with lockdep_assert_held, or lockdep_held_assert_write(). Replace these as appropriate. Link: https://lore.kernel.org/r/65f0b5ef41817_aa2229...@dwillia2-mobl3.amr.corp.intel.com.notmuch [1] Fixes: c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem") Cc: Andrew Morton Reported-by: Dan Williams Signed-off-by: Vishal Verma --- drivers/dax/bus.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 797e1ebff299..7924dd542a13 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -192,7 +192,7 @@ static u64 dev_dax_size(struct dev_dax *dev_dax) u64 size = 0; int i; - WARN_ON_ONCE(!rwsem_is_locked(_dev_rwsem)); + lockdep_assert_held(_dev_rwsem); for (i = 0; i < dev_dax->nr_range; i++) size += range_len(_dax->ranges[i].range); @@ -302,7 +302,7 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region) resource_size_t size = resource_size(_region->res); struct resource *res; - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held(_region_rwsem); for_each_dax_region_resource(dax_region, res) size -= resource_size(res); @@ -447,7 +447,7 @@ static void trim_dev_dax_range(struct dev_dax *dev_dax) struct range *range = _dax->ranges[i].range; struct dax_region *dax_region = dev_dax->region; - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held_write(_region_rwsem); dev_dbg(_dax->dev, "delete range[%d]: %#llx:%#llx\n", i, (unsigned long long)range->start, (unsigned long long)range->end); @@ -507,7 +507,7 @@ static int __free_dev_dax_id(struct dev_dax *dev_dax) struct dax_region *dax_region; int rc = dev_dax->id; - WARN_ON_ONCE(!rwsem_is_locked(_dev_rwsem)); + lockdep_assert_held_write(_dev_rwsem); if (!dev_dax->dyn_id || dev_dax->id < 0) return -1; @@ -713,7 +713,7 @@ static void __unregister_dax_mapping(void *data) dev_dbg(dev, "%s\n", __func__); - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held_write(_region_rwsem); dev_dax->ranges[mapping->range_id].mapping = NULL; mapping->range_id = -1; @@ -830,7 +830,7 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) struct device *dev; int rc; - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held_write(_region_rwsem); if (dev_WARN_ONCE(_dax->dev, !dax_region->dev->driver, "region disabled\n")) @@ -876,7 +876,7 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, struct resource *alloc; int i, rc; - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held_write(_region_rwsem); /* handle the seed alloc special case */ if (!size) { @@ -935,7 +935,7 @@ static int adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, r struct device *dev = _dax->dev; int rc; - WARN_ON_ONCE(!rwsem_is_locked(_region_rwsem)); + lockdep_assert_held_write(_region_rwsem); if (dev_WARN_ONCE(dev, !size, "deletion is handled by dev_dax_shrink\n")) return -EINVAL; -- 2.44.0
Re: [PATCH v15 0/4] add zynqmp TCM bindings
On Fri, Apr 12, 2024 at 11:37:04AM -0700, Tanmay Shah wrote: > Tightly-Coupled Memories(TCMs) are low-latency memory that provides > predictable instruction execution and predictable data load/store > timing. Each Cortex-R5F processor contains exclusive two 64 KB memory > banks on the ATCM and BTCM ports, for a total of 128 KB of memory. > In lockstep mode, both 128KB memory is accessible to the cluster. > > As per ZynqMP Ultrascale+ Technical Reference Manual UG1085, following > is address space of TCM memory. The bindings in this patch series > introduces properties to accommodate following address space with > address translation between Linux and Cortex-R5 views. > > | | | | > | --- | --- | --- | > | *Mode*| *R5 View* | *Linux view* | Notes | > | *Split Mode* | *start addr*| *start addr* | | > | R5_0 ATCM (64 KB) | 0x_ | 0xFFE0_ | | > | R5_0 BTCM (64 KB) | 0x0002_ | 0xFFE2_ | | > | R5_1 ATCM (64 KB) | 0x_ | 0xFFE9_ | alias of 0xFFE1_ | > | R5_1 BTCM (64 KB) | 0x0002_ | 0xFFEB_ | alias of 0xFFE3_ | > | ___ | ___ |___ | | > | *Lockstep Mode*| | | | > | R5_0 ATCM (128 KB) | 0x_ | 0xFFE0_ | | > | R5_0 BTCM (128 KB) | 0x0002_ | 0xFFE2_ | | > > References: > UG1085 TCM address space: > https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm/Tightly-Coupled-Memory-Address-Map > I have applied patches 1, 2 and 4 of this set. Patch 3 should to through Michal's tree. Thanks, Mathieu > --- > > prerequisite-patch-link: > https://lore.kernel.org/all/d4556268-8274-4089-949f-3b97d6779...@gmail.com/ > Base Branch: 6.9.rc2 > > Changes in v15: > - Use hardcode TCM addresses as fallback method if "reg" unavailable > - Use new bindings for r5fss subsystem > > Changes in v14: > - Add xlnx,tcm-mode property and use it for TCM configuration > - Add Versal and Versal-NET platform support > - Maintain backward compatibility for ZynqMP platform and use hardcode > TCM addresses > > Changes in v13: > - Have power-domains property for lockstep case instead of > keeping it flexible. > - Add "items:" list in power-domains property > > > Radhey Shyam Pandey (1): > dt-bindings: remoteproc: add Tightly Coupled Memory (TCM) bindings > > Tanmay Shah (3): > remoteproc: zynqmp: fix lockstep mode memory region > dts: zynqmp: add properties for TCM in remoteproc > remoteproc: zynqmp: parse TCM from device tree > > .../remoteproc/xlnx,zynqmp-r5fss.yaml | 279 -- > .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts | 8 + > arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 67 - > drivers/remoteproc/xlnx_r5_remoteproc.c | 273 + > 4 files changed, 459 insertions(+), 168 deletions(-) > > > base-commit: 4d5aabb6843939fad36912be8bf109adf9af0848 > -- > 2.25.1 >
Re: [PATCH v15 3/4] dts: zynqmp: add properties for TCM in remoteproc
On Fri, Apr 12, 2024 at 11:37:07AM -0700, Tanmay Shah wrote: > Add properties as per new bindings in zynqmp remoteproc node > to represent TCM address and size. > > This patch also adds alternative remoteproc node to represent > remoteproc cluster in split mode. By default lockstep mode is > enabled and users should disable it before using split mode > dts. Both device-tree nodes can't be used simultaneously one > of them must be disabled. For zcu102-1.0 and zcu102-1.1 board > remoteproc split mode dts node is enabled and lockstep mode > dts is disabled. > > Signed-off-by: Tanmay Shah > --- > .../boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts | 8 +++ > arch/arm64/boot/dts/xilinx/zynqmp.dtsi| 67 +-- > 2 files changed, 70 insertions(+), 5 deletions(-) > Reviewed-by: Mathieu Poirier > diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts > b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts > index c8f71a1aec89..495ca94b45db 100644 > --- a/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts > +++ b/arch/arm64/boot/dts/xilinx/zynqmp-zcu102-rev1.0.dts > @@ -14,6 +14,14 @@ / { > compatible = "xlnx,zynqmp-zcu102-rev1.0", "xlnx,zynqmp-zcu102", > "xlnx,zynqmp"; > }; > > +_split { > + status = "okay"; > +}; > + > +_lockstep { > + status = "disabled"; > +}; > + > { > #address-cells = <1>; > #size-cells = <1>; > diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi > b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi > index 25d20d803230..ef31b0fc73d1 100644 > --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi > +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi > @@ -260,19 +260,76 @@ fpga_full: fpga-full { > ranges; > }; > > - remoteproc { > + rproc_lockstep: remoteproc@ffe0 { > compatible = "xlnx,zynqmp-r5fss"; > xlnx,cluster-mode = <1>; > + xlnx,tcm-mode = <1>; > > - r5f-0 { > + #address-cells = <2>; > + #size-cells = <2>; > + > + ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>, > + <0x0 0x2 0x0 0xffe2 0x0 0x1>, > + <0x0 0x1 0x0 0xffe1 0x0 0x1>, > + <0x0 0x3 0x0 0xffe3 0x0 0x1>; > + > + r5f@0 { > + compatible = "xlnx,zynqmp-r5f"; > + reg = <0x0 0x0 0x0 0x1>, > + <0x0 0x2 0x0 0x1>, > + <0x0 0x1 0x0 0x1>, > + <0x0 0x3 0x0 0x1>; > + reg-names = "atcm0", "btcm0", "atcm1", "btcm1"; > + power-domains = <_firmware PD_RPU_0>, > + <_firmware PD_R5_0_ATCM>, > + <_firmware PD_R5_0_BTCM>, > + <_firmware PD_R5_1_ATCM>, > + <_firmware PD_R5_1_BTCM>; > + memory-region = <_0_fw_image>; > + }; > + > + r5f@1 { > + compatible = "xlnx,zynqmp-r5f"; > + reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>; > + reg-names = "atcm0", "btcm0"; > + power-domains = <_firmware PD_RPU_1>, > + <_firmware PD_R5_1_ATCM>, > + <_firmware PD_R5_1_BTCM>; > + memory-region = <_1_fw_image>; > + }; > + }; > + > + rproc_split: remoteproc-split@ffe0 { > + status = "disabled"; > + compatible = "xlnx,zynqmp-r5fss"; > + xlnx,cluster-mode = <0>; > + xlnx,tcm-mode = <0>; > + > + #address-cells = <2>; > + #size-cells = <2>; > + > + ranges = <0x0 0x0 0x0 0xffe0 0x0 0x1>, > + <0x0 0x2 0x0 0xffe2 0x0 0x1>, > + <0x1 0x0 0x0 0xffe9 0x0 0x1>, > + <0x1 0x2 0x0 0xffeb 0x0 0x1>; > + > + r5f@0 { > compatible = "xlnx,zynqmp-r5f"; > - power-domains = <_firmware PD_RPU_0>; > + reg = <0x0 0x0 0x0 0x1>, <0x0 0x2 0x0 0x1>; > + reg-names = "atcm0", "btcm0"; > + power-domains = <_firmware PD_RPU_0>, > + <_firmware PD_R5_0_ATCM>, > + <_firmware PD_R5_0_BTCM>; > memory-region = <_0_fw_image>; > }; > > - r5f-1 { > + r5f@1 { > compatible = "xlnx,zynqmp-r5f"; > - power-domains = <_firmware PD_RPU_1>; > + reg = <0x1 0x0 0x0 0x1>, <0x1 0x2 0x0 0x1>; > + reg-names = "atcm0", "btcm0"; > +
Re: [PATCH net-next v2 07/15] mm: page_frag: add '_va' suffix to page_frag API
On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote: > Currently most of the API for page_frag API is returning > 'virtual address' as output or expecting 'virtual address' > as input, in order to differentiate the API handling between > 'virtual address' and 'struct page', add '_va' suffix to the > corresponding API mirroring the page_pool_alloc_va() API of > the page_pool. > > Signed-off-by: Yunsheng Lin This patch is a total waste of time. By that logic we should be renaming __get_free_pages since it essentially does the same thing. This just seems like more code changes for the sake of adding code changes rather than fixing anything. In my opinion it should be dropped from the set.
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue Apr 16, 2024 at 5:54 PM EEST, Haitao Huang wrote: > I did declare the configs in the config file but I missed it in my patch > as stated earlier. IIUC, that would not cause this error though. > > Maybe I should exit with the skip code if no CGROUP_MISC (no more > CGROUP_SGX_EPC) is configured? OK, so I wanted to do a distro kernel test here, and used the default OpenSUSE kernel config. I need to check if it has CGROUP_MISC set. > tools/testing/selftests$ find . -name README > ./futex/README > ./tc-testing/README > ./net/forwarding/README > ./powerpc/nx-gzip/README > ./ftrace/README > ./arm64/signal/README > ./arm64/fp/README > ./arm64/README > ./zram/README > ./livepatch/README > ./resctrl/README So is there a README because of override timeout parameter? Maybe it should be just set to a high enough value? BR, Jarkko
Re: [PATCH v2] ftrace: Fix possible use-after-free issue in ftrace_location()
… > To fix it, we hold rcu lock as lookuping ftrace record, and call > synchronize_rcu() before freeing any ftrace pages. I suggest to convert this description into an imperative wording. Regards, Markus
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue, 16 Apr 2024 00:42:41 -0500, Huang, Kai wrote: I'll send a fixup for this patch or another version of the series if more changes are needed. Hi Haitao, I don't like to say but in general I think you are sending too frequently. The last version was sent April, 11th (my time), so considering the weekend it has only been 3 or at most 4 days. Please slow down a little bit to give people more time. More information please also see: https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders Thanks for the feedback. I'll slow down sending new versions. Haitao
[PATCH net-next] ipvs: allow some sysctls in non-init user namespaces
Let's make all IPVS sysctls visible and RO even when network namespace is owned by non-initial user namespace. Let's make a few sysctls to be writable: - conntrack - conn_reuse_mode - expire_nodest_conn - expire_quiescent_template I'm trying to be conservative with this to prevent introducing any security issues in there. Maybe, we can allow more sysctls to be writable, but let's do this on-demand and when we see real use-case. This list of sysctls was chosen because I can't see any security risks allowing them and also Kubernetes uses [2] these specific sysctls. This patch is motivated by user request in the LXC project [1]. [1] https://github.com/lxc/lxc/issues/4278 [2] https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103 Cc: Stéphane Graber Cc: Christian Brauner Cc: Julian Anastasov Cc: Simon Horman Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: Florian Westphal Signed-off-by: Alexander Mikhalitsyn --- net/netfilter/ipvs/ip_vs_ctl.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 143a341bbc0a..92a818c2f783 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -4285,10 +4285,22 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs) if (tbl == NULL) return -ENOMEM; - /* Don't export sysctls to unprivileged users */ + /* Let's show all sysctls in non-init user namespace-owned +* net namespaces, but make them read-only. +* +* Allow only a few specific sysctls to be writable. +*/ if (net->user_ns != _user_ns) { - tbl[0].procname = NULL; - ctl_table_size = 0; + for (idx = 0; idx < ARRAY_SIZE(vs_vars); idx++) { + if (!tbl[idx].procname) + continue; + + if (!((strcmp(tbl[idx].procname, "conntrack") == 0) || + (strcmp(tbl[idx].procname, "conn_reuse_mode") == 0) || + (strcmp(tbl[idx].procname, "expire_nodest_conn") == 0) || + (strcmp(tbl[idx].procname, "expire_quiescent_template") == 0))) + tbl[idx].mode = 0444; + } } } else tbl = vs_vars; -- 2.34.1
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue, 16 Apr 2024 09:10:12 -0500, Jarkko Sakkinen wrote: On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote: On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: > With different cgroups, the script starts one or multiple concurrent SGX > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed > test case, which loads an enclave of EPC size equal to the EPC capacity > available on the platform. The script checks results against the > expectation set for each cgroup and reports success or failure. > > The script creates 3 different cgroups at the beginning with following > expectations: > > 1) SMALL - intentionally small enough to fail the test loading an > enclave of size equal to the capacity. > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if > more than 4 concurrent tests are run. The script starts 4 expecting at > least one test to pass, and then starts 5 expecting at least one test > to fail. > 3) LARGER - limit is the same as the capacity, large enough to run lots of > concurrent tests. The script starts 8 of them and expects all pass. > Then it reruns the same test with one process randomly killed and > usage checked to be zero after all processes exit. > > The script also includes a test with low mem_cg limit and LARGE sgx_epc > limit to verify that the RAM used for per-cgroup reclamation is charged > to a proper mem_cg. For this test, it turns off swapping before start, > and turns swapping back on afterwards. > > Add README to document how to run the tests. > > Signed-off-by: Haitao Huang jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker TAP version 13 1..2 # timeout set to 45 # selftests: sgx: test_sgx # TAP version 13 # 1..16 # # Starting 16 tests from 1 test cases. # # RUN enclave.unclobbered_vdso ... # #OK enclave.unclobbered_vdso # ok 1 enclave.unclobbered_vdso # # RUN enclave.unclobbered_vdso_oversubscribed ... # #OK enclave.unclobbered_vdso_oversubscribed # ok 2 enclave.unclobbered_vdso_oversubscribed # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ... # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ... # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ... # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ... # #OK enclave.unclobbered_vdso_oversubscribed_remove # ok 3 enclave.unclobbered_vdso_oversubscribed_remove # # RUN
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > > > > I'll send a fixup for this patch or another version of the series if more > > changes are needed. > > Hi Haitao, > > I don't like to say but in general I think you are sending too frequently. > The > last version was sent April, 11th (my time), so considering the weekend it has > only been 3 or at most 4 days. > > Please slow down a little bit to give people more time. > > More information please also see: > > https://www.kernel.org/doc/html/next/process/submitting-patches.html#resend-reminders +1 Yes, exactly. I'd take one week break and cycle the kselftest part internally a bit as I said my previous response. I'm sure that there is experise inside Intel how to implement it properly. I.e. take some time to find the right person, and wait as long as that person has a bit of bandwidth to go through the test and suggest modifications. Cannot blame, as I've done the same mistake a few times in past but yeah this would be the best possible corrective action to take. BR, Jarkko
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue Apr 16, 2024 at 5:05 PM EEST, Jarkko Sakkinen wrote: > On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: > > With different cgroups, the script starts one or multiple concurrent SGX > > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed > > test case, which loads an enclave of EPC size equal to the EPC capacity > > available on the platform. The script checks results against the > > expectation set for each cgroup and reports success or failure. > > > > The script creates 3 different cgroups at the beginning with following > > expectations: > > > > 1) SMALL - intentionally small enough to fail the test loading an > > enclave of size equal to the capacity. > > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if > > more than 4 concurrent tests are run. The script starts 4 expecting at > > least one test to pass, and then starts 5 expecting at least one test > > to fail. > > 3) LARGER - limit is the same as the capacity, large enough to run lots of > > concurrent tests. The script starts 8 of them and expects all pass. > > Then it reruns the same test with one process randomly killed and > > usage checked to be zero after all processes exit. > > > > The script also includes a test with low mem_cg limit and LARGE sgx_epc > > limit to verify that the RAM used for per-cgroup reclamation is charged > > to a proper mem_cg. For this test, it turns off swapping before start, > > and turns swapping back on afterwards. > > > > Add README to document how to run the tests. > > > > Signed-off-by: Haitao Huang > > jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C > tools/testing/selftests/sgx run_tests > make: Entering directory > '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -c sigstruct.c -o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -c sign_key.S -o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o > gcc -Wall -Werror -g > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z > noexecstack -lcrypto > gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE > -fno-stack-protector -mrdrnd > -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include > test_encl.c test_encl_bootstrap.S -o > /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf > -Wl,-T,test_encl.lds,--build-id=none > /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: > warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable > stack > /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: > NOTE: This behaviour is deprecated and will be removed in a future version of > the linker > TAP version 13 > 1..2 > # timeout set to 45 > # selftests: sgx: test_sgx > # TAP version 13 > # 1..16 > # # Starting 16 tests from 1 test cases. > # # RUN enclave.unclobbered_vdso ... > # #OK enclave.unclobbered_vdso > # ok 1 enclave.unclobbered_vdso > # # RUN enclave.unclobbered_vdso_oversubscribed ... > # #OK enclave.unclobbered_vdso_oversubscribed > # ok 2 enclave.unclobbered_vdso_oversubscribed > # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... > # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave > with 98566144 bytes heap may take a while ... > # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of > 98566144 bytes to trimmed may take a while ... > # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run > EACCEPT for each page of 98566144 bytes may take a while ... > # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes > from enclave may take a while ... > # #OK enclave.unclobbered_vdso_oversubscribed_remove > # ok 3
Re: [PATCH v12 07/14] x86/sgx: Abstract tracking reclaimable pages in LRU
On Mon, 2024-04-15 at 20:20 -0700, Haitao Huang wrote: > From: Kristen Carlson Accardi > > The functions, sgx_{mark,unmark}_page_reclaimable(), manage the tracking > of reclaimable EPC pages: sgx_mark_page_reclaimable() adds a newly > allocated page into the global LRU list while > sgx_unmark_page_reclaimable() does the opposite. Abstract the hard coded > global LRU references in these functions to make them reusable when > pages are tracked in per-cgroup LRUs. > > Create a helper, sgx_lru_list(), that returns the LRU that tracks a given > EPC page. It simply returns the global LRU now, and will later return > the LRU of the cgroup within which the EPC page was allocated. Replace > the hard coded global LRU with a call to this helper. > > Next patches will first get the cgroup reclamation flow ready while > keeping pages tracked in the global LRU and reclaimed by ksgxd before we > make the switch in the end for sgx_lru_list() to return per-cgroup > LRU. I found the first paragraph hard to read. Provide my version below for your reference: " The SGX driver tracks reclaimable EPC pages via sgx_mark_page_reclaimable(), which adds the newly allocated page into the global LRU list. sgx_unmark_page_reclaimable() does the opposite. To support SGX EPC cgroup, the SGX driver will need to maintain an LRU list for each cgroup, and the new allocated EPC page will need to be added to the LRU of associated cgroup, but not always the global LRU list. When sgx_mark_page_reclaimable() is called, the cgroup that the new allocated EPC page belongs to is already known, i.e., it has been set to the 'struct sgx_epc_page'. Add a helper, sgx_lru_list(), to return the LRU that the EPC page should be/is added to for the given EPC page. Currently it just returns the global LRU. Change sgx_{mark|unmark}_page_reclaimable() to use the helper function to get the LRU from the EPC page instead of referring to the global LRU directly. This allows EPC page being able to be tracked in "per-cgroup" LRU when that becomes ready. " Nit: That being said, is sgx_epc_page_lru() better than sgx_lru_list()? > > Co-developed-by: Sean Christopherson > Signed-off-by: Sean Christopherson > Signed-off-by: Kristen Carlson Accardi > Co-developed-by: Haitao Huang > Signed-off-by: Haitao Huang > Reviewed-by: Jarkko Sakkinen > Tested-by: Jarkko Sakkinen > --- > Feel free to add: Reviewed-by: Kai Huang
Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Tue Apr 16, 2024 at 6:20 AM EEST, Haitao Huang wrote: > With different cgroups, the script starts one or multiple concurrent SGX > selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed > test case, which loads an enclave of EPC size equal to the EPC capacity > available on the platform. The script checks results against the > expectation set for each cgroup and reports success or failure. > > The script creates 3 different cgroups at the beginning with following > expectations: > > 1) SMALL - intentionally small enough to fail the test loading an > enclave of size equal to the capacity. > 2) LARGE - large enough to run up to 4 concurrent tests but fail some if > more than 4 concurrent tests are run. The script starts 4 expecting at > least one test to pass, and then starts 5 expecting at least one test > to fail. > 3) LARGER - limit is the same as the capacity, large enough to run lots of > concurrent tests. The script starts 8 of them and expects all pass. > Then it reruns the same test with one process randomly killed and > usage checked to be zero after all processes exit. > > The script also includes a test with low mem_cg limit and LARGE sgx_epc > limit to verify that the RAM used for per-cgroup reclamation is charged > to a proper mem_cg. For this test, it turns off swapping before start, > and turns swapping back on afterwards. > > Add README to document how to run the tests. > > Signed-off-by: Haitao Huang jarkko@mustatorvisieni:~/linux-tpmdd> sudo make -C tools/testing/selftests/sgx run_tests make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx' gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c main.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c load.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sigstruct.c -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c call.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -c sign_key.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o gcc -Wall -Werror -g -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include -fPIC -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_sgx /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/main.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/load.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sigstruct.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/call.o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/sign_key.o -z noexecstack -lcrypto gcc -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE -fno-stack-protector -mrdrnd -I/home/jarkko/linux-tpmdd/tools/testing/selftests/../../../tools/include test_encl.c test_encl_bootstrap.S -o /home/jarkko/linux-tpmdd/tools/testing/selftests/sgx/test_encl.elf -Wl,-T,test_encl.lds,--build-id=none /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: warning: /tmp/ccqvDJVg.o: missing .note.GNU-stack section implies executable stack /usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker TAP version 13 1..2 # timeout set to 45 # selftests: sgx: test_sgx # TAP version 13 # 1..16 # # Starting 16 tests from 1 test cases. # # RUN enclave.unclobbered_vdso ... # #OK enclave.unclobbered_vdso # ok 1 enclave.unclobbered_vdso # # RUN enclave.unclobbered_vdso_oversubscribed ... # #OK enclave.unclobbered_vdso_oversubscribed # ok 2 enclave.unclobbered_vdso_oversubscribed # # RUN enclave.unclobbered_vdso_oversubscribed_remove ... # # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ... # # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ... # # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ... # # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ... # #OK enclave.unclobbered_vdso_oversubscribed_remove # ok 3 enclave.unclobbered_vdso_oversubscribed_remove # # RUN enclave.clobbered_vdso ... # #OK enclave.clobbered_vdso # ok 4 enclave.clobbered_vdso # # RUN enclave.clobbered_vdso_and_user_function ... # #OK
Re: [PATCH v12 05/14] x86/sgx: Implement basic EPC misc cgroup functionality
On Mon, 2024-04-15 at 20:20 -0700, Haitao Huang wrote: > From: Kristen Carlson Accardi > > SGX Enclave Page Cache (EPC) memory allocations are separate from normal > RAM allocations, and are managed solely by the SGX subsystem. The > existing cgroup memory controller cannot be used to limit or account for > SGX EPC memory, which is a desirable feature in some environments. For > instance, within a Kubernetes environment, while a user may specify a > particular EPC quota for a pod, the orchestrator requires a mechanism to > enforce that the pod's actual runtime EPC usage does not exceed the > allocated quota. > > Utilize the misc controller [admin-guide/cgroup-v2.rst, 5-9. Misc] to > limit and track EPC allocations per cgroup. Earlier patches have added > the "sgx_epc" resource type in the misc cgroup subsystem. Add basic > support in SGX driver as the "sgx_epc" resource provider: > > - Set "capacity" of EPC by calling misc_cg_set_capacity() > - Update EPC usage counter, "current", by calling charge and uncharge > APIs for EPC allocation and deallocation, respectively. > - Setup sgx_epc resource type specific callbacks, which perform > initialization and cleanup during cgroup allocation and deallocation, > respectively. > > With these changes, the misc cgroup controller enables users to set a hard > limit for EPC usage in the "misc.max" interface file. It reports current > usage in "misc.current", the total EPC memory available in > "misc.capacity", and the number of times EPC usage reached the max limit > in "misc.events". > > For now, the EPC cgroup simply blocks additional EPC allocation in > sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are > still tracked in the global active list, only reclaimed by the global > reclaimer when the total free page count is lower than a threshold. > > Later patches will reorganize the tracking and reclamation code in the > global reclaimer and implement per-cgroup tracking and reclaiming. > > Co-developed-by: Sean Christopherson > Signed-off-by: Sean Christopherson > Signed-off-by: Kristen Carlson Accardi > Co-developed-by: Haitao Huang > Signed-off-by: Haitao Huang > Reviewed-by: Jarkko Sakkinen > Reviewed-by: Tejun Heo > Tested-by: Jarkko Sakkinen I don't see any big issue, so feel free to add: Reviewed-by: Kai Huang Nitpickings below: [...] > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c > @@ -0,0 +1,72 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright(c) 2022-2024 Intel Corporation. */ > + > +#include > +#include It doesn't seem you need the above two here. Probably they are needed in later patches, in that case we can move to the relevant patch(es) that they got used. However I think it's better to explicitly include since kzalloc()/kfree() are used. Btw, I am not sure whether you want to use because looks it contains a lot of unrelated staff. Anyway I guess nobody cares. > +#include "epc_cgroup.h" > + > +/* The root SGX EPC cgroup */ > +static struct sgx_cgroup sgx_cg_root; The comment isn't necessary (sorry didn't notice before), because the code is pretty clear saying that IMHO. [...] > > --- /dev/null > +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h > @@ -0,0 +1,72 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _SGX_EPC_CGROUP_H_ > +#define _SGX_EPC_CGROUP_H_ > + > +#include I don't see why you need here. Also, ... > +#include > +#include > + > +#include "sgx.h" ... "sgx.h" already includes [...] > > +static inline struct sgx_cgroup *sgx_get_current_cg(void) > +{ > + /* get_current_misc_cg() never returns NULL when Kconfig enabled */ > + return sgx_cgroup_from_misc_cg(get_current_misc_cg()); > +} I spent some time looking into this. And yes if I was reading code correctly the get_current_misc_cg() should never return NULL when Kconfig is on. I typed my analysis below in [*]. And it would be helpful if any cgroup expert can have a second eye on this. [...] > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include Is this needed? I believe SGX variants in "epc_cgroup.h" should be enough for sgx/main.c? [...] [*] IIUC get_current_misc_cg() should never return NULL when Kconfig is on (code indent slight adjusted for text wrap). Firstly, during kernel boot there's always a valid @css allocated for MISC cgroup, regardless whether it is disabled in kernel command line. int __init cgroup_init(void) { ... for_each_subsys(ss, ssid) { if (ss->early_init) { ... } else { cgroup_init_subsys(ss, false); } ...
[PATCH v2] ftrace: Fix possible use-after-free issue in ftrace_location()
KASAN reports a bug: BUG: KASAN: use-after-free in ftrace_location+0x90/0x120 Read of size 8 at addr 888141d40010 by task insmod/424 CPU: 8 PID: 424 Comm: insmod Tainted: GW 6.9.0-rc2+ [...] Call Trace: dump_stack_lvl+0x68/0xa0 print_report+0xcf/0x610 kasan_report+0xb5/0xe0 ftrace_location+0x90/0x120 register_kprobe+0x14b/0xa40 kprobe_init+0x2d/0xff0 [kprobe_example] do_one_initcall+0x8f/0x2d0 do_init_module+0x13a/0x3c0 load_module+0x3082/0x33d0 init_module_from_file+0xd2/0x130 __x64_sys_finit_module+0x306/0x440 do_syscall_64+0x68/0x140 entry_SYSCALL_64_after_hwframe+0x71/0x79 The root cause is that when lookup_rec() is lookuping ftrace record of an address in ftrace pages of some module, and those ftrace pages may at the same time being freed in ftrace_release_mod() as the corresponding module is being deleted: register_kprobes() { check_kprobe_address_safe() { arch_check_ftrace_location() { ftrace_location() { lookup_rec() // access memory that has been freed by // ftrace_release_mod() !!! To fix it, we hold rcu lock as lookuping ftrace record, and call synchronize_rcu() before freeing any ftrace pages. Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization") Signed-off-by: Zheng Yejian --- kernel/trace/ftrace.c | 43 +++ 1 file changed, 27 insertions(+), 16 deletions(-) v2: - Use RCU lock instead of holding ftrace_lock as suggested by Steve. Link: https://lore.kernel.org/all/20240410112823.1d084...@gandalf.local.home/ v1: - Link: https://lore.kernel.org/all/20240401125543.1282845-1-zhengyeji...@huawei.com/ diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index da1710499698..2b41837a2fac 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1581,7 +1581,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end) } /** - * ftrace_location_range - return the first address of a traced location + * ftrace_location_range_rcu - return the first address of a traced location * if it touches the given ip range * @start: start of range to search. * @end: end of range to search (inclusive). @end points to the last byte @@ -1592,7 +1592,7 @@ static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end) * that is either a NOP or call to the function tracer. It checks the ftrace * internal tables to determine if the address belongs or not. */ -unsigned long ftrace_location_range(unsigned long start, unsigned long end) +static unsigned long ftrace_location_range_rcu(unsigned long start, unsigned long end) { struct dyn_ftrace *rec; @@ -1603,6 +1603,16 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) return 0; } +unsigned long ftrace_location_range(unsigned long start, unsigned long end) +{ + unsigned long loc; + + rcu_read_lock(); + loc = ftrace_location_range_rcu(start, end); + rcu_read_unlock(); + return loc; +} + /** * ftrace_location - return the ftrace location * @ip: the instruction pointer to check @@ -1614,25 +1624,22 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end) */ unsigned long ftrace_location(unsigned long ip) { - struct dyn_ftrace *rec; + unsigned long loc; unsigned long offset; unsigned long size; - rec = lookup_rec(ip, ip); - if (!rec) { + loc = ftrace_location_range(ip, ip); + if (!loc) { if (!kallsyms_lookup_size_offset(ip, , )) goto out; /* map sym+0 to __fentry__ */ if (!offset) - rec = lookup_rec(ip, ip + size - 1); + loc = ftrace_location_range(ip, ip + size - 1); } - if (rec) - return rec->ip; - out: - return 0; + return loc; } /** @@ -6596,6 +6603,7 @@ static int ftrace_process_locs(struct module *mod, /* We should have used all pages unless we skipped some */ if (pg_unuse) { WARN_ON(!skipped); + synchronize_rcu(); ftrace_free_pages(pg_unuse); } return ret; @@ -6809,6 +6817,8 @@ void ftrace_release_mod(struct module *mod) out_unlock: mutex_unlock(_lock); + if (tmp_page) + synchronize_rcu(); for (pg = tmp_page; pg; pg = tmp_page) { /* Needs to be called outside of ftrace_lock */ @@ -7142,6 +7152,7 @@ void ftrace_free_mem(struct module *mod, void *start_ptr, void *end_ptr) unsigned long start = (unsigned long)(start_ptr); unsigned long end = (unsigned long)(end_ptr); struct ftrace_page **last_pg = _pages_start; + struct ftrace_page *tmp_page = NULL; struct ftrace_page *pg; struct dyn_ftrace *rec;
Re: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text
> On 11 Apr 2024, at 19:05, Mike Rapoport wrote: > > @@ -2440,7 +2479,24 @@ static int post_relocation(struct module *mod, const > struct load_info *info) > add_kallsyms(mod, info); > > /* Arch-specific module finalizing. */ > - return module_finalize(info->hdr, info->sechdrs, mod); > + ret = module_finalize(info->hdr, info->sechdrs, mod); > + if (ret) > + return ret; > + > + for_each_mod_mem_type(type) { > + struct module_memory *mem = >mem[type]; > + > + if (mem->is_rox) { > + if (!execmem_update_copy(mem->base, mem->rw_copy, > + mem->size)) > + return -ENOMEM; > + > + vfree(mem->rw_copy); > + mem->rw_copy = NULL; > + } > + } > + > + return 0; > } I might be missing something, but it seems a bit racy. IIUC, module_finalize() calls alternatives_smp_module_add(). At this point, since you don’t hold the text_mutex, some might do text_poke(), e.g., by enabling/disabling static-key, and the update would be overwritten. No?
Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages
On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote: > On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote: > > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote: > > > > > To populate the cache, a writable large page is allocated from vmalloc > > > with > > > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as > > > ROX. > > > > > +static void execmem_invalidate(void *ptr, size_t size, bool writable) > > > +{ > > > + if (execmem_info->invalidate) > > > + execmem_info->invalidate(ptr, size, writable); > > > + else > > > + memset(ptr, 0, size); > > > +} > > > > +static void execmem_invalidate(void *ptr, size_t size, bool writeable) > > +{ > > + /* fill memory with INT3 instructions */ > > + if (writeable) > > + memset(ptr, 0xcc, size); > > + else > > + text_poke_set(ptr, 0xcc, size); > > +} > > > > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction. > > It raises #BP not #UD. > > Do you mean that _invalidate is a poor name choice or that it's necessary > to use an instruction that raises #UD? Poor naming, mostly. #BP handler will still scream bloody murder if the site is otherwise unclaimed. It just isn't an invalid instruction.
Re: [PATCH net-next v8] virtio_net: Support RX hash XDP hint
在 2024/4/16 下午2:19, Liang Chen 写道: The RSS hash report is a feature that's part of the virtio specification. Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost (still a work in progress as per [1]) support this feature. While the capability to obtain the RSS hash has been enabled in the normal path, it's currently missing in the XDP path. Therefore, we are introducing XDP hints through kfuncs to allow XDP programs to access the RSS hash. 1. https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r Signed-off-by: Liang Chen --- Changes from v7: - use table lookup for rss hash type Changes from v6: - fix a coding style issue Changes from v5: - Preservation of the hash value has been dropped, following the conclusion from discussions in V3 reviews. The virtio_net driver doesn't accessing/using the virtio_net_hdr after the XDP program execution, so nothing tragic should happen. As to the xdp program, if it smashes the entry in virtio header, it is likely buggy anyways. Additionally, looking up the Intel IGC driver, it also does not bother with this particular aspect. --- drivers/net/virtio_net.c| 42 + include/uapi/linux/virtio_net.h | 1 + 2 files changed, 43 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c22d1118a133..1d750009f615 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -4621,6 +4621,47 @@ static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu) } } +static enum xdp_rss_hash_type +virtnet_xdp_rss_type[VIRTIO_NET_HASH_REPORT_MAX_TABLE] = { + [VIRTIO_NET_HASH_REPORT_NONE] = XDP_RSS_TYPE_NONE, + [VIRTIO_NET_HASH_REPORT_IPv4] = XDP_RSS_TYPE_L3_IPV4, + [VIRTIO_NET_HASH_REPORT_TCPv4] = XDP_RSS_TYPE_L4_IPV4_TCP, + [VIRTIO_NET_HASH_REPORT_UDPv4] = XDP_RSS_TYPE_L4_IPV4_UDP, + [VIRTIO_NET_HASH_REPORT_IPv6] = XDP_RSS_TYPE_L3_IPV6, + [VIRTIO_NET_HASH_REPORT_TCPv6] = XDP_RSS_TYPE_L4_IPV6_TCP, + [VIRTIO_NET_HASH_REPORT_UDPv6] = XDP_RSS_TYPE_L4_IPV6_UDP, + [VIRTIO_NET_HASH_REPORT_IPv6_EX] = XDP_RSS_TYPE_L3_IPV6_EX, + [VIRTIO_NET_HASH_REPORT_TCPv6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX, + [VIRTIO_NET_HASH_REPORT_UDPv6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX +}; + +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, + enum xdp_rss_hash_type *rss_type) +{ + const struct xdp_buff *xdp = (void *)_ctx; + struct virtio_net_hdr_v1_hash *hdr_hash; + struct virtnet_info *vi; + u16 hash_report; + + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) + return -ENODATA; + + vi = netdev_priv(xdp->rxq->dev); + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len); + hash_report = __le16_to_cpu(hdr_hash->hash_report); + + if (hash_report >= VIRTIO_NET_HASH_REPORT_MAX_TABLE) + hash_report = VIRTIO_NET_HASH_REPORT_NONE; When this happens, it may mean an error or user modification of the header occurred. Should the following *hash* value be cleared to 0? Thanks, Heng + + *rss_type = virtnet_xdp_rss_type[hash_report]; + *hash = __le32_to_cpu(hdr_hash->hash_value); + return 0; +} + +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { + .xmo_rx_hash= virtnet_xdp_rx_hash, +}; + static int virtnet_probe(struct virtio_device *vdev) { int i, err = -ENOMEM; @@ -4747,6 +4788,7 @@ static int virtnet_probe(struct virtio_device *vdev) VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); dev->hw_features |= NETIF_F_RXHASH; + dev->xdp_metadata_ops = _xdp_metadata_ops; } if (vi->has_rss_hash_report) diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index cc65ef0f3c3e..3ee695450096 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -176,6 +176,7 @@ struct virtio_net_hdr_v1_hash { #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 #define VIRTIO_NET_HASH_REPORT_TCPv6_EX8 #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 +#define VIRTIO_NET_HASH_REPORT_MAX_TABLE 10 __le16 hash_report; __le16 padding; };
Re: [PATCHv3 2/2] usb: typec: anx7688: Add driver for ANX7688 USB-C HDMI bridge
On Tue, Apr 09, 2024 at 01:04:12PM +0200, Pavel Machek wrote: > Hi! > > > > This is driver for ANX7688 USB-C HDMI, with flashing and debugging > > > features removed. ANX7688 is rather criticial piece on PinePhone, > > > there's no display and no battery charging without it. > > > > > > There's likely more work to be done here, but having basic support > > > in mainline is needed to be able to work on the other stuff > > > (networking, cameras, power management). > > > > > > Signed-off-by: Ondrej Jirman > > > Co-developed-by: Martijn Braam > > > Co-developed-by: Samuel Holland > > > Signed-off-by: Pavel Machek > > > > Just couple of quick comments below - I did not have time to go over > > this very thoroughly, but I think you need to make a new version in > > any case because of comments in 1/2. > > Yes, there will be new version. > > There is ton of sleep primitives, and existing ones are okay. I can > change everything to fdelay or whatever primitive-of-the-day is, but > I'd rather not do pointless changes. > > You can ask for major changes, or complain about extra newlines, but > doing both in the same email does not make sense. I'm not telling you to do any major changes, yet. Right now I'm trying to understand why you are doing thing the way you are doing them. > > Btw, Co-developed-by comes before Signed-off-by I think. > > I believe this order is fine, too. > > > > +++ b/drivers/usb/typec/anx7688.c > > > @@ -0,0 +1,1830 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* > > > + * ANX7688 USB-C HDMI bridge/PD driver > > > + * > > > + * Warning, this driver is somewhat PinePhone specific. > > > > So why not split this into core part and PinePhone specific glue > > part? > > Potentially a lot of work I can't really test and would rather not do. Glue layer is usually an easy way to organise your driver into smaller more manageable chunks, and most importantly, keep the core parts generic. But just to be clear here, it is not always necessary - maybe that's the case here. > > > + struct delayed_work work; > > > + struct timer_list work_timer; > > > + > > > + struct mutex lock; > > > > Undocumented lock. > > This is simple driver. How do you expect me to document it? Protects > this data structure, not exactly uncommon. You use a comment to describe what the lock protects. You can use "scripts/checkpatch.pl --strict" to spot this kind of stuff for you. > > > + > > > + /* wait for power to stabilize and release reset */ > > > + msleep(10); > > > + gpiod_set_value(anx7688->gpio_reset, 0); > > > + usleep_range(2, 4); > > > > Why not just use usleep_range() in both cases. > > It does not really make code any better. Can do if you insist. Consistency makes any code better. > > > +static int anx7688_connect(struct anx7688 *anx7688) > > > +{ > > > + struct typec_partner_desc desc = {}; > > > + int ret, i; > > > + u8 fw[2]; > > > + const u8 dp_snk_identity[16] = { > > > + 0x00, 0x00, 0x00, 0xec, /* id header */ > > > + 0x00, 0x00, 0x00, 0x00, /* cert stat */ > > > + 0x00, 0x00, 0x00, 0x00, /* product type */ > > > + 0x39, 0x00, 0x00, 0x51 /* alt mode adapter */ > > > + }; > > > + const u8 svid[4] = { > > > + 0x00, 0x00, 0x01, 0xff, > > > + }; > > > > Why not get those from DT? > > Are you sure it belongs to the DT (and that DT people will agree)? Yes, they belong to the DT, and there are already bindings. Dmitry gave you the link to the bindings I think (thanks for that Dmitry). > > > + u32 caps[8]; > > > + > > > + dev_dbg(anx7688->dev, "cable inserted\n"); > > > + > > > + anx7688->last_status = -1; > > > + anx7688->last_cc_status = -1; > > > + anx7688->last_dp_state = -1; > > > + > > > + msleep(10); > > > > Please make a comment here why you have to wait, and use > > usleep_range(). > > /* Dunno because working code does that and waiting for hardware to > settle down after cable insertion kind of looks like a good thing */ > > I did not write the driver, and there's no good documentation for this > chip. I can try to invent something, but... > > > > + i = 0; > > > + while (1) { > > > + ret = anx7688_reg_read(anx7688, > > > ANX7688_REG_EEPROM_LOAD_STATUS0); > > > + > > > + if (ret >= 0 && (ret & ANX7688_EEPROM_FW_LOADED) == > > > ANX7688_EEPROM_FW_LOADED) { > > > + dev_dbg(anx7688->dev, "eeprom0 = 0x%02x\n", ret); > > > + dev_dbg(anx7688->dev, "fw loaded after %d ms\n", i * > > > 10); > > > + break; > > > + } > > > + > > > + if (i > 99) { > > > + set_bit(ANX7688_F_FW_FAILED, anx7688->flags); > > > + dev_err(anx7688->dev, > > > + "boot firmware load failed (you may need to > > > flash FW to anx7688 first)\n"); > > > + ret = -ETIMEDOUT; > > > + goto err_vconoff; > > > + } > > > + msleep(5); > > > + i++; > > > + } > > > > You need to use
Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
On Mon, Apr 15, 2024 at 06:36:39PM +0100, Mark Rutland wrote: > On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote: > > On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote: > > > +/** > > > + * enum execmem_type - types of executable memory ranges > > > + * > > > + * There are several subsystems that allocate executable memory. > > > + * Architectures define different restrictions on placement, > > > + * permissions, alignment and other parameters for memory that can be > > > used > > > + * by these subsystems. > > > + * Types in this enum identify subsystems that allocate executable memory > > > + * and let architectures define parameters for ranges suitable for > > > + * allocations by each subsystem. > > > + * > > > + * @EXECMEM_DEFAULT: default parameters that would be used for types that > > > + * are not explcitly defined. > > > + * @EXECMEM_MODULE_TEXT: parameters for module text sections > > > + * @EXECMEM_KPROBES: parameters for kprobes > > > + * @EXECMEM_FTRACE: parameters for ftrace > > > + * @EXECMEM_BPF: parameters for BPF > > > + * @EXECMEM_TYPE_MAX: > > > + */ > > > +enum execmem_type { > > > + EXECMEM_DEFAULT, > > > + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT, > > > + EXECMEM_KPROBES, > > > + EXECMEM_FTRACE, > > > + EXECMEM_BPF, > > > + EXECMEM_TYPE_MAX, > > > +}; > > > > Can we please get a break-down of how all these types are actually > > different from one another? > > > > I'm thinking some platforms have a tiny immediate space (arm64 comes to > > mind) and has less strict placement constraints for some of them? > > Yeah, and really I'd *much* rather deal with that in arch code, as I have said > several times. > > For arm64 we have two bsaic restrictions: > > 1) Direct branches can go +/-128M >We can expand this range by having direct branches go to PLTs, at a >performance cost. > > 2) PREL32 relocations can go +/-2G >We cannot expand this further. > > * We don't need to allocate memory for ftrace. We do not use trampolines. > > * Kprobes XOL areas don't care about either of those; we don't place any > PC-relative instructions in those. Maybe we want to in future. > > * Modules care about both; we'd *prefer* to place them within +/-128M of all > other kernel/module code, but if there's no space we can use PLTs and expand > that to +/-2G. Since modules can refreence other modules, that ends up > actually being halved, and modules have to fit within some 2G window that > also covers the kernel. > > * I'm not sure about BPF's requirements; it seems happy doing the same as > modules. BPF are happy with vmalloc(). > So if we *must* use a common execmem allocator, what we'd reall want is our > own > types, e.g. > > EXECMEM_ANYWHERE > EXECMEM_NOPLT > EXECMEM_PREL32 > > ... and then we use those in arch code to implement module_alloc() and > friends. I'm looking at execmem_types more as definition of the consumers, maybe I should have named the enum execmem_consumer at the first place. And the arch constrains defined in struct execmem_range describe how memory should be allocated for each consumer. These constraints are defined early at boot and remain static, so initializing them once and letting a common allocator use them makes perfect sense to me. I agree that fallback_{start,end} are not ideal, but we have 3 architectures that have preferred and secondary range for modules. And arm and powerpc use the same logic for kprobes as well, and I don't see why this code should be duplicated. And, for instance, if you decide to place PC-relative instructions if kprobes XOL areas, you'd only need to update execmem_range for kprobes to be more like the range for modules. With central allocator it's easier to deal with the things like VM_FLUSH_RESET_PERMS and caching of ROX memory and I think it will be more maintainable that module_alloc(), alloc_insn_page() and bpf_jit_alloc_exec() spread all over the place. > Mark. -- Sincerely yours, Mike.
Re: [PATCH net-next v8] virtio_net: Support RX hash XDP hint
On Tue, Apr 16, 2024 at 2:20 PM Liang Chen wrote: > > The RSS hash report is a feature that's part of the virtio specification. > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost > (still a work in progress as per [1]) support this feature. While the > capability to obtain the RSS hash has been enabled in the normal path, > it's currently missing in the XDP path. Therefore, we are introducing > XDP hints through kfuncs to allow XDP programs to access the RSS hash. > > 1. > https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r > > Signed-off-by: Liang Chen > --- > Changes from v7: > - use table lookup for rss hash type > Changes from v6: > - fix a coding style issue > Changes from v5: > - Preservation of the hash value has been dropped, following the conclusion > from discussions in V3 reviews. The virtio_net driver doesn't > accessing/using the virtio_net_hdr after the XDP program execution, so > nothing tragic should happen. As to the xdp program, if it smashes the > entry in virtio header, it is likely buggy anyways. Additionally, looking > up the Intel IGC driver, it also does not bother with this particular > aspect. > --- > drivers/net/virtio_net.c| 42 + > include/uapi/linux/virtio_net.h | 1 + > 2 files changed, 43 insertions(+) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c22d1118a133..1d750009f615 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -4621,6 +4621,47 @@ static void virtnet_set_big_packets(struct > virtnet_info *vi, const int mtu) > } > } > > +static enum xdp_rss_hash_type > +virtnet_xdp_rss_type[VIRTIO_NET_HASH_REPORT_MAX_TABLE] = { > + [VIRTIO_NET_HASH_REPORT_NONE] = XDP_RSS_TYPE_NONE, > + [VIRTIO_NET_HASH_REPORT_IPv4] = XDP_RSS_TYPE_L3_IPV4, > + [VIRTIO_NET_HASH_REPORT_TCPv4] = XDP_RSS_TYPE_L4_IPV4_TCP, > + [VIRTIO_NET_HASH_REPORT_UDPv4] = XDP_RSS_TYPE_L4_IPV4_UDP, > + [VIRTIO_NET_HASH_REPORT_IPv6] = XDP_RSS_TYPE_L3_IPV6, > + [VIRTIO_NET_HASH_REPORT_TCPv6] = XDP_RSS_TYPE_L4_IPV6_TCP, > + [VIRTIO_NET_HASH_REPORT_UDPv6] = XDP_RSS_TYPE_L4_IPV6_UDP, > + [VIRTIO_NET_HASH_REPORT_IPv6_EX] = XDP_RSS_TYPE_L3_IPV6_EX, > + [VIRTIO_NET_HASH_REPORT_TCPv6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX, > + [VIRTIO_NET_HASH_REPORT_UDPv6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX > +}; > + > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, > + enum xdp_rss_hash_type *rss_type) > +{ > + const struct xdp_buff *xdp = (void *)_ctx; > + struct virtio_net_hdr_v1_hash *hdr_hash; > + struct virtnet_info *vi; > + u16 hash_report; > + > + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) > + return -ENODATA; > + > + vi = netdev_priv(xdp->rxq->dev); > + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len); > + hash_report = __le16_to_cpu(hdr_hash->hash_report); > + > + if (hash_report >= VIRTIO_NET_HASH_REPORT_MAX_TABLE) > + hash_report = VIRTIO_NET_HASH_REPORT_NONE; > + > + *rss_type = virtnet_xdp_rss_type[hash_report]; > + *hash = __le32_to_cpu(hdr_hash->hash_value); > + return 0; > +} > + > +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { > + .xmo_rx_hash= virtnet_xdp_rx_hash, > +}; > + > static int virtnet_probe(struct virtio_device *vdev) > { > int i, err = -ENOMEM; > @@ -4747,6 +4788,7 @@ static int virtnet_probe(struct virtio_device *vdev) > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); > > dev->hw_features |= NETIF_F_RXHASH; > + dev->xdp_metadata_ops = _xdp_metadata_ops; > } > > if (vi->has_rss_hash_report) > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h > index cc65ef0f3c3e..3ee695450096 100644 > --- a/include/uapi/linux/virtio_net.h > +++ b/include/uapi/linux/virtio_net.h > @@ -176,6 +176,7 @@ struct virtio_net_hdr_v1_hash { > #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 > #define VIRTIO_NET_HASH_REPORT_TCPv6_EX8 > #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 > +#define VIRTIO_NET_HASH_REPORT_MAX_TABLE 10 This should not be part of uAPI. It may confuse the userspace. Others look good. Thanks > __le16 hash_report; > __le16 padding; > }; > -- > 2.40.1 >
[PATCH net-next v8] virtio_net: Support RX hash XDP hint
The RSS hash report is a feature that's part of the virtio specification. Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost (still a work in progress as per [1]) support this feature. While the capability to obtain the RSS hash has been enabled in the normal path, it's currently missing in the XDP path. Therefore, we are introducing XDP hints through kfuncs to allow XDP programs to access the RSS hash. 1. https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r Signed-off-by: Liang Chen --- Changes from v7: - use table lookup for rss hash type Changes from v6: - fix a coding style issue Changes from v5: - Preservation of the hash value has been dropped, following the conclusion from discussions in V3 reviews. The virtio_net driver doesn't accessing/using the virtio_net_hdr after the XDP program execution, so nothing tragic should happen. As to the xdp program, if it smashes the entry in virtio header, it is likely buggy anyways. Additionally, looking up the Intel IGC driver, it also does not bother with this particular aspect. --- drivers/net/virtio_net.c| 42 + include/uapi/linux/virtio_net.h | 1 + 2 files changed, 43 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c22d1118a133..1d750009f615 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -4621,6 +4621,47 @@ static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu) } } +static enum xdp_rss_hash_type +virtnet_xdp_rss_type[VIRTIO_NET_HASH_REPORT_MAX_TABLE] = { + [VIRTIO_NET_HASH_REPORT_NONE] = XDP_RSS_TYPE_NONE, + [VIRTIO_NET_HASH_REPORT_IPv4] = XDP_RSS_TYPE_L3_IPV4, + [VIRTIO_NET_HASH_REPORT_TCPv4] = XDP_RSS_TYPE_L4_IPV4_TCP, + [VIRTIO_NET_HASH_REPORT_UDPv4] = XDP_RSS_TYPE_L4_IPV4_UDP, + [VIRTIO_NET_HASH_REPORT_IPv6] = XDP_RSS_TYPE_L3_IPV6, + [VIRTIO_NET_HASH_REPORT_TCPv6] = XDP_RSS_TYPE_L4_IPV6_TCP, + [VIRTIO_NET_HASH_REPORT_UDPv6] = XDP_RSS_TYPE_L4_IPV6_UDP, + [VIRTIO_NET_HASH_REPORT_IPv6_EX] = XDP_RSS_TYPE_L3_IPV6_EX, + [VIRTIO_NET_HASH_REPORT_TCPv6_EX] = XDP_RSS_TYPE_L4_IPV6_TCP_EX, + [VIRTIO_NET_HASH_REPORT_UDPv6_EX] = XDP_RSS_TYPE_L4_IPV6_UDP_EX +}; + +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, + enum xdp_rss_hash_type *rss_type) +{ + const struct xdp_buff *xdp = (void *)_ctx; + struct virtio_net_hdr_v1_hash *hdr_hash; + struct virtnet_info *vi; + u16 hash_report; + + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) + return -ENODATA; + + vi = netdev_priv(xdp->rxq->dev); + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len); + hash_report = __le16_to_cpu(hdr_hash->hash_report); + + if (hash_report >= VIRTIO_NET_HASH_REPORT_MAX_TABLE) + hash_report = VIRTIO_NET_HASH_REPORT_NONE; + + *rss_type = virtnet_xdp_rss_type[hash_report]; + *hash = __le32_to_cpu(hdr_hash->hash_value); + return 0; +} + +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { + .xmo_rx_hash= virtnet_xdp_rx_hash, +}; + static int virtnet_probe(struct virtio_device *vdev) { int i, err = -ENOMEM; @@ -4747,6 +4788,7 @@ static int virtnet_probe(struct virtio_device *vdev) VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); dev->hw_features |= NETIF_F_RXHASH; + dev->xdp_metadata_ops = _xdp_metadata_ops; } if (vi->has_rss_hash_report) diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index cc65ef0f3c3e..3ee695450096 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -176,6 +176,7 @@ struct virtio_net_hdr_v1_hash { #define VIRTIO_NET_HASH_REPORT_IPv6_EX 7 #define VIRTIO_NET_HASH_REPORT_TCPv6_EX8 #define VIRTIO_NET_HASH_REPORT_UDPv6_EX9 +#define VIRTIO_NET_HASH_REPORT_MAX_TABLE 10 __le16 hash_report; __le16 padding; }; -- 2.40.1
Re: [PATCH net-next v7] virtio_net: Support RX hash XDP hint
On Mon, Apr 15, 2024 at 3:36 PM Jesper Dangaard Brouer wrote: > > > > On 13/04/2024 06.10, Liang Chen wrote: > > The RSS hash report is a feature that's part of the virtio specification. > > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost > > (still a work in progress as per [1]) support this feature. While the > > capability to obtain the RSS hash has been enabled in the normal path, > > it's currently missing in the XDP path. Therefore, we are introducing > > XDP hints through kfuncs to allow XDP programs to access the RSS hash. > > > > 1. > > https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r > > > > Signed-off-by: Liang Chen > > --- > >Changes from v6: > > - fix a coding style issue > >Changes from v5: > > - Preservation of the hash value has been dropped, following the conclusion > >from discussions in V3 reviews. The virtio_net driver doesn't > >accessing/using the virtio_net_hdr after the XDP program execution, so > >nothing tragic should happen. As to the xdp program, if it smashes the > >entry in virtio header, it is likely buggy anyways. Additionally, looking > >up the Intel IGC driver, it also does not bother with this particular > >aspect. > > --- > > drivers/net/virtio_net.c | 55 > > 1 file changed, 55 insertions(+) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c22d1118a133..2a1892b7b8d3 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -4621,6 +4621,60 @@ static void virtnet_set_big_packets(struct > > virtnet_info *vi, const int mtu) > > } > > } > > > > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash, > > +enum xdp_rss_hash_type *rss_type) > > +{ > > + const struct xdp_buff *xdp = (void *)_ctx; > > + struct virtio_net_hdr_v1_hash *hdr_hash; > > + struct virtnet_info *vi; > > + > > + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH)) > > + return -ENODATA; > > + > > + vi = netdev_priv(xdp->rxq->dev); > > + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - vi->hdr_len); > > + > > + switch (__le16_to_cpu(hdr_hash->hash_report)) { > > + case VIRTIO_NET_HASH_REPORT_TCPv4: > > + *rss_type = XDP_RSS_TYPE_L4_IPV4_TCP; > > + break; > > + case VIRTIO_NET_HASH_REPORT_UDPv4: > > + *rss_type = XDP_RSS_TYPE_L4_IPV4_UDP; > > + break; > > + case VIRTIO_NET_HASH_REPORT_TCPv6: > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP; > > + break; > > + case VIRTIO_NET_HASH_REPORT_UDPv6: > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP; > > + break; > > + case VIRTIO_NET_HASH_REPORT_TCPv6_EX: > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP_EX; > > + break; > > + case VIRTIO_NET_HASH_REPORT_UDPv6_EX: > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP_EX; > > + break; > > + case VIRTIO_NET_HASH_REPORT_IPv4: > > + *rss_type = XDP_RSS_TYPE_L3_IPV4; > > + break; > > + case VIRTIO_NET_HASH_REPORT_IPv6: > > + *rss_type = XDP_RSS_TYPE_L3_IPV6; > > + break; > > + case VIRTIO_NET_HASH_REPORT_IPv6_EX: > > + *rss_type = XDP_RSS_TYPE_L3_IPV6_EX; > > + break; > > + case VIRTIO_NET_HASH_REPORT_NONE: > > + default: > > + *rss_type = XDP_RSS_TYPE_NONE; > > + } > > Why is this not implemented as a table lookup? > Sure. Thanks for the suggestion! > Like: > > https://elixir.bootlin.com/linux/v6.9-rc4/source/drivers/net/ethernet/intel/igc/igc_main.c#L6652 > https://elixir.bootlin.com/linux/latest/A/ident/xdp_rss_hash_type > > --Jesper > > > + > > + *hash = __le32_to_cpu(hdr_hash->hash_value); > > + return 0; > > +} > > + > > +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = { > > + .xmo_rx_hash= virtnet_xdp_rx_hash, > > +}; > > + > > static int virtnet_probe(struct virtio_device *vdev) > > { > > int i, err = -ENOMEM; > > @@ -4747,6 +4801,7 @@ static int virtnet_probe(struct virtio_device *vdev) > > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX); > > > > dev->hw_features |= NETIF_F_RXHASH; > > + dev->xdp_metadata_ops = _xdp_metadata_ops; > > } > > > > if (vi->has_rss_hash_report)