Re: [PATCH v3] selftests: clone3: Use the capget and capset syscall directly
On 10/29/24 20:50, zhouyuhang wrote: From: zhouyuhang The libcap commit aca076443591 ("Make cap_t operations thread safe.") added a __u8 mutex at the beginning of the struct _cap_struct, it changes the offset of the members in the structure that breaks the assumption made in the "struct libcap" definition in clone3_cap_checkpoint_restore.c. This will cause the test case to fail with the following output: # RUN global.clone3_cap_checkpoint_restore ... # clone3() syscall supported # clone3_cap_checkpoint_restore.c:151:clone3_cap_checkpoint_restore:Child has PID 130508 cap_set_proc: Operation not permitted Sounds like EPERM is returned here. What's the error number from cap_set_proc(). # clone3_cap_checkpoint_restore.c:160:clone3_cap_checkpoint_restore:Expected set_capability() (-1) == 0 (0) What's the error number here? Looks like this test simply uses perror - it is better to use strerror() which includes the error number. Is this related EPERM? # clone3_cap_checkpoint_restore.c:161:clone3_cap_checkpoint_restore:Could not set CAP_CHECKPOINT_RESTORE # clone3_cap_checkpoint_restore: Test terminated by assertion # FAIL global.clone3_cap_checkpoint_restore Changing to using capget and capset syscall directly here can fix this error, just like what the commit 663af70aabb7 ("bpf: selftests: Add helpers to directly use the capget and capset syscall") does. Is this still accurate for v3 - Does this patch match the bpf commit? What is the output with this patch? Include it in the change log. Signed-off-by: zhouyuhang > --- Please mention the changes from v2 to v3 here so it makes it easier for reviewers associating the changes to the reviewer. I had to go look up v1 and v2. .../clone3/clone3_cap_checkpoint_restore.c| 58 +-- 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c index 3c196fa86c99..8b61702bf721 100644 --- a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c +++ b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c @@ -27,6 +27,13 @@ #include "../kselftest_harness.h" #include "clone3_selftests.h" +/* + * Prevent not being defined in the header file + */ +#ifndef CAP_CHECKPOINT_RESTORE +#define CAP_CHECKPOINT_RESTORE 40 +#endif + static void child_exit(int ret) { fflush(stdout); @@ -87,47 +94,36 @@ static int test_clone3_set_tid(struct __test_metadata *_metadata, return ret; } -struct libcap { - struct __user_cap_header_struct hdr; - struct __user_cap_data_struct data[2]; -}; - static int set_capability(void) { - cap_value_t cap_values[] = { CAP_SETUID, CAP_SETGID }; - struct libcap *cap; - int ret = -1; - cap_t caps; - - caps = cap_get_proc(); - if (!caps) { - perror("cap_get_proc"); + struct __user_cap_data_struct data[2]; + struct __user_cap_header_struct hdr = { + .version = _LINUX_CAPABILITY_VERSION_3, cap_validate_magic() handles _LINUX_CAPABILITY_VERSION_1, _LINUX_CAPABILITY_VERSION_2, and _LINUX_CAPABILITY_VERSION_3 It would help to add a comment on why it is necessary to set the version here. + }; + __u32 cap0 = 1 << CAP_SETUID | 1 << CAP_SETGID; + __u32 cap1 = 1 << (CAP_CHECKPOINT_RESTORE - 32); Explain why this is necessary - a comment will help future maintenance of this code. + int ret; + + ret = capget(&hdr, data); + if (ret) { + perror("capget"); return -1; } /* Drop all capabilities */ - if (cap_clear(caps)) { - perror("cap_clear"); - goto out; - } - - cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_values, CAP_SET); - cap_set_flag(caps, CAP_PERMITTED, 2, cap_values, CAP_SET); + memset(&data, 0, sizeof(data)); - cap = (struct libcap *) caps; + data[0].effective |= cap0; + data[0].permitted |= cap0; - /* 40 -> CAP_CHECKPOINT_RESTORE */ - cap->data[1].effective |= 1 << (40 - 32); - cap->data[1].permitted |= 1 << (40 - 32); + data[1].effective |= cap1; + data[1].permitted |= cap1; - if (cap_set_proc(caps)) { - perror("cap_set_proc"); - goto out; + ret = capset(&hdr, data); + if (ret) { + perror("capset"); + return -1; } - ret = 0; -out: - if (cap_free(caps)) - perror("cap_free"); return ret; }
Re: [PATCH 1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote: > Zbigniew mentioned at Linux Plumber's that systemd is interested in > switching to execveat() for service execution, but can't, because the > contents of /proc/pid/comm are the file descriptor which was used, > instead of the path to the binary. This makes the output of tools like > top and ps useless, especially in a world where most fds are opened > CLOEXEC so the number is truly meaningless. > > [...] Applied to for-next/execve, thanks! [1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case https://git.kernel.org/kees/c/7bdc6fc85c9a [2/2] selftests/exec: add a test for execveat()'s comm https://git.kernel.org/kees/c/bd104872311a Take care, -- Kees Cook
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
Hi, On Wed, Oct 30, 2024 at 10:03 PM Paul E. McKenney wrote: > > > > > Note that: > > > > > > > > * Switching to real NMIs is impossible on many existing arm64 CPUs. > > > > The hardware support simply isn't there. Pseudo-NMIs should work fine > > > > and are (in nearly all cases) just as good as NMIs but they have a > > > > small performance impact. There are also compatibility issues with > > > > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI > > > > will work and needs to be able to fall back. Prior to my recent > > > > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now > > > > at least they fall back to regular IPIs. > > > > > > But those IPIs are of no help whatsoever when the target CPU is spinning > > > with interrupts disabled, which is the scenario under discussion. > > > > Right that we can't trace the target CPU spinning with interrupts > > disabled. > > You can by invoking sched_show_task(cpu_curr(cpu)) on the CPU requesting > the backtrace. The resulting backtrace will not be as good as you > would get from using an NMI, but if you don't have NMIs, it is better > than nothing. > > > ...but in the case where NMI isn't available it _still_ > > makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI > > because: > > > > 1. There are many cases where the trigger.*cpu.*backtrace() class of > > functions are used to trace CPUs that _aren't_ looping with interrupts > > disabled. We still want to be able to backtrace in that case. For > > instance, you can see in "panic.c" that there are cases where > > trigger_all_cpu_backtrace() is called. It's valuable to make cases > > like this (where there's no indication that a CPU is locked) actually > > work. > > > > 2. Even if there's a CPU that's looping with interrupts disabled and > > we can't trace it because of no NMI, it could still be useful to get > > backtraces of other CPUs. It's possible that what the other CPUs are > > doing could give a hint about the CPU that's wedged. This isn't a > > great hint, but some info is better than no info. > > And it also makes sense for an IRQ-based backtrace to fall back to > something like the aforementioned sched_show_task(cpu_curr(cpu)) if > the destination CPU has interrupts disabled. > > > > Hence my suggestion that arm64, when using IRQs to request stack > > > backtraces, have an additional short timeout (milliseconds) in > > > order to fall back to remote stack tracing. In many cases, this > > > fallback happens automatically, as you can see in dump_cpu_task(). > > > If trigger_single_cpu_backtrace() returns false, the stack is dumped > > > remotely. > > > > I think the answer here is that the API needs to change. The whole > > boolean return value for trigger.*cpu.*backtrace() is mostly ignored > > by callers. Yes, you've pointed at one of the two places that it's not > > ignored and it tries a reasonable fallback, but all the other callers > > just silently do nothing when the function returns false. That led to > > many places where arm64 devices were simply not getting _any_ > > stacktrace. > > > > Perhaps we need some type of API that says "I actually have a > > fallback, so if you don't think the stracktrace will succeed then it's > > OK to return false". > > Boolean is fine for trigger_single_cpu_backtrace(), which is directed at > a single CPU. And the one call to this function that does not currently > check its return value could just call dump_cpu_task() instead. > > There are only 20 or so uses of all of these functions, so the API can > change, give or take the pain involved in changing architecture-specific > code. Right. Falling back to "sched_show_task(cpu_curr(cpu))" or something similar if the IPI doesn't go through is a good idea. Indeed, falling back to that if the NMI doesn't go through is _also_ a good idea, right? ...and we don't want to change all 20 callers to all add a fallback. To that end, it feels like someone should change it so that the common code includes the fallback and we get rid of the boolean return value. > > > > * Even if we decided that we absolutely had to disable stacktrades on > > > > arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has > > > > been using regular IPIs for backtraces like this for many, many years. > > > > Maybe folks don't care as much about arm32 anymore but it feels bad if > > > > we totally break it. > > > > > > Because arm32 isn't used much for datacenter workloads, it will not > > > be suffering from this issue. Not enough to have motivated anyone to > > > fix it, anyway. In contrast, in the datacenter, CPUs really can and > > > do have interrupts disabled for many seconds. (No, this is not a good > > > thing, but it is common enough for us to need to avoid disabling other > > > debugging facilities.) > > > > > > So it might well be that arm32 will continue to do just fine with > > > IRQ-based stack tracing. Or maybe someday arm32 will also need to
Re: [PATCH RFT v12 0/8] fork: Support shadow stacks in clone3()
On Thu, 2024-10-31 at 19:25 +, Mark Brown wrote: > --- > base-commit: d17cd7b7cc92d37ee8b2df8f975fc859a261f4dc Where can I find this base commit? > change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Re: [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs
Bumping this for Mingwei Colton Lewis writes: Extend pmu_counters_test to AMD CPUs. As the AMD PMU is quite different from Intel with different events and feature sets, this series introduces a new code path to test it, specifically focusing on the core counters including the PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache counters exist, but are not as important and can be deferred to a later series. The first patch is a bug fix that could be submitted separately. The series has been tested on both Intel and AMD machines, but I have not found an AMD machine old enough to lack PerfCtrExtCore. I have made efforts that no part of the code has any dependency on its presence. I am aware of similar work in this direction done by Jinrong Liang [1]. He told me he is not working on it currently and I am not intruding by making my own submission. [1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudli...@tencent.com/ v2: * Test all combinations of VM setup rather than only the maximum allowed by hardware * Add fixes tag to bug fix in patch 1 * Refine some names v1: https://lore.kernel.org/kvm/20240813164244.751597-1-coltonle...@google.com/ Colton Lewis (6): KVM: x86: selftests: Fix typos in macro variable use KVM: x86: selftests: Define AMD PMU CPUID leaves KVM: x86: selftests: Set up AMD VM in pmu_counters_test KVM: x86: selftests: Test read/write core counters KVM: x86: selftests: Test core events KVM: x86: selftests: Test PerfMonV2 .../selftests/kvm/include/x86_64/processor.h | 7 + .../selftests/kvm/x86_64/pmu_counters_test.c | 304 -- 2 files changed, 277 insertions(+), 34 deletions(-) base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63 -- 2.46.0.662.g92d0881bb0-goog
Re: [PATCH] hardware_disable_test: remove unused macro
On Tue, 03 Sep 2024 12:31:35 +0800, Ba Jing wrote: > The macro GUEST_CODE_PIO_PORT is never referenced in the code, > just remove it. Applied to kvm-x86 selftests, thanks! [1/1] hardware_disable_test: remove unused macro https://github.com/kvm-x86/linux/commit/600aa88014e9 -- https://github.com/kvm-x86/linux/tree/next
Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information
> The question is, if only extended moversions are used, what new tooling > requirements are there? Can you test using only extended modversions? > > Luis I'm not sure precisely what you're asking for. Do you want: 1. A kconfig that suppresses the emission of today's MODVERSIONS format? This would be fairly easy to do, but I was leaving it enabled for compatibility's sake, at least until extended modversions become more common. This way existing `kmod` tools and kernels would continue to be able to load new-style modules. 2. libkmod support for parsing the new format? I can do that fairly easily too, but wanted the format actually decided on and accepted before I started modifying things that read modversions. 3. Something else? Maybe I'm not understanding your comment?
Re: [PATCH -next] KVM: selftests: Use ARRAY_SIZE for array length
On Fri, 13 Sep 2024 13:43:15 +0800, Jiapeng Chong wrote: > Use of macro ARRAY_SIZE to calculate array size minimizes > the redundant code and improves code reusability. > > ./tools/testing/selftests/kvm/x86_64/debug_regs.c:169:32-33: WARNING: Use > ARRAY_SIZE. Applied to kvm-x86 selftests, thanks! [1/1] KVM: selftests: Use ARRAY_SIZE for array length https://github.com/kvm-x86/linux/commit/f8912210eb21 -- https://github.com/kvm-x86/linux/tree/next
Re: [PATCH] kvm: selftest: fix noop test in guest_memfd_test.c
On Thu, 24 Oct 2024 10:59:53 +0100, Patrick Roy wrote: > The loop in test_create_guest_memfd_invalid that is supposed to test > that nothing is accepted as a valid flag to KVM_CREATE_GUEST_MEMFD was > initializing `flag` as 0 instead of BIT(0). This caused the loop to > immediately exit instead of iterating over BIT(0), BIT(1), ... . Applied to kvm-x86 fixes, thanks! [1/1] kvm: selftest: fix noop test in guest_memfd_test.c https://github.com/kvm-x86/linux/commit/fd5b88cc7fbf -- https://github.com/kvm-x86/linux/tree/next
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On 10/31, Mina Almasry wrote: > On Wed, Oct 30, 2024 at 8:37 AM Stanislav Fomichev > wrote: > > > > On 10/30, Mina Almasry wrote: > > > On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev > > > wrote: > > > > > > > > On 10/30, Mina Almasry wrote: > > > > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev > > > > > wrote: > > > > > > > > > > > > The goal of the series is to simplify and make it possible to use > > > > > > ncdevmem in an automated way from the ksft python wrapper. > > > > > > > > > > > > ncdevmem is slowly mutated into a state where it uses stdout > > > > > > to print the payload and the python wrapper is added to > > > > > > make sure the arrived payload matches the expected one. > > > > > > > > > > > > v6: > > > > > > - fix compilation issue in 'Unify error handling' patch (Jakub) > > > > > > > > > > > > > > > > Since I saw a compilation failures on a couple of iterations I > > > > > cherry-picked this locally and tested compilation. I'm seeing this: > > > > > > > > Are you cherry picking the whole series or just this patch? It looks > > > > too broken. > > > > > > > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > > > > TARGETS=ncdevmem 2>&1 > > > > > make: Entering directory > > > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > > > > CC ncdevmem > > > > > In file included from ncdevmem.c:63: > > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: > > > > > warning: ‘enum ethtool_header_flags’ declared inside parameter list > > > > > will not be visible outside of this definition or declaration > > > > >23 | const char *ethtool_header_flags_str(enum > > > > > ethtool_header_flags value); > > > > > | ^~~~ > > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: > > > > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside > > > > > parameter list will not be visible outside of this definition or > > > > > declaration > > > > >25 | ethtool_module_fw_flash_status_str(enum > > > > > ethtool_module_fw_flash_status value); > > > > > | > > > > > ^~ > > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: > > > > > error: field ‘status’ has incomplete type > > > > > 6766 | enum ethtool_module_fw_flash_status status; > > > > > | ^~ > > > > > > > > This has been fixed via '#include ' > > > > > > > > > ncdevmem.c: In function ‘do_server’: > > > > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known > > > > > 517 | struct dmabuf_token token; > > > > > > > > And this, and the rest, don't make sense at all? > > > > > > > > I'll double check on my side. > > > > > > Oh, whoops, I forgot to headers_install first. This works for me: > > > > > > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install && > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > > TARGETS=ncdevmem 2>&1 > > > INSTALL ./usr/include > > > make: Entering directory > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > > make: Nothing to be done for 'all'. > > > make: Leaving directory > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem > > > ./tools/testing/selftests/drivers/net/hw/ncdevmem > > > > > > Sorry for the noise :D > > > > Whew, thanks and no worries! > > Sorry, 2 issues testing this series: Thank you for testing! > 1. ipv4 addresses seem broken, or maybe i'm using them wrong. > > Client command: > yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 | head -c > 1G | nc 192.168.1.4 5224 -p 5224 > > Server command and logs: > mina-1 /home/almasrymina # ./ncdevmem -s 192.168.1.4 -c 192.168.1.5 -l > -p 5224 -v 7 -f eth1 > here: ynl.c:887:ynl_req_trampoline > using queues 15..16 > Running: sudo ethtool -K eth1 ntuple off >&2 > Running: sudo ethtool -K eth1 ntuple on >&2 > Running: sudo ethtool -n eth1 | grep 'Filter:' | awk '{print $2}' | > xargs -n1 ethtool -N eth1 delete >&2 > ethtool: bad command line argument(s) > For more information run ethtool -h > here: ynl.c:887:ynl_req_trampoline > TCP header split: on > Running: sudo ethtool -X eth1 equal 15 >&2 > Running: sudo ethtool -N eth1 flow-type tcp6 src-ip 192.168.1.5 dst-ip > 192.168.1.4 src-port 5224 dst-port 5224 queue 15 >&2 > Invalid src-ip value[192.168.1.5] > ethtool: bad command line argument(s) > For more information run ethtool -h > ./ncdevmem: Failed to configure flow steering > > The ethtool co
Re: [PATCH net-next v2 4/4] selftests: hsr: Add test for VLAN
On 10/24/24 12:30, MD Danish Anwar wrote: > @@ -183,9 +232,21 @@ trap cleanup_all_ns EXIT > setup_hsr_interfaces 0 > do_complete_ping_test > > +# Run VLAN Test > +if $vlan; then > + setup_vlan_interfaces > + hsr_vlan_ping > +fi > + > setup_ns ns1 ns2 ns3 > > setup_hsr_interfaces 1 > do_complete_ping_test > > +# Run VLAN Test > +if $vlan; then > + setup_vlan_interfaces > + hsr_vlan_ping > +fi The new tests should be enabled by default. Indeed ideally the test script should be able to run successfully on kernel not supporting such feature; you could cope with that looking for the hsr exposed feature and skipping the vlan test when the relevant feature is not present. Cheers, Paolo
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On Thu, Oct 31, 2024 at 9:45 AM Mina Almasry wrote: > ... > > Sorry, 2 issues testing this series: > ... > > 2. Validation is now broken: > Validation is re-fixed with this diff: diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c index 692c189bb5cc..494ae66d8abf 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -568,8 +568,7 @@ int do_server(struct memory_buffer *mem) if (do_validation) validate_buffer( - ((unsigned char *)tmp_mem) + - dmabuf_cmsg->frag_offset, + ((unsigned char *)tmp_mem), dmabuf_cmsg->frag_size); else print_nonzero_bytes(tmp_mem, dmabuf_cmsg->frag_size); Since memcpy_from_device copies to the beginning of tmp_mem, then the beginning tmp_mem should be passed to the validation. -- Thanks, Mina
Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test
On Thu, Oct 31, 2024, Pratik R. Sampat wrote: > Hi Sean, > > On 10/30/2024 12:57 PM, Sean Christopherson wrote: > > On Wed, Oct 30, 2024, Pratik R. Sampat wrote: > >> On 10/30/2024 8:46 AM, Sean Christopherson wrote: > >>> +/* Minimum firmware version required for the SEV-SNP support */ > >>> +#define SNP_FW_REQ_VER_MAJOR 1 > >>> +#define SNP_FW_REQ_VER_MINOR 51 > >>> > >>> Side topic, why are these hardcoded? And where did they come from? If > >>> they're > >>> arbitrary KVM selftests values, make that super duper clear. > >> > >> Well, it's not entirely arbitrary. This was the version that SNP GA'd > >> with first so that kind of became the minimum required version needed. > >> > >> I think the only place we've documented this is here - > >> https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware. > >> > >> Maybe, I can modify the comment above to say something like - > >> Minimum general availability release firmware required for SEV-SNP support. > > > > Hmm, so if AMD says SNP is only supported for firmware version >= 1.51, why > > on > > earth is that not checked and enforced by the kernel? Relying on userspace > > to > > not crash the host (or worse) because of unsupported firmware is not a > > winning > > strategy. > > We do check against the firmware level 1.51 while setting things up > first (drivers/crypto/ccp/sev-dev.c:__sev_snp_init_locked()) and we bail > out if it's otherwise. From the userspace, calls to KVM_SEV_INIT2 or any > other corresponding SNP calls should fail cleanly without any adverse > effects to the host. And I'm saying, that's not good enough. If the platform doesn't support SNP, the KVM *must not* advertise support for SNP. > From the positive selftest perspective though, we want to make sure it's > both supported and enabled, and skip the test if not. No, we want the test to assert that KVM reports SNP support if and only if SNP is 100% supported. > I believe we can tell if it's supported by the platform using the MSR - > MSR_AMD64_SEV_SNP_ENABLED or the X86_FEATURE_SEV_SNP from the KVM > capabilities. However, to determine if it's enabled from the kernel, I > made this check here. Having said that, I do agree that there should > probably be a better way to expose this support to the userspace. > > Thanks > Pratik
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On Wed, Oct 30, 2024 at 8:37 AM Stanislav Fomichev wrote: > > On 10/30, Mina Almasry wrote: > > On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev > > wrote: > > > > > > On 10/30, Mina Almasry wrote: > > > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev > > > > wrote: > > > > > > > > > > The goal of the series is to simplify and make it possible to use > > > > > ncdevmem in an automated way from the ksft python wrapper. > > > > > > > > > > ncdevmem is slowly mutated into a state where it uses stdout > > > > > to print the payload and the python wrapper is added to > > > > > make sure the arrived payload matches the expected one. > > > > > > > > > > v6: > > > > > - fix compilation issue in 'Unify error handling' patch (Jakub) > > > > > > > > > > > > > Since I saw a compilation failures on a couple of iterations I > > > > cherry-picked this locally and tested compilation. I'm seeing this: > > > > > > Are you cherry picking the whole series or just this patch? It looks > > > too broken. > > > > > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > > > TARGETS=ncdevmem 2>&1 > > > > make: Entering directory > > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > > > CC ncdevmem > > > > In file included from ncdevmem.c:63: > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: > > > > warning: ‘enum ethtool_header_flags’ declared inside parameter list > > > > will not be visible outside of this definition or declaration > > > >23 | const char *ethtool_header_flags_str(enum ethtool_header_flags > > > > value); > > > > | ^~~~ > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: > > > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside > > > > parameter list will not be visible outside of this definition or > > > > declaration > > > >25 | ethtool_module_fw_flash_status_str(enum > > > > ethtool_module_fw_flash_status value); > > > > | > > > > ^~ > > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: > > > > error: field ‘status’ has incomplete type > > > > 6766 | enum ethtool_module_fw_flash_status status; > > > > | ^~ > > > > > > This has been fixed via '#include ' > > > > > > > ncdevmem.c: In function ‘do_server’: > > > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known > > > > 517 | struct dmabuf_token token; > > > > > > And this, and the rest, don't make sense at all? > > > > > > I'll double check on my side. > > > > Oh, whoops, I forgot to headers_install first. This works for me: > > > > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install && > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > TARGETS=ncdevmem 2>&1 > > INSTALL ./usr/include > > make: Entering directory > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > make: Nothing to be done for 'all'. > > make: Leaving directory > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem > > ./tools/testing/selftests/drivers/net/hw/ncdevmem > > > > Sorry for the noise :D > > Whew, thanks and no worries! Sorry, 2 issues testing this series: 1. ipv4 addresses seem broken, or maybe i'm using them wrong. Client command: yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | tr \\n \\0 | head -c 1G | nc 192.168.1.4 5224 -p 5224 Server command and logs: mina-1 /home/almasrymina # ./ncdevmem -s 192.168.1.4 -c 192.168.1.5 -l -p 5224 -v 7 -f eth1 here: ynl.c:887:ynl_req_trampoline using queues 15..16 Running: sudo ethtool -K eth1 ntuple off >&2 Running: sudo ethtool -K eth1 ntuple on >&2 Running: sudo ethtool -n eth1 | grep 'Filter:' | awk '{print $2}' | xargs -n1 ethtool -N eth1 delete >&2 ethtool: bad command line argument(s) For more information run ethtool -h here: ynl.c:887:ynl_req_trampoline TCP header split: on Running: sudo ethtool -X eth1 equal 15 >&2 Running: sudo ethtool -N eth1 flow-type tcp6 src-ip 192.168.1.5 dst-ip 192.168.1.4 src-port 5224 dst-port 5224 queue 15 >&2 Invalid src-ip value[192.168.1.5] ethtool: bad command line argument(s) For more information run ethtool -h ./ncdevmem: Failed to configure flow steering The ethtool command to configure flow steering is not working for me. It thinks it's in v6 mode, when the ip address is a v4 address. (notice `-s 192.168.1.4 -c 192.168.1.5`). flow-type should be tcp in this case. Reverting patch 9e2da4faeccf ("Revert "selftests: ncdevmem: Switch to AF_IN
Re: [PATCH v4 2/2] vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
On 10/31/2024 09:16, Stefano Garzarella wrote: On Tue, Oct 29, 2024 at 09:49:54AM -0500, Konstantin Shkolnyy wrote: Change parameters of SO_VM_SOCKETS_* to uint64_t so that they are always In include/uapi/linux/vm_sockets.h we talk about "unsigned long long", but in the kernel code we use u64. IIUC "unsigned long long" should be u64 on every architecture, at least till we will have some 128-bit cpu, right? I'm not sure what "unsigned long long" would be on a 128-bit machine. What about using `unsigned long long` as documented in the vm_sockets.h? I use uint64_t because the kernel uses u64. I think, this way the code isn't vulnerable to potential variability of "unsigned long long". If we change to "unsigned long long" should we also change the kernel to "unsigned long long"?
Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test
Hi Sean, On 10/30/2024 12:57 PM, Sean Christopherson wrote: > On Wed, Oct 30, 2024, Pratik R. Sampat wrote: >> On 10/30/2024 8:46 AM, Sean Christopherson wrote: >>> +/* Minimum firmware version required for the SEV-SNP support */ >>> +#define SNP_FW_REQ_VER_MAJOR 1 >>> +#define SNP_FW_REQ_VER_MINOR 51 >>> >>> Side topic, why are these hardcoded? And where did they come from? If >>> they're >>> arbitrary KVM selftests values, make that super duper clear. >> >> Well, it's not entirely arbitrary. This was the version that SNP GA'd >> with first so that kind of became the minimum required version needed. >> >> I think the only place we've documented this is here - >> https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware. >> >> Maybe, I can modify the comment above to say something like - >> Minimum general availability release firmware required for SEV-SNP support. > > Hmm, so if AMD says SNP is only supported for firmware version >= 1.51, why on > earth is that not checked and enforced by the kernel? Relying on userspace to > not crash the host (or worse) because of unsupported firmware is not a winning > strategy. We do check against the firmware level 1.51 while setting things up first (drivers/crypto/ccp/sev-dev.c:__sev_snp_init_locked()) and we bail out if it's otherwise. From the userspace, calls to KVM_SEV_INIT2 or any other corresponding SNP calls should fail cleanly without any adverse effects to the host. >From the positive selftest perspective though, we want to make sure it's both supported and enabled, and skip the test if not. I believe we can tell if it's supported by the platform using the MSR - MSR_AMD64_SEV_SNP_ENABLED or the X86_FEATURE_SEV_SNP from the KVM capabilities. However, to determine if it's enabled from the kernel, I made this check here. Having said that, I do agree that there should probably be a better way to expose this support to the userspace. Thanks Pratik
Re: [PATCH net-next v11 12/23] ovpn: implement TCP transport
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote: > +static void ovpn_socket_release_work(struct work_struct *work) > +{ > + struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work); > + > + ovpn_socket_detach(sock->sock); > + kfree_rcu(sock, rcu); > +} > + > +static void ovpn_socket_schedule_release(struct ovpn_socket *sock) > +{ > + INIT_WORK(&sock->work, ovpn_socket_release_work); > + schedule_work(&sock->work); How does module unloading know that it has to wait for this work to complete? Will ovpn_cleanup get stuck until some refcount gets released by this work? [...] > +static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) > +{ > + struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp); > + struct strp_msg *msg = strp_msg(skb); > + size_t pkt_len = msg->full_len - 2; > + size_t off = msg->offset + 2; > + > + /* ensure skb->data points to the beginning of the openvpn packet */ > + if (!pskb_pull(skb, off)) { > + net_warn_ratelimited("%s: packet too small\n", > + peer->ovpn->dev->name); > + goto err; > + } > + > + /* strparser does not trim the skb for us, therefore we do it now */ > + if (pskb_trim(skb, pkt_len) != 0) { > + net_warn_ratelimited("%s: trimming skb failed\n", > + peer->ovpn->dev->name); > + goto err; > + } > + > + /* we need the first byte of data to be accessible > + * to extract the opcode and the key ID later on > + */ > + if (!pskb_may_pull(skb, 1)) { > + net_warn_ratelimited("%s: packet too small to fetch opcode\n", > + peer->ovpn->dev->name); > + goto err; > + } > + > + /* DATA_V2 packets are handled in kernel, the rest goes to user space */ > + if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) { > + /* hold reference to peer as required by ovpn_recv(). > + * > + * NOTE: in this context we should already be holding a > + * reference to this peer, therefore ovpn_peer_hold() is > + * not expected to fail > + */ > + if (WARN_ON(!ovpn_peer_hold(peer))) > + goto err; > + > + ovpn_recv(peer, skb); > + } else { > + /* The packet size header must be there when sending the packet > + * to userspace, therefore we put it back > + */ > + skb_push(skb, 2); > + ovpn_tcp_to_userspace(peer, strp->sk, skb); > + } > + > + return; > +err: > + netdev_err(peer->ovpn->dev, > +"cannot process incoming TCP data for peer %u\n", peer->id); This should also be ratelimited, and maybe just combined with the net_warn_ratelimited just before each goto. > + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); > + kfree_skb(skb); > + ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR); > +} [...] > +void ovpn_tcp_socket_detach(struct socket *sock) > +{ [...] > + /* restore CBs that were saved in ovpn_sock_set_tcp_cb() */ > + sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready; > + sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space; > + sock->sk->sk_prot = peer->tcp.sk_cb.prot; > + sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops; > + rcu_assign_sk_user_data(sock->sk, NULL); > + > + rcu_read_unlock(); > + > + /* cancel any ongoing work. Done after removing the CBs so that these > + * workers cannot be re-armed > + */ I'm not sure whether a barrier is needed to prevent compiler/CPU reordering here. > + cancel_work_sync(&peer->tcp.tx_work); > + strp_done(&peer->tcp.strp); > +} > + > +static void ovpn_tcp_send_sock(struct ovpn_peer *peer) > +{ > + struct sk_buff *skb = peer->tcp.out_msg.skb; > + > + if (!skb) > + return; > + > + if (peer->tcp.tx_in_progress) > + return; > + > + peer->tcp.tx_in_progress = true; Sorry, I never answered your question about my concerns in a previous review here. We can reach ovpn_tcp_send_sock in two different contexts: - lock_sock (either from ovpn_tcp_sendmsg or ovpn_tcp_tx_work) - bh_lock_sock (from ovpn_tcp_send_skb, ie "data path") These are not fully mutually exclusive. lock_sock grabs bh_lock_sock (a spinlock) for a brief period to mark the (sleeping/mutex
Re: [PATCH net-next v2 2/4] net: hsr: Add VLAN CTAG filter support
On 10/24/24 12:30, MD Danish Anwar wrote: > From: Murali Karicheri > > This patch adds support for VLAN ctag based filtering at slave devices. > The slave ethernet device may be capable of filtering ethernet packets > based on VLAN ID. This requires that when the VLAN interface is created > over an HSR/PRP interface, it passes the VID information to the > associated slave ethernet devices so that it updates the hardware > filters to filter ethernet frames based on VID. This patch adds the > required functions to propagate the vid information to the slave > devices. > > Signed-off-by: Murali Karicheri > Signed-off-by: MD Danish Anwar > --- > net/hsr/hsr_device.c | 71 +++- > 1 file changed, 70 insertions(+), 1 deletion(-) > > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c > index 0ca47ebb01d3..ff586bdc2bde 100644 > --- a/net/hsr/hsr_device.c > +++ b/net/hsr/hsr_device.c > @@ -515,6 +515,68 @@ static void hsr_change_rx_flags(struct net_device *dev, > int change) > } > } > > +static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev, > +__be16 proto, u16 vid) > +{ > + struct hsr_port *port; > + struct hsr_priv *hsr; > + int ret = 0; > + > + hsr = netdev_priv(dev); > + > + hsr_for_each_port(hsr, port) { > + if (port->type == HSR_PT_MASTER) > + continue; If the desired behavior is to ignore INTERLINK port, I think you should explicitly skip them here, otherwise you will end-up in a nondeterministic state. > + ret = vlan_vid_add(port->dev, proto, vid); > + switch (port->type) { > + case HSR_PT_SLAVE_A: > + if (ret) { > + netdev_err(dev, "add vid failed for Slave-A\n"); > + return ret; > + } > + break; > + > + case HSR_PT_SLAVE_B: > + if (ret) { > + /* clean up Slave-A */ > + netdev_err(dev, "add vid failed for Slave-B\n"); > + vlan_vid_del(port->dev, proto, vid); This code relies on a specific port_list order - which is actually respected at list creation time. Still such assumption looks fragile and may lead to long term bugs. I think would be better to refactor the above loop handling arbitrary HSR_PT_SLAVE_A, HSR_PT_SLAVE_B order. Guestimate is that the complexity will not increase measurably. > + return ret; > + } > + break; > + default: > + break; > + } > + } > + > + return 0; > +} > + > +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev, > + __be16 proto, u16 vid) > +{ > + struct hsr_port *port; > + struct hsr_priv *hsr; > + > + hsr = netdev_priv(dev); > + > + hsr_for_each_port(hsr, port) { > + if (port->type == HSR_PT_MASTER) > + continue; I think it would be more consistent just removing the above statement... > + switch (port->type) { > + case HSR_PT_SLAVE_A: > + case HSR_PT_SLAVE_B: > + vlan_vid_del(port->dev, proto, vid); > + break; > + default:> + break; ... MASTER and INTERLINK port will be ignored anyway. > + } > + } > + > + return 0; > +} > + > static const struct net_device_ops hsr_device_ops = { > .ndo_change_mtu = hsr_dev_change_mtu, > .ndo_open = hsr_dev_open, Cheers, Paolo
Re: [PATCH net-next v11 12/23] ovpn: implement TCP transport
On 29/10/2024 11:47, Antonio Quartulli wrote: [...] + + /* DATA_V2 packets are handled in kernel, the rest goes to user space */ + if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) { + /* hold reference to peer as required by ovpn_recv(). +* +* NOTE: in this context we should already be holding a +* reference to this peer, therefore ovpn_peer_hold() is +* not expected to fail +*/ + if (WARN_ON(!ovpn_peer_hold(peer))) + goto err; + + ovpn_recv(peer, skb); + } else { As pointed out by Sabrina, we are indeed sending DATA_V1 packets to userspace. Not a big deal because userspace will likely ignore or drop them. However, I will change this and mirror what we do for UDP. Thanks. Regards, + /* The packet size header must be there when sending the packet +* to userspace, therefore we put it back +*/ + skb_push(skb, 2); + ovpn_tcp_to_userspace(peer, strp->sk, skb); + } + + return; -- Antonio Quartulli OpenVPN Inc.
Re: [PATCH v4 2/2] vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
On Tue, Oct 29, 2024 at 09:49:54AM -0500, Konstantin Shkolnyy wrote: Change parameters of SO_VM_SOCKETS_* to uint64_t so that they are always In include/uapi/linux/vm_sockets.h we talk about "unsigned long long", but in the kernel code we use u64. IIUC "unsigned long long" should be u64 on every architecture, at least till we will have some 128-bit cpu, right? 64-bit, because the corresponding kernel code requires them to be at least that large, no matter what architecture. Fixes: 5c338112e48a ("test/vsock: rework message bounds test") Fixes: 685a21c314a8 ("test/vsock: add big message test") Fixes: 542e893fbadc ("vsock/test: two tests to check credit update logic") Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Signed-off-by: Konstantin Shkolnyy --- tools/testing/vsock/vsock_perf.c | 2 +- tools/testing/vsock/vsock_test.c | 19 ++- 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c index 22633c2848cc..88f6be4162a6 100644 --- a/tools/testing/vsock/vsock_perf.c +++ b/tools/testing/vsock/vsock_perf.c @@ -33,7 +33,7 @@ static unsigned int port = DEFAULT_PORT; static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES; -static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES; +static uint64_t vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES; What about using `unsigned long long` as documented in the vm_sockets.h? Thanks, Stefano static bool zerocopy; static void error(const char *s) diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index 7fd25b814b4b..49a32515886f 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -429,7 +429,7 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts) static void test_seqpacket_msg_bounds_server(const struct test_opts *opts) { - unsigned long sock_buf_size; + uint64_t sock_buf_size; unsigned long remote_hash; unsigned long curr_hash; int fd; @@ -634,7 +634,8 @@ static void test_seqpacket_timeout_server(const struct test_opts *opts) static void test_seqpacket_bigmsg_client(const struct test_opts *opts) { - unsigned long sock_buf_size; + uint64_t sock_buf_size; + size_t buf_size; socklen_t len; void *data; int fd; @@ -655,13 +656,19 @@ static void test_seqpacket_bigmsg_client(const struct test_opts *opts) sock_buf_size++; - data = malloc(sock_buf_size); + buf_size = (size_t) sock_buf_size; /* size_t can be < uint64_t */ + if (buf_size != sock_buf_size) { + fprintf(stderr, "Returned BUFFER_SIZE too large\n"); + exit(EXIT_FAILURE); + } + + data = malloc(buf_size); if (!data) { perror("malloc"); exit(EXIT_FAILURE); } - send_buf(fd, data, sock_buf_size, 0, -EMSGSIZE); + send_buf(fd, data, buf_size, 0, -EMSGSIZE); control_writeln("CLISENT"); @@ -1360,6 +1367,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts, int recv_buf_size; struct pollfd fds; size_t buf_size; + uint64_t sock_buf_size; void *buf; int fd; @@ -1370,9 +1378,10 @@ static void test_stream_credit_update_test(const struct test_opts *opts, } buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE; + sock_buf_size = buf_size; /* size_t can be < uint64_t */ if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, - &buf_size, sizeof(buf_size))) { + &sock_buf_size, sizeof(sock_buf_size))) { perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)"); exit(EXIT_FAILURE); } -- 2.34.1
Re: [PATCH net-next v11 11/23] ovpn: store tunnel and transport statistics
2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote: > @@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) > goto drop; > } > > + /* increment RX stats */ > + ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len); > + ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len); [I don't know much about the userspace implementation, so maybe this is a silly question] What's the value of keeping track of 2 separate stats if they are incremented exactly at the same time? Packet count will be the same, and the difference in bytes will be just measuring the encap overhead. Should one of them be "packets/individual messages that get received over the UDP/TCP link" and the other "packets that get passed up to the stack"? > @@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) > goto err; > > skb_mark_not_on_list(skb); > + ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len); > + ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len); > > switch (peer->sock->sock->sk->sk_protocol) { > case IPPROTO_UDP: And on TX maybe something like "packets that the stack wants to send through the tunnel" and "packets that actually make it onto the UDP/TCP socket after encap/encrypt"? -- Sabrina
Re: [PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn
On 10/31/24 4:13 AM, Paolo Abeni wrote: > Given the issue is long-standing, and the fix is somewhat invasive, I > suggest steering this patch on net-next. > FWIW, I think net-next is best.
Re: [PATCH net-next v11 11/23] ovpn: store tunnel and transport statistics
On 31/10/2024 12:37, Sabrina Dubroca wrote: 2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote: @@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; } + /* increment RX stats */ + ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len); + ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len); [I don't know much about the userspace implementation, so maybe this is a silly question] What's the value of keeping track of 2 separate stats if they are incremented exactly at the same time? Packet count will be the same, and the difference in bytes will be just measuring the encap overhead. Should one of them be "packets/individual messages that get received over the UDP/TCP link" and the other "packets that get passed up to the stack"? You're correct: link_stats if "received over the TCP/UDP socket", while vpn_stats if what is passing through the ovpn virtual device. Packet count may not match though, for example when something happens between "received packet on the link" and "packet passed up to the device" (i.e. decryption error). This makes me wonder why we increment them at the very same place link_stats should be increased upon RX from the socket, while vpn_stats just before delivery. I'll double check. @@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) goto err; skb_mark_not_on_list(skb); + ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len); + ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len); switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP: And on TX maybe something like "packets that the stack wants to send through the tunnel" and "packets that actually make it onto the UDP/TCP socket after encap/encrypt"? Correct. Same issue here. Increments should not happen back to back. Thanks a lot for spotting these. Regards, -- Antonio Quartulli OpenVPN Inc.
Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
On 31/10/2024 12:29, Sabrina Dubroca wrote: 2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ [...] + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well? If that's the case, then this is a bug in the TCP code. DATA_Vx packets are part of the data channel and userspace can't do anything with them (userspace handles the control channel only when the ovpn module is in use). I'll go check the TCP code then, because sending DATA_V1 to userspace is not expected. Thanks for noticing this discrepancy. + goto drop; + + /* unknown or control packet: let it bubble up to userspace */ + return 1; + } + + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the +* peer-id set to undef. In this case we skip the peer lookup by peer-id +* and we try with the transport address +*/ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id); + goto drop; + } + } + + if (!peer) { nit: that could be an "else" combined with the previous case? mhh that's true. Then I can combine the two "if (!peer)" in one block only. Thanks! Regards, + /* data packet with undef peer-id */ + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", + __func__); + goto drop; + } + } -- Antonio Quartulli OpenVPN Inc.
Re: [PATCH net v6] ipv6: Fix soft lockups in fib6_select_path under high next hop churn
On 10/25/24 09:30, Omid Ehtemam-Haghighi wrote: > Soft lockups have been observed on a cluster of Linux-based edge routers > located in a highly dynamic environment. Using the `bird` service, these > routers continuously update BGP-advertised routes due to frequently > changing nexthop destinations, while also managing significant IPv6 > traffic. The lockups occur during the traversal of the multipath > circular linked-list in the `fib6_select_path` function, particularly > while iterating through the siblings in the list. The issue typically > arises when the nodes of the linked list are unexpectedly deleted > concurrently on a different core—indicated by their 'next' and > 'previous' elements pointing back to the node itself and their reference > count dropping to zero. This results in an infinite loop, leading to a > soft lockup that triggers a system panic via the watchdog timer. > > Apply RCU primitives in the problematic code sections to resolve the > issue. Where necessary, update the references to fib6_siblings to > annotate or use the RCU APIs. > > Include a test script that reproduces the issue. The script > periodically updates the routing table while generating a heavy load > of outgoing IPv6 traffic through multiple iperf3 clients. It > consistently induces infinite soft lockups within a couple of minutes. > > Kernel log: > > 0 [bd13003e8d30] machine_kexec at 8ceaf3eb > 1 [bd13003e8d90] __crash_kexec at 8d0120e3 > 2 [bd13003e8e58] panic at 8cef65d4 > 3 [bd13003e8ed8] watchdog_timer_fn at 8d05cb03 > 4 [bd13003e8f08] __hrtimer_run_queues at 8cfec62f > 5 [bd13003e8f70] hrtimer_interrupt at 8cfed756 > 6 [bd13003e8fd0] __sysvec_apic_timer_interrupt at 8cea01af > 7 [bd13003e8ff0] sysvec_apic_timer_interrupt at 8df1b83d > -- -- > 8 [bd13003d3708] asm_sysvec_apic_timer_interrupt at 8e000ecb > [exception RIP: fib6_select_path+299] > RIP: 8ddafe7b RSP: bd13003d37b8 RFLAGS: 0287 > RAX: 975850b43600 RBX: 975850b40200 RCX: > RDX: 3fff RSI: 51d383e4 RDI: 975850b43618 > RBP: bd13003d3800 R8: R9: 975850b40200 > R10: R11: R12: bd13003d3830 > R13: 975850b436a8 R14: 975850b43600 R15: 0007 > ORIG_RAX: CS: 0010 SS: 0018 > 9 [bd13003d3808] ip6_pol_route at 8ddb030c > 10 [bd13003d3888] ip6_pol_route_input at 8ddb068c > 11 [bd13003d3898] fib6_rule_lookup at 8ddf02b5 > 12 [bd13003d3928] ip6_route_input at 8ddb0f47 > 13 [bd13003d3a18] ip6_rcv_finish_core.constprop.0 at 8dd950d0 > 14 [bd13003d3a30] ip6_list_rcv_finish.constprop.0 at 8dd96274 > 15 [bd13003d3a98] ip6_sublist_rcv at 8dd96474 > 16 [bd13003d3af8] ipv6_list_rcv at 8dd96615 > 17 [bd13003d3b60] __netif_receive_skb_list_core at 8dc16fec > 18 [bd13003d3be0] netif_receive_skb_list_internal at 8dc176b3 > 19 [bd13003d3c50] napi_gro_receive at 8dc565b9 > 20 [bd13003d3c80] ice_receive_skb at c087e4f5 [ice] > 21 [bd13003d3c90] ice_clean_rx_irq at c0881b80 [ice] > 22 [bd13003d3d20] ice_napi_poll at c088232f [ice] > 23 [bd13003d3d80] __napi_poll at 8dc18000 > 24 [bd13003d3db8] net_rx_action at 8dc18581 > 25 [bd13003d3e40] __do_softirq at 8df352e9 > 26 [bd13003d3eb0] run_ksoftirqd at 8ceffe47 > 27 [bd13003d3ec0] smpboot_thread_fn at 8cf36a30 > 28 [bd13003d3ee8] kthread at 8cf2b39f > 29 [bd13003d3f28] ret_from_fork at 8ce5fa64 > 30 [bd13003d3f50] ret_from_fork_asm at 8ce03cbb > > Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in > fib6_table") > Reported-by: Adrian Oliver > Signed-off-by: Omid Ehtemam-Haghighi > Cc: David S. Miller > Cc: David Ahern > Cc: Eric Dumazet > Cc: Jakub Kicinski > Cc: Paolo Abeni > Cc: Shuah Khan > Cc: Ido Schimmel > Cc: Kuniyuki Iwashima > Cc: Simon Horman > Cc: net...@vger.kernel.org > Cc: linux-kselft...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org Given the issue is long-standing, and the fix is somewhat invasive, I suggest steering this patch on net-next. Would that be ok for you? Thanks, Paolo
Re: [PATCH v8 2/3] modpost: Produce extended MODVERSIONS information
On Wed, Oct 30, 2024 at 11:05:03PM +, Matthew Maurer wrote: > diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig > index > e6b2427e5c190aacf7b9c5c1bb57fca39d311564..a31c617cd67d3d66b24d2fba34cbd5cc9c53ab78 > 100644 > --- a/kernel/module/Kconfig > +++ b/kernel/module/Kconfig > @@ -208,6 +208,16 @@ config ASM_MODVERSIONS > assembly. This can be enabled only when the target architecture > supports it. > > +config EXTENDED_MODVERSIONS > + bool "Extended Module Versioning Support" > + depends on MODVERSIONS > + help > + This enables extended MODVERSIONs support, allowing long symbol > + names to be versioned. > + > + The most likely reason you would enable this is to enable Rust > + support. If unsure, say N. > + The question is, if only extended moversions are used, what new tooling requirements are there? Can you test using only extended modversions? Luis
Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote: > +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > +{ [...] > + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); > + if (unlikely(opcode != OVPN_DATA_V2)) { > + /* DATA_V1 is not supported */ > + if (opcode == OVPN_DATA_V1) The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well? > + goto drop; > + > + /* unknown or control packet: let it bubble up to userspace */ > + return 1; > + } > + > + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); > + /* some OpenVPN server implementations send data packets with the > + * peer-id set to undef. In this case we skip the peer lookup by peer-id > + * and we try with the transport address > + */ > + if (peer_id != OVPN_PEER_ID_UNDEF) { > + peer = ovpn_peer_get_by_id(ovpn, peer_id); > + if (!peer) { > + net_err_ratelimited("%s: received data from unknown > peer (id: %d)\n", > + __func__, peer_id); > + goto drop; > + } > + } > + > + if (!peer) { nit: that could be an "else" combined with the previous case? > + /* data packet with undef peer-id */ > + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: received data with undef > peer-id from unknown source\n", > + __func__); > + goto drop; > + } > + } -- Sabrina
Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device
On Thu, Oct 31 2024 at 10:56, Frederic Weisbecker wrote: > On Thu, Oct 31, 2024 at 02:10:14PM +0530, Naresh Kamboju wrote: >> <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455 >> clockevents_register_device (kernel/time/clockevents.c:455 > > It's possible that I messed up something with clockevents. Yes. You added the warning :) > Can you try to reproduce with: > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > timers/core I force pushed the branch with the warning removed. > I wish I could reproduce on my own but I don't have easy > access to such hardware. apt-get install qemu-system gives you access to all supported architectures :)
Re: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload
It seems some little changes to ovpn.yaml were not reflected in the generated files I committed. Specifically I changed some U32 to BE32 (IPv4 addresses) and files were not regenerated before committing. (I saw the failure in patchwork about this) It seems I'll have to send v12 after all. Cheers, On 29/10/2024 11:47, Antonio Quartulli wrote: Notable changes from v10: * extended commit message of 23/23 with brief description of the output * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777...@openvpn.net Please note that some patches were already reviewed by Andre Lunn, Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag since no major code modification has happened since the review. The latest code can also be found at: https://github.com/OpenVPN/linux-kernel-ovpn Thanks a lot! Best Regards, Antonio Quartulli OpenVPN Inc. --- Antonio Quartulli (23): netlink: add NLA_POLICY_MAX_LEN macro net: introduce OpenVPN Data Channel Offload (ovpn) ovpn: add basic netlink support ovpn: add basic interface creation/destruction/management routines ovpn: keep carrier always on ovpn: introduce the ovpn_peer object ovpn: introduce the ovpn_socket object ovpn: implement basic TX path (UDP) ovpn: implement basic RX path (UDP) ovpn: implement packet processing ovpn: store tunnel and transport statistics ovpn: implement TCP transport ovpn: implement multi-peer support ovpn: implement peer lookup logic ovpn: implement keepalive mechanism ovpn: add support for updating local UDP endpoint ovpn: add support for peer floating ovpn: implement peer add/get/dump/delete via netlink ovpn: implement key add/get/del/swap via netlink ovpn: kill key and notify userspace in case of IV exhaustion ovpn: notify userspace when a peer is deleted ovpn: add basic ethtool support testing/selftests: add test tool and scripts for ovpn module Documentation/netlink/specs/ovpn.yaml | 362 +++ MAINTAINERS| 11 + drivers/net/Kconfig| 14 + drivers/net/Makefile |1 + drivers/net/ovpn/Makefile | 22 + drivers/net/ovpn/bind.c| 54 + drivers/net/ovpn/bind.h| 117 + drivers/net/ovpn/crypto.c | 214 ++ drivers/net/ovpn/crypto.h | 145 ++ drivers/net/ovpn/crypto_aead.c | 386 drivers/net/ovpn/crypto_aead.h | 33 + drivers/net/ovpn/io.c | 462 drivers/net/ovpn/io.h | 25 + drivers/net/ovpn/main.c| 337 +++ drivers/net/ovpn/main.h| 24 + drivers/net/ovpn/netlink-gen.c | 212 ++ drivers/net/ovpn/netlink-gen.h | 41 + drivers/net/ovpn/netlink.c | 1135 ++ drivers/net/ovpn/netlink.h | 18 + drivers/net/ovpn/ovpnstruct.h | 61 + drivers/net/ovpn/packet.h | 40 + drivers/net/ovpn/peer.c| 1201 ++ drivers/net/ovpn/peer.h| 165 ++ drivers/net/ovpn/pktid.c | 130 ++ drivers/net/ovpn/pktid.h | 87 + drivers/net/ovpn/proto.h | 104 + drivers/net/ovpn/skb.h | 56 + drivers/net/ovpn/socket.c | 178 ++ drivers/net/ovpn/socket.h | 55 + drivers/net/ovpn/stats.c | 21 + drivers/net/ovpn/stats.h | 47 + drivers/net/ovpn/tcp.c | 506 + drivers/net/ovpn/tcp.h | 44 + drivers/net/ovpn/udp.c | 406 drivers/net/ovpn/udp.h | 26 + include/net/netlink.h |1 + include/uapi/linux/if_link.h | 15 + include/uapi/linux/ovpn.h | 109 + include/uapi/linux/udp.h |1 + tools/net/ynl/ynl-gen-c.py |4 +- tools/testing/selftests/Makefile |1 + tools/testing/selftests/net/ovpn/.gitignore|2 + tools/testing/selftests/net/ovpn/Makefile | 17 + tools/testing/selftests/net/ovpn/config| 10 + tools/testing/selftests/net/ovpn/data64.key|5 + tools/testing/selftests/net/ovpn/ovpn-cli.c| 2370 tools/testing/selftests/net/ovpn/tcp_p
Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device
On Thu, Oct 31, 2024 at 10:42:45AM +0100, Thomas Gleixner wrote: > On Thu, Oct 31 2024 at 14:10, Naresh Kamboju wrote: > > The QEMU-ARM64 boot has failed with the Linux next-20241031 tag. > > The boot log shows warnings at clockevents_register_device and followed > > by rcu_preempt detected stalls. > > > > However, the system did not proceed far enough to reach the login prompt. > > The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed. > > > > Please find the incomplete boot log links below for your reference. > > The Qemu version is 9.0.2. > > <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455 > > clockevents_register_device (kernel/time/clockevents.c:455 > > <4>[ 0.225218] clockevents_register_device+0x170/0x188 P > > <4>[ 0.225367] clockevents_config_and_register+0x34/0x50 L > > <4>[ 0.225487] clockevents_config_and_register > > (kernel/time/clockevents.c:523) > > <4>[ 0.225553] arch_timer_starting_cpu > > (drivers/clocksource/arm_arch_timer.c:1034) > > <4>[ 0.225602] cpuhp_invoke_callback (kernel/cpu.c:194) > > <4>[ 0.225649] __cpuhp_invoke_callback_range (kernel/cpu.c:965) > > <4>[ 0.225691] notify_cpu_starting (kernel/cpu.c:1604) > > That's obvious what happens here. notify_cpu_starting() is invoked > before the CPU is marked online, which triggers the new check in > clockevents_register_device(). > > I removed the warning and force pushed the fixed up branch, so that > should be gone by tomorrow. Ah, phew! Thanks. > > Thanks, > > tglx >
Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device
Hi, On Thu, Oct 31, 2024 at 02:10:14PM +0530, Naresh Kamboju wrote: > The QEMU-ARM64 boot has failed with the Linux next-20241031 tag. > The boot log shows warnings at clockevents_register_device and followed > by rcu_preempt detected stalls. > > However, the system did not proceed far enough to reach the login prompt. > The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed. > > Please find the incomplete boot log links below for your reference. > The Qemu version is 9.0.2. > > This is always reproducible. > First seen on Linux next-20241031 tag. > Good: next-20241030 > Good: next-20241031 > > qemu-arm64: > boot: > * clang-19-lkftconfig > * gcc-13-lkftconfig > * clang-nightly-lkftconfig > > qemu-armv7: > boot: > * clang-19-lkftconfig > * gcc-13-lkftconfig > * clang-nightly-lkftconfig > > Reported-by: Linux Kernel Functional Testing > > Boot log: > --- > [0.00] Booting Linux on physical CPU 0x00 [0x000f0510] > 0.00] Linux version 6.12.0-rc5-next-20241031 (tuxmake@tuxmake) > (aarch64-linux-gnu-gcc (Debian 13.3.0-5) 13.3.0, GNU ld (GNU Binutils > for Debian) 2.43.1) #1 SMP PREEMPT @1730356841 > [0.00] KASLR enabled > [0.00] random: crng init done > [0.00] Machine model: linux,dummy-virt > > <6>[0.216503] GICv3: CPU1: found redistributor 1 region > 0:0x080c > <6>[0.218511] GICv3: CPU1: using allocated LPI pending table > @0x00010025 > <4>[0.220528] [ cut here ] > <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455 > clockevents_register_device (kernel/time/clockevents.c:455 It's possible that I messed up something with clockevents. Can you try to reproduce with: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core And if so it's possible that the bad commit is somewhere between: 17a8945f369c (clockevents: Improve clockevents_notify_released() comment) and bf9a001fb8e4 (clocksource/drivers/timer-tegra: Remove clockevents shutdown call on offlining) I wish I could reproduce on my own but I don't have easy access to such hardware. Thanks.
Re: next-20241031: kernel/time/clockevents.c:455 clockevents_register_device
On Thu, Oct 31 2024 at 14:10, Naresh Kamboju wrote: > The QEMU-ARM64 boot has failed with the Linux next-20241031 tag. > The boot log shows warnings at clockevents_register_device and followed > by rcu_preempt detected stalls. > > However, the system did not proceed far enough to reach the login prompt. > The fvp-aemva, Qemu-arm64, Qemu-armv7 and Qemu-riscv64 boot failed. > > Please find the incomplete boot log links below for your reference. > The Qemu version is 9.0.2. > <4>[ 0.220657] WARNING: CPU: 1 PID: 0 at kernel/time/clockevents.c:455 > clockevents_register_device (kernel/time/clockevents.c:455 > <4>[ 0.225218] clockevents_register_device+0x170/0x188 P > <4>[ 0.225367] clockevents_config_and_register+0x34/0x50 L > <4>[ 0.225487] clockevents_config_and_register (kernel/time/clockevents.c:523) > <4>[ 0.225553] arch_timer_starting_cpu > (drivers/clocksource/arm_arch_timer.c:1034) > <4>[ 0.225602] cpuhp_invoke_callback (kernel/cpu.c:194) > <4>[ 0.225649] __cpuhp_invoke_callback_range (kernel/cpu.c:965) > <4>[ 0.225691] notify_cpu_starting (kernel/cpu.c:1604) That's obvious what happens here. notify_cpu_starting() is invoked before the CPU is marked online, which triggers the new check in clockevents_register_device(). I removed the warning and force pushed the fixed up branch, so that should be gone by tomorrow. Thanks, tglx
Re: [PATCH v8 1/3] modules: Support extended MODVERSIONS info
On Wed, Oct 30, 2024 at 10:06:12PM -0700, Matthew Maurer wrote: > On Wed, Oct 30, 2024 at 9:37 PM Luis Chamberlain wrote: > > > > On Thu, Oct 31, 2024 at 12:22:36PM +1100, Michael Ellerman wrote: > > > Matthew Maurer writes: > > > > Adds a new format for MODVERSIONS which stores each field in a separate > > > > ELF section. This initially adds support for variable length names, but > > > > could later be used to add additional fields to MODVERSIONS in a > > > > backwards compatible way if needed. Any new fields will be ignored by > > > > old user tooling, unlike the current format where user tooling cannot > > > > tolerate adjustments to the format (for example making the name field > > > > longer). > > > > > > > > Since PPC munges its version records to strip leading dots, we reproduce > > > > the munging for the new format. Other architectures do not appear to > > > > have architecture-specific usage of this information. > > > > > > > > Reviewed-by: Sami Tolvanen > > > > Signed-off-by: Matthew Maurer > > > > --- > > > > arch/powerpc/kernel/module_64.c | 24 ++- > > > > > > Acked-by: Michael Ellerman (powerpc) > > > > Michael, Matthew, why make everyone deal with this instead of just > > making this an arch thing and ppc would be the only one doing it? > > > > Luis > > > > I'm not sure I understand - the PPC changes are in an arch-specific > directory, and triggered through the arch-implemented callback > mod_frob_arch_sections. What would you like done to make it more of an > arch-thing? Sorry, yes, I see that now, that's what I get for late night patch review. Nevermidn, this all looks good to me now. Luis
Re: [PATCH v5 00/19] Implement DWARF modversions
On Thu, Oct 31, 2024 at 2:56 AM Sedat Dilek wrote: > > On Wed, Oct 30, 2024 at 10:14 PM Sami Tolvanen > wrote: > > > > Hi Sedat, > > > > On Wed, Oct 30, 2024 at 2:00 PM Sedat Dilek wrote: > > > > > > Hi Sami, > > > > > > perfect timing: Nathan uploaded SLIM LLVM toolchain v19.1.3 > > > > > > KBUILD_GENDWARFKSYMS_STABLE is to be set manually? > > > What value is recommended? > > > > The usage is similar to KBUILD_SYMTYPES, you can just set > > KBUILD_GENDWARFKSYMS_STABLE=1 to use --stable when calculating > > versions. However, it's not normally necessary to set this flag at all > > when building your own kernel, it's mostly for distributions. > > > > Sami > > OK, thanks. > > # cat /proc/version > Linux version 6.12.0-rc5-1-amd64-clang19-kcfi > (sedat.di...@gmail.com@iniza) (ClangBuiltLinux clang version 19.1.3 > (https://github.com/llvm/llvm-project.git > ab51eccf88f5321e7c60591c5546b254b6afab99), ClangBuiltLinux LLD 19.1.3 > (https://github.com/llvm/llvm-project.git > ab51eccf88f5321e7c60591c5546b254b6afab99)) #1~trixie+dileks SMP > PREEMPT_DYNAMIC 2024-10-30 > > Tested-by: Sedat Dilek # LLVM/Clang v19.1.3 on x86-64 > Fix email-address in credit tag: Tested-by: Sedat Dilek # LLVM/Clang v19.1.3 on x86-64 -sed@-
Re: [PATCH v8 1/3] modules: Support extended MODVERSIONS info
On Wed, Oct 30, 2024 at 9:37 PM Luis Chamberlain wrote: > > On Thu, Oct 31, 2024 at 12:22:36PM +1100, Michael Ellerman wrote: > > Matthew Maurer writes: > > > Adds a new format for MODVERSIONS which stores each field in a separate > > > ELF section. This initially adds support for variable length names, but > > > could later be used to add additional fields to MODVERSIONS in a > > > backwards compatible way if needed. Any new fields will be ignored by > > > old user tooling, unlike the current format where user tooling cannot > > > tolerate adjustments to the format (for example making the name field > > > longer). > > > > > > Since PPC munges its version records to strip leading dots, we reproduce > > > the munging for the new format. Other architectures do not appear to > > > have architecture-specific usage of this information. > > > > > > Reviewed-by: Sami Tolvanen > > > Signed-off-by: Matthew Maurer > > > --- > > > arch/powerpc/kernel/module_64.c | 24 ++- > > > > Acked-by: Michael Ellerman (powerpc) > > Michael, Matthew, why make everyone deal with this instead of just > making this an arch thing and ppc would be the only one doing it? > > Luis > I'm not sure I understand - the PPC changes are in an arch-specific directory, and triggered through the arch-implemented callback mod_frob_arch_sections. What would you like done to make it more of an arch-thing?
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
On Wed, Oct 30, 2024 at 05:21:13PM -0700, Doug Anderson wrote: > Hi, > > On Wed, Oct 30, 2024 at 4:27 PM Paul E. McKenney wrote: > > > > On Wed, Oct 30, 2024 at 01:12:01PM -0700, Doug Anderson wrote: > > > Hi, > > > > > > On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney > > > wrote: > > > > > > > > > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, > > > > > > so, > > > > > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like > > > > > > the name says. ;-) > > > > > > > > > > In the comments of following patch, the arm64 maintainers have > > > > > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a > > > > > bit confused and unsure which perspective is more reasonable. > > > > > > > > > > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/ > > > > > > > > I clearly need to have a chat with the arm64 maintainers, and thank > > > > you for checking. > > > > > > > > > > /* > > > > > > * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the > > > > > name, > > > > > > * nothing about it truly needs to be implemented using an NMI, it's > > > > > > * just that it's _allowed_ to work with NMIs. If > > > > > > ipi_should_be_nmi() > > > > > > * returned false our backtrace attempt will just use a regular IPI. > > > > > > */ > > > > > > > > > > > Alternatively, arm64 could continue using > > > > > > nmi_trigger_cpumask_backtrace() > > > > > > with normal interrupts (for example, on SoCs not implementing true > > > > > > NMIs), > > > > > > but have a short timeout (maybe a few jiffies?) after which its > > > > > > returns > > > > > > false (and presumably also cancels the backtrace request so that > > > > > > when > > > > > > the non-NMI interrupt eventually does happen, its handler simply > > > > > > returns > > > > > > without backtracing). This should be implemented using atomics to > > > > > > avoid > > > > > > deadlock issues. This alternative approach would provide accurate > > > > > > arm64 > > > > > > backtraces in the common case where interrupts are enabled, but > > > > > > allow > > > > > > a graceful fallback to remote tracing otherwise. > > > > > > > > > > > > Would you be interested in working this issue, whatever solution the > > > > > > arm64 maintainers end up preferring? > > > > > > > > > > The 10-second timeout is hard-coded in > > > > > nmi_trigger_cpumask_backtrace(). > > > > > It is shared code and not architecture-specific. Currently, I haven't > > > > > thought of a feasible solution. I have also CC'd the authors of the > > > > > aforementioned patch to see if they have any other ideas. > > > > > > > > It should be possible for arm64 to have an architecture-specific hook > > > > that enables them to use a much shorter timeout. Or, to eventually > > > > switch to real NMIs. > > > > > > Note that: > > > > > > * Switching to real NMIs is impossible on many existing arm64 CPUs. > > > The hardware support simply isn't there. Pseudo-NMIs should work fine > > > and are (in nearly all cases) just as good as NMIs but they have a > > > small performance impact. There are also compatibility issues with > > > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI > > > will work and needs to be able to fall back. Prior to my recent > > > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now > > > at least they fall back to regular IPIs. > > > > But those IPIs are of no help whatsoever when the target CPU is spinning > > with interrupts disabled, which is the scenario under discussion. > > Right that we can't trace the target CPU spinning with interrupts > disabled. You can by invoking sched_show_task(cpu_curr(cpu)) on the CPU requesting the backtrace. The resulting backtrace will not be as good as you would get from using an NMI, but if you don't have NMIs, it is better than nothing. > ...but in the case where NMI isn't available it _still_ > makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI > because: > > 1. There are many cases where the trigger.*cpu.*backtrace() class of > functions are used to trace CPUs that _aren't_ looping with interrupts > disabled. We still want to be able to backtrace in that case. For > instance, you can see in "panic.c" that there are cases where > trigger_all_cpu_backtrace() is called. It's valuable to make cases > like this (where there's no indication that a CPU is locked) actually > work. > > 2. Even if there's a CPU that's looping with interrupts disabled and > we can't trace it because of no NMI, it could still be useful to get > backtraces of other CPUs. It's possible that what the other CPUs are > doing could give a hint about the CPU that's wedged. This isn't a > great hint, but some info is better than no info. And it also makes sense for an IRQ-based backtrace to fall back to something like the aforementioned sched_show_task(cpu_curr(cpu)) if the destination CPU h
Re: [PATCH v8 1/3] modules: Support extended MODVERSIONS info
On Thu, Oct 31, 2024 at 12:22:36PM +1100, Michael Ellerman wrote: > Matthew Maurer writes: > > Adds a new format for MODVERSIONS which stores each field in a separate > > ELF section. This initially adds support for variable length names, but > > could later be used to add additional fields to MODVERSIONS in a > > backwards compatible way if needed. Any new fields will be ignored by > > old user tooling, unlike the current format where user tooling cannot > > tolerate adjustments to the format (for example making the name field > > longer). > > > > Since PPC munges its version records to strip leading dots, we reproduce > > the munging for the new format. Other architectures do not appear to > > have architecture-specific usage of this information. > > > > Reviewed-by: Sami Tolvanen > > Signed-off-by: Matthew Maurer > > --- > > arch/powerpc/kernel/module_64.c | 24 ++- > > Acked-by: Michael Ellerman (powerpc) Michael, Matthew, why make everyone deal with this instead of just making this an arch thing and ppc would be the only one doing it? Luis
Re: [PATCH v2] scripts: Remove export_report.pl
On Wed, Oct 30, 2024 at 05:16:34PM +, Matthew Maurer wrote: > This script has been broken for 5 years with no user complaints. > > It first had its .mod.c parser broken in commit a3d0cb04f7df ("modpost: > use __section in the output to *.mod.c"). Later, it had its object file > enumeration broken in commit f65a486821cf ("kbuild: change module.order > to list *.o instead of *.ko"). Both of these changes sat for years with > no reports. > > Rather than reviving this script as we make further changes to `.mod.c`, > this patch gets rid of it because it is clearly unused. > > Signed-off-by: Matthew Maurer Thanks! Applied and pushed! Luis
Re: [PATCH v5 00/19] Implement DWARF modversions
On Wed, Oct 30, 2024 at 10:14 PM Sami Tolvanen wrote: > > Hi Sedat, > > On Wed, Oct 30, 2024 at 2:00 PM Sedat Dilek wrote: > > > > Hi Sami, > > > > perfect timing: Nathan uploaded SLIM LLVM toolchain v19.1.3 > > > > KBUILD_GENDWARFKSYMS_STABLE is to be set manually? > > What value is recommended? > > The usage is similar to KBUILD_SYMTYPES, you can just set > KBUILD_GENDWARFKSYMS_STABLE=1 to use --stable when calculating > versions. However, it's not normally necessary to set this flag at all > when building your own kernel, it's mostly for distributions. > > Sami OK, thanks. # cat /proc/version Linux version 6.12.0-rc5-1-amd64-clang19-kcfi (sedat.di...@gmail.com@iniza) (ClangBuiltLinux clang version 19.1.3 (https://github.com/llvm/llvm-project.git ab51eccf88f5321e7c60591c5546b254b6afab99), ClangBuiltLinux LLD 19.1.3 (https://github.com/llvm/llvm-project.git ab51eccf88f5321e7c60591c5546b254b6afab99)) #1~trixie+dileks SMP PREEMPT_DYNAMIC 2024-10-30 Tested-by: Sedat Dilek # LLVM/Clang v19.1.3 on x86-64 Best regards, -Sedat-
Re: [PATCH v8 1/3] modules: Support extended MODVERSIONS info
Matthew Maurer writes: > Adds a new format for MODVERSIONS which stores each field in a separate > ELF section. This initially adds support for variable length names, but > could later be used to add additional fields to MODVERSIONS in a > backwards compatible way if needed. Any new fields will be ignored by > old user tooling, unlike the current format where user tooling cannot > tolerate adjustments to the format (for example making the name field > longer). > > Since PPC munges its version records to strip leading dots, we reproduce > the munging for the new format. Other architectures do not appear to > have architecture-specific usage of this information. > > Reviewed-by: Sami Tolvanen > Signed-off-by: Matthew Maurer > --- > arch/powerpc/kernel/module_64.c | 24 ++- Acked-by: Michael Ellerman (powerpc) cheers
Re: [PATCH net-next v6 1/3] connector/cn_proc: Add hash table for threads
On Wed, 23 Oct 2024 09:20:04 -0700 Anjali Kulkarni wrote: > Add a new type PROC_CN_MCAST_NOTIFY to proc connector API, which allows a > thread to notify the kernel that is going to exit with a non-zero exit > code and specify the exit code in it. When thread exits in the kernel, > it will send this exit code as a proc filter notification to any > listening process. While we wait for someone who knows about process management to review this - could you please split out the whitespace / checkpatch cleanup changes to a separate commit? They make the change harder to review. -- pw-bot: cr
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
Hi, On Wed, Oct 30, 2024 at 4:27 PM Paul E. McKenney wrote: > > On Wed, Oct 30, 2024 at 01:12:01PM -0700, Doug Anderson wrote: > > Hi, > > > > On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney wrote: > > > > > > > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so, > > > > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like > > > > > the name says. ;-) > > > > > > > > In the comments of following patch, the arm64 maintainers have > > > > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a > > > > bit confused and unsure which perspective is more reasonable. > > > > > > > > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/ > > > > > > I clearly need to have a chat with the arm64 maintainers, and thank > > > you for checking. > > > > > > > > /* > > > > > * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the > > > > name, > > > > > * nothing about it truly needs to be implemented using an NMI, it's > > > > > * just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi() > > > > > * returned false our backtrace attempt will just use a regular IPI. > > > > > */ > > > > > > > > > Alternatively, arm64 could continue using > > > > > nmi_trigger_cpumask_backtrace() > > > > > with normal interrupts (for example, on SoCs not implementing true > > > > > NMIs), > > > > > but have a short timeout (maybe a few jiffies?) after which its > > > > > returns > > > > > false (and presumably also cancels the backtrace request so that when > > > > > the non-NMI interrupt eventually does happen, its handler simply > > > > > returns > > > > > without backtracing). This should be implemented using atomics to > > > > > avoid > > > > > deadlock issues. This alternative approach would provide accurate > > > > > arm64 > > > > > backtraces in the common case where interrupts are enabled, but allow > > > > > a graceful fallback to remote tracing otherwise. > > > > > > > > > > Would you be interested in working this issue, whatever solution the > > > > > arm64 maintainers end up preferring? > > > > > > > > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace(). > > > > It is shared code and not architecture-specific. Currently, I haven't > > > > thought of a feasible solution. I have also CC'd the authors of the > > > > aforementioned patch to see if they have any other ideas. > > > > > > It should be possible for arm64 to have an architecture-specific hook > > > that enables them to use a much shorter timeout. Or, to eventually > > > switch to real NMIs. > > > > Note that: > > > > * Switching to real NMIs is impossible on many existing arm64 CPUs. > > The hardware support simply isn't there. Pseudo-NMIs should work fine > > and are (in nearly all cases) just as good as NMIs but they have a > > small performance impact. There are also compatibility issues with > > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI > > will work and needs to be able to fall back. Prior to my recent > > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now > > at least they fall back to regular IPIs. > > But those IPIs are of no help whatsoever when the target CPU is spinning > with interrupts disabled, which is the scenario under discussion. Right that we can't trace the target CPU spinning with interrupts disabled. ...but in the case where NMI isn't available it _still_ makes sense to make arch_trigger_cpumask_backtrace() fall back to IPI because: 1. There are many cases where the trigger.*cpu.*backtrace() class of functions are used to trace CPUs that _aren't_ looping with interrupts disabled. We still want to be able to backtrace in that case. For instance, you can see in "panic.c" that there are cases where trigger_all_cpu_backtrace() is called. It's valuable to make cases like this (where there's no indication that a CPU is locked) actually work. 2. Even if there's a CPU that's looping with interrupts disabled and we can't trace it because of no NMI, it could still be useful to get backtraces of other CPUs. It's possible that what the other CPUs are doing could give a hint about the CPU that's wedged. This isn't a great hint, but some info is better than no info. > Hence my suggestion that arm64, when using IRQs to request stack > backtraces, have an additional short timeout (milliseconds) in > order to fall back to remote stack tracing. In many cases, this > fallback happens automatically, as you can see in dump_cpu_task(). > If trigger_single_cpu_backtrace() returns false, the stack is dumped > remotely. I think the answer here is that the API needs to change. The whole boolean return value for trigger.*cpu.*backtrace() is mostly ignored by callers. Yes, you've pointed at one of the two places that it's not ignored and it tries a reasonable fallback, but all the other callers just silently do nothing when the function returns false. That led to many pl
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
On Wed, Oct 30, 2024 at 01:12:01PM -0700, Doug Anderson wrote: > Hi, > > On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney wrote: > > > > > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so, > > > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like > > > > the name says. ;-) > > > > > > In the comments of following patch, the arm64 maintainers have > > > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a > > > bit confused and unsure which perspective is more reasonable. > > > > > > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/ > > > > I clearly need to have a chat with the arm64 maintainers, and thank > > you for checking. > > > > > > /* > > > > * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the > > > name, > > > > * nothing about it truly needs to be implemented using an NMI, it's > > > > * just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi() > > > > * returned false our backtrace attempt will just use a regular IPI. > > > > */ > > > > > > > Alternatively, arm64 could continue using > > > > nmi_trigger_cpumask_backtrace() > > > > with normal interrupts (for example, on SoCs not implementing true > > > > NMIs), > > > > but have a short timeout (maybe a few jiffies?) after which its returns > > > > false (and presumably also cancels the backtrace request so that when > > > > the non-NMI interrupt eventually does happen, its handler simply returns > > > > without backtracing). This should be implemented using atomics to avoid > > > > deadlock issues. This alternative approach would provide accurate arm64 > > > > backtraces in the common case where interrupts are enabled, but allow > > > > a graceful fallback to remote tracing otherwise. > > > > > > > > Would you be interested in working this issue, whatever solution the > > > > arm64 maintainers end up preferring? > > > > > > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace(). > > > It is shared code and not architecture-specific. Currently, I haven't > > > thought of a feasible solution. I have also CC'd the authors of the > > > aforementioned patch to see if they have any other ideas. > > > > It should be possible for arm64 to have an architecture-specific hook > > that enables them to use a much shorter timeout. Or, to eventually > > switch to real NMIs. > > Note that: > > * Switching to real NMIs is impossible on many existing arm64 CPUs. > The hardware support simply isn't there. Pseudo-NMIs should work fine > and are (in nearly all cases) just as good as NMIs but they have a > small performance impact. There are also compatibility issues with > some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI > will work and needs to be able to fall back. Prior to my recent > changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now > at least they fall back to regular IPIs. But those IPIs are of no help whatsoever when the target CPU is spinning with interrupts disabled, which is the scenario under discussion. Hence my suggestion that arm64, when using IRQs to request stack backtraces, have an additional short timeout (milliseconds) in order to fall back to remote stack tracing. In many cases, this fallback happens automatically, as you can see in dump_cpu_task(). If trigger_single_cpu_backtrace() returns false, the stack is dumped remotely. > * Even if we decided that we absolutely had to disable stacktrades on > arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has > been using regular IPIs for backtraces like this for many, many years. > Maybe folks don't care as much about arm32 anymore but it feels bad if > we totally break it. Because arm32 isn't used much for datacenter workloads, it will not be suffering from this issue. Not enough to have motivated anyone to fix it, anyway. In contrast, in the datacenter, CPUs really can and do have interrupts disabled for many seconds. (No, this is not a good thing, but it is common enough for us to need to avoid disabling other debugging facilities.) So it might well be that arm32 will continue to do just fine with IRQ-based stack tracing. Or maybe someday arm32 will also need to deal with this same issue. But no "maybe" for arm64, given its increasing use in the datacenter. > IMO waiting 10 seconds for a backtrace is pretty crazy to begin with. > What about just changing that globally to 1 second? If not, it doesn't > feel like it would be impossibly hard to make an arch-specific > callback to choose the time and that callback could even take into > account whether we managed to get an NMI. I'd be happy to review such > a patch. Let's please keep the 10-second timeout for NMI-based backtraces. But I take it that you are not opposed to a shorter timeout for the special case of IRQ-based stack backtrace requests? Thanx, Paul
Re: [PATCH v2 07/13] media: i2c: imx214: Check number of lanes from device tree
Hi Ricardo, Am Mittwoch, dem 30.10.2024 um 12:38 +0100 schrieb Ricardo Ribalda Delgado: > On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay > wrote: > > > > From: André Apitzsch > > > > The imx214 camera is capable of either two-lane or four-lane > > operation. > > > > Currently only the four-lane mode is supported, as proper pixel > > rates > > and link frequences for the two-lane mode are unknown. > > > > Signed-off-by: André Apitzsch > > --- > > drivers/media/i2c/imx214.c | 26 +++--- > > 1 file changed, 19 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/media/i2c/imx214.c > > b/drivers/media/i2c/imx214.c > > index > > 0c83149bcc3e3b833a087d26104eb7dfaafdf904..497baad616ad7374a92a3da2b > > 7c1096b1d72a0c7 100644 > > --- a/drivers/media/i2c/imx214.c > > +++ b/drivers/media/i2c/imx214.c > > @@ -199,7 +199,6 @@ struct imx214 { > > > > /*From imx214_mode_tbls.h*/ > > static const struct cci_reg_sequence mode_4096x2304[] = { > > - { IMX214_REG_CSI_LANE_MODE, IMX214_CSI_4_LANE_MODE }, > > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH > > }, > > { IMX214_REG_EXPOSURE_RATIO, 1 }, > > @@ -272,7 +271,6 @@ static const struct cci_reg_sequence > > mode_4096x2304[] = { > > }; > > > > static const struct cci_reg_sequence mode_1920x1080[] = { > > - { IMX214_REG_CSI_LANE_MODE, IMX214_CSI_4_LANE_MODE }, > > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH > > }, > > { IMX214_REG_EXPOSURE_RATIO, 1 }, > > @@ -791,6 +789,13 @@ static int imx214_start_streaming(struct > > imx214 *imx214) > > return ret; > > } > > > > + ret = cci_write(imx214->regmap, IMX214_REG_CSI_LANE_MODE, > > + IMX214_CSI_4_LANE_MODE, NULL); > > + if (ret) { > > + dev_err(imx214->dev, "%s failed to configure > > lanes\n", __func__); > > + return ret; > > + } > > + > > ret = cci_multi_reg_write(imx214->regmap, imx214->cur_mode- > > >reg_table, > > imx214->cur_mode->num_of_regs, > > NULL); > > if (ret < 0) { > > @@ -932,7 +937,7 @@ static int imx214_get_regulators(struct device > > *dev, struct imx214 *imx214) > > imx214->supplies); > > } > > > > -static int imx214_parse_fwnode(struct device *dev) > > +static int imx214_parse_fwnode(struct device *dev, struct imx214 > > *imx214) > We don't seem to use imx214 in the function. You probably do not want > to add this change. > > { > > struct fwnode_handle *endpoint; > > struct v4l2_fwnode_endpoint bus_cfg = { > > @@ -951,6 +956,13 @@ static int imx214_parse_fwnode(struct device > > *dev) > > goto done; > > } > > > > + /* Check the number of MIPI CSI2 data lanes */ > > + if (bus_cfg.bus.mipi_csi2.num_data_lanes != 4) { > > + dev_err_probe(dev, -EINVAL, > > + "only 4 data lanes are currently > > supported\n"); > > + goto done; > > + } > > + > > for (i = 0; i < bus_cfg.nr_of_link_frequencies; i++) > > if (bus_cfg.link_frequencies[i] == > > IMX214_DEFAULT_LINK_FREQ) > > break; > > @@ -975,14 +987,14 @@ static int imx214_probe(struct i2c_client > > *client) > > struct imx214 *imx214; > > int ret; > > > > - ret = imx214_parse_fwnode(dev); > > - if (ret) > > - return ret; > > - > > imx214 = devm_kzalloc(dev, sizeof(*imx214), GFP_KERNEL); > > if (!imx214) > > return -ENOMEM; > > > > + ret = imx214_parse_fwnode(dev, imx214); > > + if (ret) > > + return ret; > I am not against changing the order... but the commit message does > not mention it. > I'm not sure how to argue why the order should be changed, now that the imx214 argument is gone. I'll restore the original order. It can be undone, when actually needed. Best regards, André > > + > > imx214->dev = dev; > > > > imx214->xclk = devm_clk_get(dev, NULL); > > > > -- > > 2.47.0 > > > >
Re: [PATCH RFT v11 2/8] Documentation: userspace-api: Add shadow stack API documentation
On Sat, Oct 05, 2024 at 11:31:29AM +0100, Mark Brown wrote: There are a number of architectures with shadow stack features which we are presenting to userspace with as consistent an API as we can (though there are some architecture specifics). Especially given that there are some important considerations for userspace code interacting directly with the feature let's provide some documentation covering the common aspects. Reviewed-by: Catalin Marinas Reviewed-by: Kees Cook Tested-by: Kees Cook Acked-by: Shuah Khan Signed-off-by: Mark Brown --- Documentation/userspace-api/index.rst| 1 + Documentation/userspace-api/shadow_stack.rst | 41 2 files changed, 42 insertions(+) diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 274cc7546efc2a042d2dc00aa67c71c52372179a..c39709bfba2c5682d0d1a22444db17c17bcf01ce 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -59,6 +59,7 @@ Everything else ELF netlink/index + shadow_stack sysfs-platform_profile vduse futex2 diff --git a/Documentation/userspace-api/shadow_stack.rst b/Documentation/userspace-api/shadow_stack.rst new file mode 100644 index ..c576ad3d7ec12f0f75bffa4e2bafd0c9d7230c9f --- /dev/null +++ b/Documentation/userspace-api/shadow_stack.rst @@ -0,0 +1,41 @@ += +Shadow Stacks += + +Introduction + + +Several architectures have features which provide backward edge +control flow protection through a hardware maintained stack, only +writeable by userspace through very limited operations. This feature +is referred to as shadow stacks on Linux, on x86 it is part of Intel +Control Enforcement Technology (CET), on arm64 it is Guarded Control +Stacks feature (FEAT_GCS) and for RISC-V it is the Zicfiss extension. +It is expected that this feature will normally be managed by the +system dynamic linker and libc in ways broadly transparent to +application code, this document covers interfaces and considerations. + + +Enabling + + +Shadow stacks default to disabled when a userspace process is +executed, they can be enabled for the current thread with a syscall: + + - For x86 the ARCH_SHSTK_ENABLE arch_prctl() I know when you started out, gcs and risc-v shadow stack patches were only catching up. But now that gcs patches are in -next and risc-v patches have also reached some maturity. And considering this generic generic shadow stack documentation, may be it's worth to mention arch agnostic prctls here for shadow stack (that will be used by arm64 and riscv)? What do you think? + +It is expected that this will normally be done by the dynamic linker. +Any new threads created by a thread with shadow stacks enabled will +themselves have shadow stacks enabled. + + +Enablement considerations += + +- Returning from the function that enables shadow stacks without first + disabling them will cause a shadow stack exception. nit: s/shadow stack exception/arch specific exception indicating control flow violation + This includes + any syscall wrapper or other library functions, the syscall will need + to be inlined. +- A lock feature allows userspace to prevent disabling of shadow stacks. +- Those that change the stack context like longjmp() or use of ucontext + changes on signal return will need support from libc. -- 2.39.2
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev wrote: > > On 10/29, Mina Almasry wrote: > > Check we're going to free a reasonable number of frags in token_count > > before starting the loop, to prevent looping too long. > > > > Also minor code cleanups: > > - Flip checks to reduce indentation. > > - Use sizeof(*tokens) everywhere for consistentcy. > > > > Cc: Yi Lai > > > > Signed-off-by: Mina Almasry > > > > --- > > net/core/sock.c | 46 -- > > 1 file changed, 28 insertions(+), 18 deletions(-) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 7f398bd07fb7..8603b8d87f2e 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int > > bytes) > > > > #ifdef CONFIG_PAGE_POOL > > > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in > > * 1 syscall. The limit exists to limit the amount of memory the kernel > > - * allocates to copy these tokens. > > + * allocates to copy these tokens, and to prevent looping over the frags > > for > > + * too long. > > */ > > -#define MAX_DONTNEED_TOKENS 128 > > +#define MAX_DONTNEED_FRAGS 1024 > > > > static noinline_for_stack int > > sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int > > optlen) > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t > > optval, unsigned int optlen) > > unsigned int num_tokens, i, j, k, netmem_num = 0; > > struct dmabuf_token *tokens; > > netmem_ref netmems[16]; > > + u64 num_frags = 0; > > int ret = 0; > > > > if (!sk_is_tcp(sk)) > > return -EBADF; > > > > - if (optlen % sizeof(struct dmabuf_token) || > > - optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS) > > + if (optlen % sizeof(*tokens) || > > + optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS) > > return -EINVAL; > > > > - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL); > > + num_tokens = optlen / sizeof(*tokens); > > + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL); > > if (!tokens) > > return -ENOMEM; > > > > - num_tokens = optlen / sizeof(struct dmabuf_token); > > if (copy_from_sockptr(tokens, optval, optlen)) { > > kvfree(tokens); > > return -EFAULT; > > } > > > > + for (i = 0; i < num_tokens; i++) { > > + num_frags += tokens[i].token_count; > > + if (num_frags > MAX_DONTNEED_FRAGS) { > > + kvfree(tokens); > > + return -E2BIG; > > + } > > + } > > + > > xa_lock_bh(&sk->sk_user_frags); > > for (i = 0; i < num_tokens; i++) { > > for (j = 0; j < tokens[i].token_count; j++) { > > netmem_ref netmem = (__force netmem_ref)__xa_erase( > > &sk->sk_user_frags, tokens[i].token_start + > > j); > > > > - if (netmem && > > - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) { > > - netmems[netmem_num++] = netmem; > > - if (netmem_num == ARRAY_SIZE(netmems)) { > > - xa_unlock_bh(&sk->sk_user_frags); > > - for (k = 0; k < netmem_num; k++) > > - > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); > > - netmem_num = 0; > > - xa_lock_bh(&sk->sk_user_frags); > > - } > > - ret++; > > [..] > > > + if (!netmem || > > WARN_ON_ONCE(!netmem_is_net_iov(netmem))) > > + continue; > > Any reason we are not returning explicit error to the callers here? > That probably needs some mechanism to signal which particular one failed > so the users can restart? Only because I can't think of a simple way to return an array of frags failed to DONTNEED to the user. Also, this error should be extremely rare or never hit really. I don't know how we end up not finding a netmem here or the netmem is page. The only way is if the user is malicious (messing with the token ids passed to the kernel) or if a kernel bug is happening. Also, the information is useless to the user. If the user sees 'frag 128 failed to free'. There is nothing really the user can do to recover at runtime. Only usefulness that could come is for the user to log the error. We already WARN_ON_ONCE on the error the user would not be able to trigger. -- Thanks, Mina
Re: [PATCH v5 00/19] Implement DWARF modversions
Hi Sedat, On Wed, Oct 30, 2024 at 2:00 PM Sedat Dilek wrote: > > Hi Sami, > > perfect timing: Nathan uploaded SLIM LLVM toolchain v19.1.3 > > KBUILD_GENDWARFKSYMS_STABLE is to be set manually? > What value is recommended? The usage is similar to KBUILD_SYMTYPES, you can just set KBUILD_GENDWARFKSYMS_STABLE=1 to use --stable when calculating versions. However, it's not normally necessary to set this flag at all when building your own kernel, it's mostly for distributions. Sami
Re: [PATCH RFT v11 1/8] arm64/gcs: Return a success value from gcs_alloc_thread_stack()
On Sat, Oct 05, 2024 at 11:31:28AM +0100, Mark Brown wrote: Currently as a result of templating from x86 code gcs_alloc_thread_stack() returns a pointer as an unsigned int however on arm64 we don't actually use this pointer value as anything other than a pass/fail flag. Simplify the interface to just return an int with 0 on success and a negative error code on failure. Signed-off-by: Mark Brown Acked-by: Deepak Gupta
Re: [PATCH v5 00/19] Implement DWARF modversions
On Wed, Oct 30, 2024 at 6:01 PM Sami Tolvanen wrote: > > Hi, > > Here's v5 of the DWARF modversions series. The main motivation is > modversions support for Rust, which is important for distributions > like Android that are about to ship Rust kernel modules. Per Luis' > request [1], v2 dropped the Rust specific bits from the series and > instead added the feature as an option for the entire kernel to > make it easier to evaluate the benefits of this approach, and to > get better test coverage. Matt is addressing Rust modversion_info > compatibility issues in a separate patch set [2] that depends on this > series, and actually allows modversions to be enabled with Rust. > > Short background: Unlike C, Rust source code doesn't have sufficient > information about the final ABI, as the compiler has considerable > freedom in adjusting structure layout, for example, which makes > using a source code parser like genksyms a non-starter. Based on > earlier feedback, this series uses DWARF debugging information for > computing versions. DWARF is an established and a relatively stable > format, which includes all the necessary ABI details, and adding a > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a > reasonable trade-off as most distributions already enable it. > > The first patch moves the genksyms CRC32 implementation to a shared > header file to avoid code duplication and the next 15 patches add > gendwarfksyms, a tool for computing symbol versions from DWARF. When > passed a list of exported symbols and object files, the tool > generates an expanded type string for each symbol and computes symbol > CRCs similarly to genksyms. gendwarfksyms is written in C and uses > libdw to process DWARF. Patch 17 ensures that debugging information > is present where we need it, patch 18 adds gendwarfksyms as an > alternative to genksyms, and the last patch adds documentation. > > v5 is based on v6.12-rc5 and for your convenience the series is also > available here: > > https://github.com/samitolvanen/linux/commits/gendwarfksyms-v5 > > If you also want to test the series with Rust modules, this branch > adds Matt's latest modversion_info series: > > https://github.com/samitolvanen/linux/commits/rustmodversions-v5 > > Sami > > > [1] https://lore.kernel.org/lkml/znizetkkqweig...@bombadil.infradead.org/ > [2] https://lore.kernel.org/lkml/20240925233854.90072-1-mmau...@google.com/ > > --- > > v5: > - Rebased on v6.12-rc5. > > - Fixed an issue with limiting structure expansion, and applied > Petr's clean-up. (Patch 10) > > - Dropped an unnecessary return statement in error path. (Patch > 12) > > - Addressed several other smaller issues Petr brought up. (Patches > 13, 14, and 15) > > - Added a KBUILD_GENDWARFKSYMS_STABLE flag to enable --stable for > the entire kernel build. (Patch 18) > Hi Sami, perfect timing: Nathan uploaded SLIM LLVM toolchain v19.1.3 KBUILD_GENDWARFKSYMS_STABLE is to be set manually? What value is recommended? Thanks. -Sedat- > - Updated documentation to include KBUILD flags. (Patch 19) > > - Picked up Reviewed-by tags from v4. > > v4: > https://lore.kernel.org/lkml/20241008183823.36676-21-samitolva...@google.com/ > - Rebased on v6.12-rc2, which now includes all the prerequisites. > > - Dropped unnecessary name_only parameter for symbols.c::for_each > and cleaned up error handling. (Patch 3) > > - Fixed anonymous scope handling to ensure unnamed DIEs don't get > names. (Patch 4) > > - Added non-variant children to variant_type output, and included > DW_AT_discr_value attributes for variants. (Patch 9) > > - Added another symbol pointer test case. (Patch 16) > > - Picked up (Acked|Reviewed)-by tags from v3. > > > v3: > https://lore.kernel.org/lkml/20240923181846.549877-22-samitolva...@google.com/ > - Updated SPX license headers. > > - Squashed the first two patches in v2 and tried to reduce churn as > much as reasonable. > > - Dropped patch 18 from v2 ("x86/asm-prototypes: Include > ") as it's addressed by a separate patch. > > - Changed the error handling code to immediately terminate instead > of propagating the errors back to main, which cleaned up the code > quite a bit. > > - Switched to the list and hashtable implementations in scripts and > dropped the remaining tools/include dependencies. Added a couple > missing list macros. (patch 1) > > - Moved the genksyms CRC32 implementation to scripts/include and > dropped the duplicate code. (patches 2 and 14) > > - Switched from ad-hoc command line parsing to getopt_long (patch 3). > > - Added structure member and function parameter names to the DIE > output to match genksyms behavior, and tweaked the symtypes format > to be more parser-friendly in general based on Petr's suggestions. > > - Replaced the declaration-only struct annotations with more generic > kABI stability rules that allow source code annotations to be used > where #ifndef __GENKSYMS__ was previously used. Added support for > r
Re: [PATCH net-next v11 08/23] ovpn: implement basic TX path (UDP)
On 30/10/2024 18:14, Sabrina Dubroca wrote: 2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote: +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer) +{ + struct sk_buff *curr, *next; + + if (likely(!peer)) + /* retrieve peer serving the destination IP of this packet */ + peer = ovpn_peer_get_by_dst(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: no peer to send data to\n", + ovpn->dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + /* this might be a GSO-segmented skb list: process each skb +* independently +*/ + skb_list_walk_safe(skb, curr, next) nit (if you end up reposting): there should probably be some braces around the (multi-line) loop body. ACK + if (unlikely(!ovpn_encrypt_one(peer, curr))) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(curr); + } +void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, + struct sk_buff *skb) +{ [...] + /* crypto layer -> transport (UDP) */ + pkt_len = skb->len; + ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb); + +out_unlock: + rcu_read_unlock(); +out: + if (unlikely(ret < 0)) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(skb); + return; + } + + dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len); If I'm following things correctly, that's already been counted: ovpn_udp_output -> ovpn_udp4_output -> udp_tunnel_xmit_skb -> iptunnel_xmit -> iptunnel_xmit_stats which does (on success) the same thing as dev_sw_netstats_tx_add. On Right. This means we can remove that call to tx_add(). failure it increments a different tx_dropped counter than what dev_core_stats_tx_dropped_inc, but they should get summed in the end. It seems they are summed up in dev_get_tstats64(), therefore I should remove the tx_dropped_inc() call to avoid double counting. Thanks! Cheers, +} -- Antonio Quartulli OpenVPN Inc.
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
On 10/30/24 9:31 AM, Matthieu Baerts wrote: Hi Tao, BPF maintainers, On 30/10/2024 12:12, Tao Chen wrote: 在 2024/10/30 18:49, Matthieu Baerts 写道: Hi Tao Chen, Thank you for having shared this patch. On 30/10/2024 11:01, Tao Chen wrote: Fix compile error when MPTCP feature not support, though eBPF core check already done which seems invalid in this situation, the error info like: progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct tcp_sock' 49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? The filed created in new definitions with eBPF core feature to solve this build problem, and test case result still ok in MPTCP kernel. 176/1 mptcp/base:OK 176/2 mptcp/mptcpify:OK 176 mptcp:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") The commit you mentioned here is more than 2 years old, and as far as I can see, nobody else reported this compilation issue. I guess that's because people used tools/testing/selftests/bpf/config file as expected to populate the kernel config, and I suppose you didn't, right? Hi Matt, thank you for your reply, as you said, i did not use tools/ testing/selftests/bpf/config to compile kernel, i will use this helpful feature. I don't think other BPF selftests check for missing kernel config if they are specified in the 'config' file, but even if it is the case, I think it would be better to skip all the MPTCP tests, and not try to have them checking something that doesn't exist: no need to validate these tests if the expected kernel config has not been enabled. If i use the kernel not support MPTCP, the compile error still exists, and i can not build the bpf test successfully. Maybe skill the test case seems better when kernel not support. Now that bpf_core_field_exists check already used in the code, i think it is better to use new definition mode. I understand it would be better, but it means more code to maintain to handle that (and remembering that in future test cases). If that's not necessary, then no need to do the effort. @BPF maintainers: do we need to support kernels not respecting the tools/testing/selftests/bpf/config file? Should we detect when a required kernel config is not set and skip some tests? I guess it depends on the CONFIG_. Otherwise, it takes out the goodies of using when writing bpf selftests. If fixing the config is an option and sounds like it is for Tao, then it is always good to run everything in test_progs. There are some "___local" definitions in the selftests. If mptcp test wants to go this path, then Matt's request to at least test__skip() makes sense to me. pw-bot: cr But again, please correct me if I'm wrong, but I don't think there is anything to change here to fix your compilation issue: simply make sure to use this tools/testing/selftests/bpf/config file to generate your kernel config, no? Cheers, Matt
Re: [PATCH net-next v11 06/23] ovpn: introduce the ovpn_peer object
On 30/10/2024 17:37, Sabrina Dubroca wrote: 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: +static void ovpn_peer_release(struct ovpn_peer *peer) +{ + ovpn_bind_reset(peer, NULL); + + dst_cache_destroy(&peer->dst_cache); Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately: void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); } (probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?) Right. That was my assumption: release happens on refcnt = 0 only, therefore no field should be in use anymore. Anything that may still be in use will have its own refcounter. + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); + kfree_rcu(peer, rcu); +} [...] +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, +enum ovpn_del_peer_reason reason) + __must_hold(&peer->ovpn->lock) +{ + struct ovpn_peer *tmp; + + tmp = rcu_dereference_protected(peer->ovpn->peer, + lockdep_is_held(&peer->ovpn->lock)); + if (tmp != peer) { + DEBUG_NET_WARN_ON_ONCE(1); + if (tmp) + ovpn_peer_put(tmp); Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put? First of all consider that this is truly something that we don't expect to happen (hence the WARN_ON). If this is happening it's because we are trying to delete a peer that is not the one we are connected to (unexplainable scenario in p2p mode). Still, should we hit this case (I truly can't see how), I'd say "leave everything as is - maybe this call was just a mistake". Cheers, + + return -ENOENT; + } + + tmp->delete_reason = reason; + RCU_INIT_POINTER(peer->ovpn->peer, NULL); + ovpn_peer_put(tmp); + + return 0; +} -- Antonio Quartulli OpenVPN Inc.
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
Hi, On Wed, Oct 30, 2024 at 6:54 AM Paul E. McKenney wrote: > > > > We do assume that nmi_trigger_cpumask_backtrace() uses true NMIs, so, > > > yes, nmi_trigger_cpumask_backtrace() should use true NMIs, just like > > > the name says. ;-) > > > > In the comments of following patch, the arm64 maintainers have > > differing views on the use of nmi_trigger_cpumask_backtrace(). I'm a > > bit confused and unsure which perspective is more reasonable. > > > > https://lore.kernel.org/all/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid/ > > I clearly need to have a chat with the arm64 maintainers, and thank > you for checking. > > > > /* > > > * NOTE: though nmi_trigger_cpumask_backtrace() has "nmi_" in the > > name, > > > * nothing about it truly needs to be implemented using an NMI, it's > > > * just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi() > > > * returned false our backtrace attempt will just use a regular IPI. > > > */ > > > > > Alternatively, arm64 could continue using nmi_trigger_cpumask_backtrace() > > > with normal interrupts (for example, on SoCs not implementing true NMIs), > > > but have a short timeout (maybe a few jiffies?) after which its returns > > > false (and presumably also cancels the backtrace request so that when > > > the non-NMI interrupt eventually does happen, its handler simply returns > > > without backtracing). This should be implemented using atomics to avoid > > > deadlock issues. This alternative approach would provide accurate arm64 > > > backtraces in the common case where interrupts are enabled, but allow > > > a graceful fallback to remote tracing otherwise. > > > > > > Would you be interested in working this issue, whatever solution the > > > arm64 maintainers end up preferring? > > > > The 10-second timeout is hard-coded in nmi_trigger_cpumask_backtrace(). > > It is shared code and not architecture-specific. Currently, I haven't > > thought of a feasible solution. I have also CC'd the authors of the > > aforementioned patch to see if they have any other ideas. > > It should be possible for arm64 to have an architecture-specific hook > that enables them to use a much shorter timeout. Or, to eventually > switch to real NMIs. Note that: * Switching to real NMIs is impossible on many existing arm64 CPUs. The hardware support simply isn't there. Pseudo-NMIs should work fine and are (in nearly all cases) just as good as NMIs but they have a small performance impact. There are also compatibility issues with some pre-existing bootloaders. ...so code can't assume even Pseudo-NMI will work and needs to be able to fall back. Prior to my recent changes arm64 CPUs wouldn't even do stacktraces in some scenarios. Now at least they fall back to regular IPIs. * Even if we decided that we absolutely had to disable stacktrades on arm64 CPUs without some type of NMI, that won't fix arm32. arm32 has been using regular IPIs for backtraces like this for many, many years. Maybe folks don't care as much about arm32 anymore but it feels bad if we totally break it. IMO waiting 10 seconds for a backtrace is pretty crazy to begin with. What about just changing that globally to 1 second? If not, it doesn't feel like it would be impossibly hard to make an arch-specific callback to choose the time and that callback could even take into account whether we managed to get an NMI. I'd be happy to review such a patch. -Doug
Re: [PATCH v2 02/13] media: i2c: imx214: Use subdev active state
Hi It looks good to me, but I would like Sakari to review it. I am not sure if it is ok to keep the cur_mode variable. On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > Port the imx214 sensor driver to use the subdev active state. > > Move all the format configuration to the subdevice state and simplify > the format handling, locking and initialization. > > While at it, simplify imx214_start_streaming() by removing unneeded goto > statements and the corresponding error label. > > Signed-off-by: André Apitzsch > --- > drivers/media/i2c/imx214.c | 159 > +++-- > 1 file changed, 53 insertions(+), 106 deletions(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > 5d411452d342fdb177619cd1c9fd9650d31089bb..990fd0811904fc478c05d64054089fc2879cae94 > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -59,8 +59,6 @@ struct imx214 { > > struct v4l2_subdev sd; > struct media_pad pad; > - struct v4l2_mbus_framefmt fmt; > - struct v4l2_rect crop; > > struct v4l2_ctrl_handler ctrls; > struct v4l2_ctrl *pixel_rate; > @@ -68,15 +66,11 @@ struct imx214 { > struct v4l2_ctrl *exposure; > struct v4l2_ctrl *unit_size; > > + const struct imx214_mode *cur_mode; > + > struct regulator_bulk_data supplies[IMX214_NUM_SUPPLIES]; > > struct gpio_desc *enable_gpio; > - > - /* > -* Serialize control access, get/set format, get selection > -* and start streaming. > -*/ > - struct mutex mutex; > }; > > struct reg_8 { > @@ -490,6 +484,22 @@ static int __maybe_unused imx214_power_off(struct device > *dev) > return 0; > } > > +static void imx214_update_pad_format(struct imx214 *imx214, > +const struct imx214_mode *mode, > +struct v4l2_mbus_framefmt *fmt, u32 code) > +{ > + fmt->code = IMX214_MBUS_CODE; > + fmt->width = mode->width; > + fmt->height = mode->height; > + fmt->field = V4L2_FIELD_NONE; > + fmt->colorspace = V4L2_COLORSPACE_SRGB; > + fmt->ycbcr_enc = V4L2_MAP_YCBCR_ENC_DEFAULT(fmt->colorspace); > + fmt->quantization = V4L2_MAP_QUANTIZATION_DEFAULT(true, > + fmt->colorspace, > + fmt->ycbcr_enc); > + fmt->xfer_func = V4L2_MAP_XFER_FUNC_DEFAULT(fmt->colorspace); > +} > + > static int imx214_enum_mbus_code(struct v4l2_subdev *sd, > struct v4l2_subdev_state *sd_state, > struct v4l2_subdev_mbus_code_enum *code) > @@ -549,52 +559,6 @@ static const struct v4l2_subdev_core_ops imx214_core_ops > = { > #endif > }; > > -static struct v4l2_mbus_framefmt * > -__imx214_get_pad_format(struct imx214 *imx214, > - struct v4l2_subdev_state *sd_state, > - unsigned int pad, > - enum v4l2_subdev_format_whence which) > -{ > - switch (which) { > - case V4L2_SUBDEV_FORMAT_TRY: > - return v4l2_subdev_state_get_format(sd_state, pad); > - case V4L2_SUBDEV_FORMAT_ACTIVE: > - return &imx214->fmt; > - default: > - return NULL; > - } > -} > - > -static int imx214_get_format(struct v4l2_subdev *sd, > -struct v4l2_subdev_state *sd_state, > -struct v4l2_subdev_format *format) > -{ > - struct imx214 *imx214 = to_imx214(sd); > - > - mutex_lock(&imx214->mutex); > - format->format = *__imx214_get_pad_format(imx214, sd_state, > - format->pad, > - format->which); > - mutex_unlock(&imx214->mutex); > - > - return 0; > -} > - > -static struct v4l2_rect * > -__imx214_get_pad_crop(struct imx214 *imx214, > - struct v4l2_subdev_state *sd_state, > - unsigned int pad, enum v4l2_subdev_format_whence which) > -{ > - switch (which) { > - case V4L2_SUBDEV_FORMAT_TRY: > - return v4l2_subdev_state_get_crop(sd_state, pad); > - case V4L2_SUBDEV_FORMAT_ACTIVE: > - return &imx214->crop; > - default: > - return NULL; > - } > -} > - > static int imx214_set_format(struct v4l2_subdev *sd, > struct v4l2_subdev_state *sd_state, > struct v4l2_subdev_format *format) > @@ -604,34 +568,25 @@ static int imx214_set_format(struct v4l2_subdev *sd, > struct v4l2_rect *__crop; > const struct imx214_mode *mode; > > - mutex_lock(&imx214->mutex); > - > - __crop = __imx214_get_pad_crop(imx214, sd_state, format->pad,
Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test
On Wed, Oct 30, 2024, Pratik R. Sampat wrote: > On 10/30/2024 8:46 AM, Sean Christopherson wrote: > > +/* Minimum firmware version required for the SEV-SNP support */ > > +#define SNP_FW_REQ_VER_MAJOR 1 > > +#define SNP_FW_REQ_VER_MINOR 51 > > > > Side topic, why are these hardcoded? And where did they come from? If > > they're > > arbitrary KVM selftests values, make that super duper clear. > > Well, it's not entirely arbitrary. This was the version that SNP GA'd > with first so that kind of became the minimum required version needed. > > I think the only place we've documented this is here - > https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware. > > Maybe, I can modify the comment above to say something like - > Minimum general availability release firmware required for SEV-SNP support. Hmm, so if AMD says SNP is only supported for firmware version >= 1.51, why on earth is that not checked and enforced by the kernel? Relying on userspace to not crash the host (or worse) because of unsupported firmware is not a winning strategy.
Re: [RFC PATCH] remoteproc: core: Add support for predefined notifyids
On Wed, Oct 23, 2024 at 6:32 PM Mathieu Poirier wrote: > > Hello Daniel, > > On Fri, Oct 18, 2024 at 02:09:29PM +0300, Daniel Baluta wrote: > > Currently we generate notifyids in the linux kernel and override > > those found in rsc_table. > > > > This doesn't play well with users expecting to use the exact ids > > from rsc_table. > > > > So, use predefined notifyids found in rsc_table if any. Otherwise, > > let Linux generate the ids as before. > > > > Keypoint is we also define an invalid notifid as 0xU. This > > should be placed as notifids if users want Linux to generate the ids. > > > > Signed-off-by: Alexandru Lastur > > Signed-off-by: Daniel Baluta > > --- > > drivers/remoteproc/remoteproc_core.c | 14 -- > > include/linux/remoteproc.h | 1 + > > 2 files changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > b/drivers/remoteproc/remoteproc_core.c > > index f276956f2c5c..9f00fe16da38 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -332,6 +332,7 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) > > int ret, notifyid; > > struct rproc_mem_entry *mem; > > size_t size; > > + int start, end; > > > > /* actual size of vring (in bytes) */ > > size = PAGE_ALIGN(vring_size(rvring->num, rvring->align)); > > @@ -363,9 +364,18 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i) > > /* > >* Assign an rproc-wide unique index for this vring > >* TODO: assign a notifyid for rvdev updates as well > > - * TODO: support predefined notifyids (via resource table) > >*/ > > - ret = idr_alloc(&rproc->notifyids, rvring, 0, 0, GFP_KERNEL); > > + > > + start = 0; > > + end = 0; > > + > > + /* use id if specified in rsc table */ > > + if (rsc->vring[i].notifyid != RSC_INVALID_NOTIFYID) { > > + start = rsc->vring[i].notifyid; > > + end = start + 1; > > + } > > This will likely introduce a backward compatibility issue where anyone that > has more than one vring and set their notifyids to anything else than > 0x > in the resource table will see a boot failure. > > A while back the openAMP group started discussions on using the configuration > space of a virtio device to enhance device discovery, with exactly this kind > of > use case in mind. I think it is the only way to move forward with this > feature, though it is a big job that requires a lot of community interactions. I also found this attempt few years ago: https://lkml.iu.edu/hypermail/linux/kernel/2001.2/05248.html I think the best way to go is to bump the resource table version and support the old behavior for v1 and the new behavior for the new version (which should be v2). We will be back next week with a patch to express this. thanks, Daniel.
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
Hi Tao, BPF maintainers, On 30/10/2024 12:12, Tao Chen wrote: > 在 2024/10/30 18:49, Matthieu Baerts 写道: >> Hi Tao Chen, >> >> Thank you for having shared this patch. >> >> On 30/10/2024 11:01, Tao Chen wrote: >>> Fix compile error when MPTCP feature not support, though eBPF core check >>> already done which seems invalid in this situation, the error info like: >>> progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct >>> tcp_sock' >>> 49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? >>> >>> The filed created in new definitions with eBPF core feature to solve >>> this build problem, and test case result still ok in MPTCP kernel. >>> >>> 176/1 mptcp/base:OK >>> 176/2 mptcp/mptcpify:OK >>> 176 mptcp:OK >>> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED >>> >>> Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") >> >> The commit you mentioned here is more than 2 years old, and as far as I >> can see, nobody else reported this compilation issue. I guess that's >> because people used tools/testing/selftests/bpf/config file as expected >> to populate the kernel config, and I suppose you didn't, right? >> > > Hi Matt, thank you for your reply, as you said, i did not use tools/ > testing/selftests/bpf/config to compile kernel, i will use this helpful > feature. > >> I don't think other BPF selftests check for missing kernel config if >> they are specified in the 'config' file, but even if it is the case, I >> think it would be better to skip all the MPTCP tests, and not try to >> have them checking something that doesn't exist: no need to validate >> these tests if the expected kernel config has not been enabled. >> > > If i use the kernel not support MPTCP, the compile error still exists, > and i can not build the bpf test successfully. Maybe skill the test case > seems better when kernel not support. Now that bpf_core_field_exists > check already used in the code, i think it is better to use new > definition mode. I understand it would be better, but it means more code to maintain to handle that (and remembering that in future test cases). If that's not necessary, then no need to do the effort. @BPF maintainers: do we need to support kernels not respecting the tools/testing/selftests/bpf/config file? Should we detect when a required kernel config is not set and skip some tests? >> But again, please correct me if I'm wrong, but I don't think there is >> anything to change here to fix your compilation issue: simply make sure >> to use this tools/testing/selftests/bpf/config file to generate your >> kernel config, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net-next v11 08/23] ovpn: implement basic TX path (UDP)
2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote: > +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb, > + struct ovpn_peer *peer) > +{ > + struct sk_buff *curr, *next; > + > + if (likely(!peer)) > + /* retrieve peer serving the destination IP of this packet */ > + peer = ovpn_peer_get_by_dst(ovpn, skb); > + if (unlikely(!peer)) { > + net_dbg_ratelimited("%s: no peer to send data to\n", > + ovpn->dev->name); > + dev_core_stats_tx_dropped_inc(ovpn->dev); > + goto drop; > + } > + > + /* this might be a GSO-segmented skb list: process each skb > + * independently > + */ > + skb_list_walk_safe(skb, curr, next) nit (if you end up reposting): there should probably be some braces around the (multi-line) loop body. > + if (unlikely(!ovpn_encrypt_one(peer, curr))) { > + dev_core_stats_tx_dropped_inc(ovpn->dev); > + kfree_skb(curr); > + } > +void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, > +struct sk_buff *skb) > +{ [...] > + /* crypto layer -> transport (UDP) */ > + pkt_len = skb->len; > + ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb); > + > +out_unlock: > + rcu_read_unlock(); > +out: > + if (unlikely(ret < 0)) { > + dev_core_stats_tx_dropped_inc(ovpn->dev); > + kfree_skb(skb); > + return; > + } > + > + dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len); If I'm following things correctly, that's already been counted: ovpn_udp_output -> ovpn_udp4_output -> udp_tunnel_xmit_skb -> iptunnel_xmit -> iptunnel_xmit_stats which does (on success) the same thing as dev_sw_netstats_tx_add. On failure it increments a different tx_dropped counter than what dev_core_stats_tx_dropped_inc, but they should get summed in the end. > +} -- Sabrina
Re: [PATCH net-next v11 06/23] ovpn: introduce the ovpn_peer object
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote: > +static void ovpn_peer_release(struct ovpn_peer *peer) > +{ > + ovpn_bind_reset(peer, NULL); > + > + dst_cache_destroy(&peer->dst_cache); Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately: void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); } (probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?) > + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); > + kfree_rcu(peer, rcu); > +} [...] > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, > + enum ovpn_del_peer_reason reason) > + __must_hold(&peer->ovpn->lock) > +{ > + struct ovpn_peer *tmp; > + > + tmp = rcu_dereference_protected(peer->ovpn->peer, > + lockdep_is_held(&peer->ovpn->lock)); > + if (tmp != peer) { > + DEBUG_NET_WARN_ON_ONCE(1); > + if (tmp) > + ovpn_peer_put(tmp); Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put? > + > + return -ENOENT; > + } > + > + tmp->delete_reason = reason; > + RCU_INIT_POINTER(peer->ovpn->peer, NULL); > + ovpn_peer_put(tmp); > + > + return 0; > +} -- Sabrina
Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test
Hi Sean, On 10/30/2024 8:46 AM, Sean Christopherson wrote: > On Mon, Oct 28, 2024, Pratik R. Sampat wro4te: >> On 10/28/2024 12:55 PM, Sean Christopherson wrote: >>> On Mon, Oct 21, 2024, Pratik R. Sampat wrote: >> +if (unlikely(!is_smt_active())) >> +snp_policy &= ~SNP_POLICY_SMT; > > Why does SNP_POLICY assume SMT? And what is RSVD_MBO? E.g. why not this? > > u64 policy = is_smt_active() ? SNP_POLICY_SMT : SNP_POLICY; > I think most systems support SMT so I enabled the bit in by default and only unset it when there isn't any support. >>> >>> That's confusing though, because you're mixing architectural defines with >>> semi- >>> arbitrary selftests behavior. RSVD_MBO on the other is apparently tightly >>> coupled >>> with SNP, i.e. SNP can't exist without that bit, so it makes sense that >>> RSVD_MBO >>> needs to be part of SNP_POLICY >>> >>> If you want to have a *software*-defined default policy, then make it >>> obvious that >>> it's software defined. E.g. name the #define SNP_DEFAULT_POLICY, not simply >>> SNP_POLICY, because the latter is too easily misconstrued as the base SNP >>> policy, >>> which it is not. That said, IIUC, SMT *must* match the host configuration, >>> i.e. >>> whether or not SMT is set is non-negotiable. In that case, there's zero >>> value in >>> defining SNP_DEFAULT_POLICY, because it can't be a sane default for all >>> systems. >>> >> >> Right, SMT should match the host configuration. Would a >> SNP_DEFAULT_POLICY work if we made it check for SMT too in the macro? >> >> Instead of, >> #define SNP_POLICY (SNP_POLICY_SMT | SNP_POLICY_RSVD_MBO) >> >> Have something like this instead to make it generic and less ambiguous? >> #define SNP_DEFAULT_POLICY()\ >> ({ \ >> SNP_POLICY_RSVD_MBO | (is_smt_active() ? SNP_POLICY_SMT : 0); \ >> }) > > No, unless it's the least awful option, don't hide dynamic functionality in a > macro > that looks like it holds static data. The idea is totally fine, but put it > in an > actual helper, not a macro, _if_ there's actually a need for a default policy. > If there's only ever one main path that creates SNP VMs, then I don't see the > point > in specifying a default policy. > Currently, there just seems to be one path of doing (later the prefault tests exercise it) so I'm not too averse to just dropping it and having what bits needs to be set during the main path. I had only introduced it so that it would be easy to specify a minimal working SNP policy as it was for SEV and SEV-ES without too much ambiguity. But if it's causing more issues than resolving, I can definitely get rid of it. >>> Side topic, I assume one of SEV_POLICY_NO_DBG or SNP_POLICY_DBG *must* be >>> specified, >>> and that they are mutualy exclusive? E.g. what happens if the full policy >>> is simply >>> SNP_POLICY_RSVD_MBO? >> >> SEV_POLICY_NO_DBG is mainly for the guest policy structure of SEV and >> SEV-ES - pg 31, Table 2 >> https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55766_SEV-KM_API_Specification.pdf >> >> and, SNP_POLICY_DBG is a bit in the guest policy structure of SNP - pg >> 27, Table 9 >> https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/56860.pdf >> >> In the former, a SEV guest disables debugging if SEV_POLICY_NO_DBG is >> set. Similarly, a SNP guest enables debugging if SNP_POLICY_DBG is set. > > Ugh, one is SEV_xxx, the other is SNP_xxx. Argh! And IIUC, they are mutually > exclusive (totally separate thigns?), because SNP guests use an 8-byte > structure, > whereas SEV/SEV-ES use a 4-byte structure, and with different layouts. > > That means this is _extremely_ confusing. Separate the SEV_xxx defines from > the > SNP_xxx defines, because other than a name, they have nothing in common. > Right. I see how that can be confusing. Sure I can make sure not to bundle up these defines together. > +/* Minimum firmware version required for the SEV-SNP support */ > +#define SNP_FW_REQ_VER_MAJOR 1 > +#define SNP_FW_REQ_VER_MINOR 51 > > Side topic, why are these hardcoded? And where did they come from? If > they're > arbitrary KVM selftests values, make that super duper clear. Well, it's not entirely arbitrary. This was the version that SNP GA'd with first so that kind of became the minimum required version needed. I think the only place we've documented this is here - https://github.com/AMDESE/AMDSEV/tree/snp-latest?tab=readme-ov-file#upgrade-sev-firmware. Maybe, I can modify the comment above to say something like - Minimum general availability release firmware required for SEV-SNP support. > > +#define SNP_POLICY_MINOR_BIT 0 > +#define SNP_POLICY_MAJOR_BIT 8 > > s/BIT/SHIFT. "BIT" implies they are a single bit, which is obviously not the > c
Re: [PATCH v6 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
On Mon, Oct 28, 2024 at 04:06:07PM +, Lorenzo Stoakes wrote: > I guess I'll try to adapt that and respin a v7 when I get a chance. Hm looking at this draft patch, it seems like a total rework of pidfd's across the board right (now all pidfd's will need to be converted to pid_fd)? Correct me if I'm wrong. If only for the signal case, it seems like overkill to define a whole pid_fd and to use this CLASS() wrapper just for this one instance. If the intent is to convert _all_ pidfd's to use this type, it feels really out of scope for this series and I think we'd probably instead want to go off and do that as a separate series and put this on hold until that is done. If instead you mean that we ought to do something like this just for the signal case, it feels like it'd be quite a bit of extra abstraction just used in this one case but nowhere else, I think if you did an abstraction like this it would _have_ to be across the board right? I agree that the issue is with this one signal case that pins only the fd (rather than this pid) where this 'pinning' doesn't _necessary_ mess around with reference counts. So we definitely must address this, but the issue you had with the first approach was that I think (correct me if I'm wrong) I was passing a pointer to a struct fd which is not permitted right? Could we pass the struct fd by value to avoid this? I think we'd have to unfortunately special-case this and probably duplicate some code which is a pity as I liked the idea of abstracting everything to one place, but we can obviously do that. So I guess to TL;DR it, the options are: 1. Implement pid_fd everywhere, in which case I will leave off on this series and I guess, if I have time I could look at trying to implement that or perhaps you'd prefer to? 2. We are good for the sake of this series to special-case a pidfd_to_pid() implementation (used only by the pidfd_send_signal() syscall) 3. Something else, or I am misunderstanding your point :) Let me know how you want me to proceed on this as we're at v6 already and I want to be _really_ sure I'm doing what you want here. Thanks!
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On 10/30, Mina Almasry wrote: > On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev > wrote: > > > > On 10/30, Mina Almasry wrote: > > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev > > > wrote: > > > > > > > > The goal of the series is to simplify and make it possible to use > > > > ncdevmem in an automated way from the ksft python wrapper. > > > > > > > > ncdevmem is slowly mutated into a state where it uses stdout > > > > to print the payload and the python wrapper is added to > > > > make sure the arrived payload matches the expected one. > > > > > > > > v6: > > > > - fix compilation issue in 'Unify error handling' patch (Jakub) > > > > > > > > > > Since I saw a compilation failures on a couple of iterations I > > > cherry-picked this locally and tested compilation. I'm seeing this: > > > > Are you cherry picking the whole series or just this patch? It looks > > too broken. > > > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > > TARGETS=ncdevmem 2>&1 > > > make: Entering directory > > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > > CC ncdevmem > > > In file included from ncdevmem.c:63: > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: > > > warning: ‘enum ethtool_header_flags’ declared inside parameter list > > > will not be visible outside of this definition or declaration > > >23 | const char *ethtool_header_flags_str(enum ethtool_header_flags > > > value); > > > | ^~~~ > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: > > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside > > > parameter list will not be visible outside of this definition or > > > declaration > > >25 | ethtool_module_fw_flash_status_str(enum > > > ethtool_module_fw_flash_status value); > > > | > > > ^~ > > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: > > > error: field ‘status’ has incomplete type > > > 6766 | enum ethtool_module_fw_flash_status status; > > > | ^~ > > > > This has been fixed via '#include ' > > > > > ncdevmem.c: In function ‘do_server’: > > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known > > > 517 | struct dmabuf_token token; > > > > And this, and the rest, don't make sense at all? > > > > I'll double check on my side. > > Oh, whoops, I forgot to headers_install first. This works for me: > > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install && > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > TARGETS=ncdevmem 2>&1 > INSTALL ./usr/include > make: Entering directory > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > make: Nothing to be done for 'all'. > make: Leaving directory > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem > ./tools/testing/selftests/drivers/net/hw/ncdevmem > > Sorry for the noise :D Whew, thanks and no worries!
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On 10/29, Mina Almasry wrote: > Check we're going to free a reasonable number of frags in token_count > before starting the loop, to prevent looping too long. > > Also minor code cleanups: > - Flip checks to reduce indentation. > - Use sizeof(*tokens) everywhere for consistentcy. > > Cc: Yi Lai > > Signed-off-by: Mina Almasry > > --- > net/core/sock.c | 46 -- > 1 file changed, 28 insertions(+), 18 deletions(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 7f398bd07fb7..8603b8d87f2e 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int > bytes) > > #ifdef CONFIG_PAGE_POOL > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in > * 1 syscall. The limit exists to limit the amount of memory the kernel > - * allocates to copy these tokens. > + * allocates to copy these tokens, and to prevent looping over the frags for > + * too long. > */ > -#define MAX_DONTNEED_TOKENS 128 > +#define MAX_DONTNEED_FRAGS 1024 > > static noinline_for_stack int > sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen) > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t > optval, unsigned int optlen) > unsigned int num_tokens, i, j, k, netmem_num = 0; > struct dmabuf_token *tokens; > netmem_ref netmems[16]; > + u64 num_frags = 0; > int ret = 0; > > if (!sk_is_tcp(sk)) > return -EBADF; > > - if (optlen % sizeof(struct dmabuf_token) || > - optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS) > + if (optlen % sizeof(*tokens) || > + optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS) > return -EINVAL; > > - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL); > + num_tokens = optlen / sizeof(*tokens); > + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL); > if (!tokens) > return -ENOMEM; > > - num_tokens = optlen / sizeof(struct dmabuf_token); > if (copy_from_sockptr(tokens, optval, optlen)) { > kvfree(tokens); > return -EFAULT; > } > > + for (i = 0; i < num_tokens; i++) { > + num_frags += tokens[i].token_count; > + if (num_frags > MAX_DONTNEED_FRAGS) { > + kvfree(tokens); > + return -E2BIG; > + } > + } > + > xa_lock_bh(&sk->sk_user_frags); > for (i = 0; i < num_tokens; i++) { > for (j = 0; j < tokens[i].token_count; j++) { > netmem_ref netmem = (__force netmem_ref)__xa_erase( > &sk->sk_user_frags, tokens[i].token_start + j); > > - if (netmem && > - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) { > - netmems[netmem_num++] = netmem; > - if (netmem_num == ARRAY_SIZE(netmems)) { > - xa_unlock_bh(&sk->sk_user_frags); > - for (k = 0; k < netmem_num; k++) > - > WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); > - netmem_num = 0; > - xa_lock_bh(&sk->sk_user_frags); > - } > - ret++; [..] > + if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) > + continue; Any reason we are not returning explicit error to the callers here? That probably needs some mechanism to signal which particular one failed so the users can restart?
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On Wed, Oct 30, 2024 at 8:13 AM Stanislav Fomichev wrote: > > On 10/30, Mina Almasry wrote: > > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev wrote: > > > > > > The goal of the series is to simplify and make it possible to use > > > ncdevmem in an automated way from the ksft python wrapper. > > > > > > ncdevmem is slowly mutated into a state where it uses stdout > > > to print the payload and the python wrapper is added to > > > make sure the arrived payload matches the expected one. > > > > > > v6: > > > - fix compilation issue in 'Unify error handling' patch (Jakub) > > > > > > > Since I saw a compilation failures on a couple of iterations I > > cherry-picked this locally and tested compilation. I'm seeing this: > > Are you cherry picking the whole series or just this patch? It looks > too broken. > > > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > > TARGETS=ncdevmem 2>&1 > > make: Entering directory > > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > > CC ncdevmem > > In file included from ncdevmem.c:63: > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: > > warning: ‘enum ethtool_header_flags’ declared inside parameter list > > will not be visible outside of this definition or declaration > >23 | const char *ethtool_header_flags_str(enum ethtool_header_flags > > value); > > | ^~~~ > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: > > warning: ‘enum ethtool_module_fw_flash_status’ declared inside > > parameter list will not be visible outside of this definition or > > declaration > >25 | ethtool_module_fw_flash_status_str(enum > > ethtool_module_fw_flash_status value); > > | > > ^~ > > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: > > error: field ‘status’ has incomplete type > > 6766 | enum ethtool_module_fw_flash_status status; > > | ^~ > > This has been fixed via '#include ' > > > ncdevmem.c: In function ‘do_server’: > > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known > > 517 | struct dmabuf_token token; > > And this, and the rest, don't make sense at all? > > I'll double check on my side. Oh, whoops, I forgot to headers_install first. This works for me: ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ sudo make headers_install && sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw TARGETS=ncdevmem 2>&1 INSTALL ./usr/include make: Entering directory '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' make: Nothing to be done for 'all'. make: Leaving directory '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' ➜ cos-kernel git:(tcpdevmem-fixes-1) ✗ find . -iname ncdevmem ./tools/testing/selftests/drivers/net/hw/ncdevmem Sorry for the noise :D -- Thanks, Mina
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On 10/30, Mina Almasry wrote: > On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev wrote: > > > > The goal of the series is to simplify and make it possible to use > > ncdevmem in an automated way from the ksft python wrapper. > > > > ncdevmem is slowly mutated into a state where it uses stdout > > to print the payload and the python wrapper is added to > > make sure the arrived payload matches the expected one. > > > > v6: > > - fix compilation issue in 'Unify error handling' patch (Jakub) > > > > Since I saw a compilation failures on a couple of iterations I > cherry-picked this locally and tested compilation. I'm seeing this: Are you cherry picking the whole series or just this patch? It looks too broken. > sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw > TARGETS=ncdevmem 2>&1 > make: Entering directory > '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' > CC ncdevmem > In file included from ncdevmem.c:63: > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: > warning: ‘enum ethtool_header_flags’ declared inside parameter list > will not be visible outside of this definition or declaration >23 | const char *ethtool_header_flags_str(enum ethtool_header_flags value); > | ^~~~ > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: > warning: ‘enum ethtool_module_fw_flash_status’ declared inside > parameter list will not be visible outside of this definition or > declaration >25 | ethtool_module_fw_flash_status_str(enum > ethtool_module_fw_flash_status value); > | ^~ > /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: > error: field ‘status’ has incomplete type > 6766 | enum ethtool_module_fw_flash_status status; > | ^~ This has been fixed via '#include ' > ncdevmem.c: In function ‘do_server’: > ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known > 517 | struct dmabuf_token token; And this, and the rest, don't make sense at all? I'll double check on my side.
Re: [PATCH v3 2/9] KVM: selftests: Add a basic SNP smoke test
On Mon, Oct 28, 2024, Pratik R. Sampat wro4te: > On 10/28/2024 12:55 PM, Sean Christopherson wrote: > > On Mon, Oct 21, 2024, Pratik R. Sampat wrote: > +if (unlikely(!is_smt_active())) > +snp_policy &= ~SNP_POLICY_SMT; > >>> > >>> Why does SNP_POLICY assume SMT? And what is RSVD_MBO? E.g. why not this? > >>> > >>> u64 policy = is_smt_active() ? SNP_POLICY_SMT : SNP_POLICY; > >>> > >> > >> I think most systems support SMT so I enabled the bit in by default and > >> only unset it when there isn't any support. > > > > That's confusing though, because you're mixing architectural defines with > > semi- > > arbitrary selftests behavior. RSVD_MBO on the other is apparently tightly > > coupled > > with SNP, i.e. SNP can't exist without that bit, so it makes sense that > > RSVD_MBO > > needs to be part of SNP_POLICY > > > > If you want to have a *software*-defined default policy, then make it > > obvious that > > it's software defined. E.g. name the #define SNP_DEFAULT_POLICY, not simply > > SNP_POLICY, because the latter is too easily misconstrued as the base SNP > > policy, > > which it is not. That said, IIUC, SMT *must* match the host configuration, > > i.e. > > whether or not SMT is set is non-negotiable. In that case, there's zero > > value in > > defining SNP_DEFAULT_POLICY, because it can't be a sane default for all > > systems. > > > > Right, SMT should match the host configuration. Would a > SNP_DEFAULT_POLICY work if we made it check for SMT too in the macro? > > Instead of, > #define SNP_POLICY(SNP_POLICY_SMT | SNP_POLICY_RSVD_MBO) > > Have something like this instead to make it generic and less ambiguous? > #define SNP_DEFAULT_POLICY() \ > ({ \ > SNP_POLICY_RSVD_MBO | (is_smt_active() ? SNP_POLICY_SMT : 0); \ > }) No, unless it's the least awful option, don't hide dynamic functionality in a macro that looks like it holds static data. The idea is totally fine, but put it in an actual helper, not a macro, _if_ there's actually a need for a default policy. If there's only ever one main path that creates SNP VMs, then I don't see the point in specifying a default policy. > > Side topic, I assume one of SEV_POLICY_NO_DBG or SNP_POLICY_DBG *must* be > > specified, > > and that they are mutualy exclusive? E.g. what happens if the full policy > > is simply > > SNP_POLICY_RSVD_MBO? > > SEV_POLICY_NO_DBG is mainly for the guest policy structure of SEV and > SEV-ES - pg 31, Table 2 > https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55766_SEV-KM_API_Specification.pdf > > and, SNP_POLICY_DBG is a bit in the guest policy structure of SNP - pg > 27, Table 9 > https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/56860.pdf > > In the former, a SEV guest disables debugging if SEV_POLICY_NO_DBG is > set. Similarly, a SNP guest enables debugging if SNP_POLICY_DBG is set. Ugh, one is SEV_xxx, the other is SNP_xxx. Argh! And IIUC, they are mutually exclusive (totally separate thigns?), because SNP guests use an 8-byte structure, whereas SEV/SEV-ES use a 4-byte structure, and with different layouts. That means this is _extremely_ confusing. Separate the SEV_xxx defines from the SNP_xxx defines, because other than a name, they have nothing in common. +/* Minimum firmware version required for the SEV-SNP support */ +#define SNP_FW_REQ_VER_MAJOR 1 +#define SNP_FW_REQ_VER_MINOR 51 Side topic, why are these hardcoded? And where did they come from? If they're arbitrary KVM selftests values, make that super duper clear. +#define SNP_POLICY_MINOR_BIT 0 +#define SNP_POLICY_MAJOR_BIT 8 s/BIT/SHIFT. "BIT" implies they are a single bit, which is obviously not the case. But I vote to omit the extra #define entirely and just open code the shift in the SNP_FW_VER_{MAJOR,MINOR} macros. #define SEV_POLICY_NO_DBG (1UL << 0) #define SEV_POLICY_ES (1UL << 2) +#define SNP_POLICY_SMT (1ULL << 16) +#define SNP_POLICY_RSVD_MBO(1ULL << 17) +#define SNP_POLICY_DBG (1ULL << 19) +#define SNP_POLICY (SNP_POLICY_SMT | SNP_POLICY_RSVD_MBO) + +#define SNP_FW_VER_MAJOR(maj) ((uint8_t)(maj) << SNP_POLICY_MAJOR_BIT) +#define SNP_FW_VER_MINOR(min) ((uint8_t)(min) << SNP_POLICY_MINOR_BIT)
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On 10/30, Mina Almasry wrote: > On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev > wrote: > > > > On 10/29, Mina Almasry wrote: > > > Check we're going to free a reasonable number of frags in token_count > > > before starting the loop, to prevent looping too long. > > > > > > Also minor code cleanups: > > > - Flip checks to reduce indentation. > > > - Use sizeof(*tokens) everywhere for consistentcy. > > > > > > Cc: Yi Lai > > > > > > Signed-off-by: Mina Almasry > > > > > > --- > > > net/core/sock.c | 46 -- > > > 1 file changed, 28 insertions(+), 18 deletions(-) > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > index 7f398bd07fb7..8603b8d87f2e 100644 > > > --- a/net/core/sock.c > > > +++ b/net/core/sock.c > > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, > > > int bytes) > > > > > > #ifdef CONFIG_PAGE_POOL > > > > > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in > > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in > > > * 1 syscall. The limit exists to limit the amount of memory the kernel > > > - * allocates to copy these tokens. > > > + * allocates to copy these tokens, and to prevent looping over the frags > > > for > > > + * too long. > > > */ > > > -#define MAX_DONTNEED_TOKENS 128 > > > +#define MAX_DONTNEED_FRAGS 1024 > > > > > > static noinline_for_stack int > > > sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int > > > optlen) > > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t > > > optval, unsigned int optlen) > > > unsigned int num_tokens, i, j, k, netmem_num = 0; > > > struct dmabuf_token *tokens; > > > netmem_ref netmems[16]; > > > + u64 num_frags = 0; > > > int ret = 0; > > > > > > if (!sk_is_tcp(sk)) > > > return -EBADF; > > > > > > - if (optlen % sizeof(struct dmabuf_token) || > > > - optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS) > > > + if (optlen % sizeof(*tokens) || > > > + optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS) > > > return -EINVAL; > > > > > > - tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL); > > > + num_tokens = optlen / sizeof(*tokens); > > > + tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL); > > > if (!tokens) > > > return -ENOMEM; > > > > > > - num_tokens = optlen / sizeof(struct dmabuf_token); > > > if (copy_from_sockptr(tokens, optval, optlen)) { > > > kvfree(tokens); > > > return -EFAULT; > > > } > > > > > > + for (i = 0; i < num_tokens; i++) { > > > + num_frags += tokens[i].token_count; > > > + if (num_frags > MAX_DONTNEED_FRAGS) { > > > + kvfree(tokens); > > > + return -E2BIG; > > > + } > > > + } > > > + > > > xa_lock_bh(&sk->sk_user_frags); > > > for (i = 0; i < num_tokens; i++) { > > > for (j = 0; j < tokens[i].token_count; j++) { > > > netmem_ref netmem = (__force netmem_ref)__xa_erase( > > > &sk->sk_user_frags, tokens[i].token_start + > > > j); > > > > > > - if (netmem && > > > - !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) { > > > - netmems[netmem_num++] = netmem; > > > - if (netmem_num == ARRAY_SIZE(netmems)) { > > > - xa_unlock_bh(&sk->sk_user_frags); > > > - for (k = 0; k < netmem_num; k++) > > > - > > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); > > > - netmem_num = 0; > > > - xa_lock_bh(&sk->sk_user_frags); > > > - } > > > - ret++; > > > > [..] > > > > > + if (!netmem || > > > WARN_ON_ONCE(!netmem_is_net_iov(netmem))) > > > + continue; > > > > Any reason we are not returning explicit error to the callers here? > > That probably needs some mechanism to signal which particular one failed > > so the users can restart? > > Only because I can't think of a simple way to return an array of frags > failed to DONTNEED to the user. I'd expect the call to return as soon as it hits the invalid frag entry (plus the number of entries that it successfully refilled up to the invalid one). But too late I guess. > Also, this error should be extremely rare or never hit really. I don't > know how we end up not finding a netmem here or the netmem is page. > The only way is if the user is malicious (messing with the token ids > passed to the kernel) or if a kernel bug is happening. I do hit this error with 1500 mtu, so it would've been nice
Re: [PATCH] selftests/lam: Test get_user() LAM pointer handling
On Tue, Oct 29, 2024 at 03:14:20PM +0100, Maciej Wieczor-Retman wrote: > Recent change in how get_user() handles pointers [1] has a specific case > for LAM. It assigns a different bitmask that's later used to check > whether a pointer comes from userland in get_user(). > > While currently commented out (until LASS [2] is merged into the kernel) > it's worth making changes to the LAM selftest ahead of time. > > Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses > get_user() in its implementation. Execute the syscall with differently > tagged pointers to verify that valid user pointers are passing through > and invalid kernel/non-canonical pointers are not. > > Code was tested on a Sierra Forest Xeon machine that's LAM capable. The > test was ran without issues with both the LAM lines from [1] untouched > and commented out. The test was also ran without issues with LAM_SUP > both enabled and disabled. > > [1] > https://lore.kernel.org/all/20241024013214.129639-1-torva...@linux-foundation.org/ > [2] > https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shish...@linux.intel.com/ > > Signed-off-by: Maciej Wieczor-Retman > --- > tools/testing/selftests/x86/lam.c | 85 +++ > 1 file changed, 85 insertions(+) > > diff --git a/tools/testing/selftests/x86/lam.c > b/tools/testing/selftests/x86/lam.c > index 0ea4f6813930..3c53d4b7aa61 100644 > --- a/tools/testing/selftests/x86/lam.c > +++ b/tools/testing/selftests/x86/lam.c > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -43,10 +44,19 @@ > #define FUNC_INHERITE 0x20 > #define FUNC_PASID 0x40 > > +/* get_user() pointer test cases */ > +#define GET_USER_USER 0 > +#define GET_USER_KERNEL_TOP 1 > +#define GET_USER_KERNEL_BOT 2 > +#define GET_USER_KERNEL 3 > + > #define TEST_MASK 0x7f > +#define L5_SIGN_EXT_MASK(0xFFUL << 56) > +#define L4_SIGN_EXT_MASK(0x1UL << 47) > > #define LOW_ADDR(0x1UL << 30) > #define HIGH_ADDR (0x3UL << 48) > +#define L5_ADDR (0x1UL << 48) > > #define MALLOC_LEN 32 > > @@ -370,6 +380,54 @@ static int handle_syscall(struct testcases *test) > return ret; > } > > +static int get_user_syscall(struct testcases *test) > +{ > + int ret = 0; > + int ptr_value = 0; > + void *ptr = &ptr_value; > + int fd; > + > + uint64_t bitmask = ((uint64_t)ptr & L5_ADDR) ? L5_SIGN_EXT_MASK : > +L4_SIGN_EXT_MASK; Emm. Do you expect stack to be above at the very top of address space on 5-level paging machines? It is not true. We don't allocate any memory above 46-bit unless asked explicitly. See tools/testing/selftests/mm/va_high_addr_switch.c -- Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH net-next v6 00/12] selftests: ncdevmem: Add ncdevmem to ksft
On Wed, Oct 30, 2024 at 7:27 AM Stanislav Fomichev wrote: > > The goal of the series is to simplify and make it possible to use > ncdevmem in an automated way from the ksft python wrapper. > > ncdevmem is slowly mutated into a state where it uses stdout > to print the payload and the python wrapper is added to > make sure the arrived payload matches the expected one. > > v6: > - fix compilation issue in 'Unify error handling' patch (Jakub) > Since I saw a compilation failures on a couple of iterations I cherry-picked this locally and tested compilation. I'm seeing this: sudo CFLAGS="-static" make -C ./tools/testing/selftests/drivers/net/hw TARGETS=ncdevmem 2>&1 make: Entering directory '/usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/drivers/net/hw' CC ncdevmem In file included from ncdevmem.c:63: /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:23:43: warning: ‘enum ethtool_header_flags’ declared inside parameter list will not be visible outside of this definition or declaration 23 | const char *ethtool_header_flags_str(enum ethtool_header_flags value); | ^~~~ /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:25:41: warning: ‘enum ethtool_module_fw_flash_status’ declared inside parameter list will not be visible outside of this definition or declaration 25 | ethtool_module_fw_flash_status_str(enum ethtool_module_fw_flash_status value); | ^~ /usr/local/google/home/almasrymina/cos-kernel/tools/testing/selftests/../../../tools/net/ynl/generated/ethtool-user.h:6766:45: error: field ‘status’ has incomplete type 6766 | enum ethtool_module_fw_flash_status status; | ^~ ncdevmem.c: In function ‘do_server’: ncdevmem.c:517:37: error: storage size of ‘token’ isn’t known 517 | struct dmabuf_token token; | ^ ncdevmem.c:542:47: error: ‘SCM_DEVMEM_DMABUF’ undeclared (first use in this function) 542 | (cm->cmsg_type != SCM_DEVMEM_DMABUF && | ^ ncdevmem.c:542:47: note: each undeclared identifier is reported only once for each function it appears in ncdevmem.c:543:47: error: ‘SCM_DEVMEM_LINEAR’ undeclared (first use in this function) 543 | cm->cmsg_type != SCM_DEVMEM_LINEAR)) { | ^ ncdevmem.c:557:52: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 557 | dmabuf_cmsg->frag_size); |^~ ncdevmem.c:562:56: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 562 | token.token_start = dmabuf_cmsg->frag_token; |^~ ncdevmem.c:566:42: error: ‘SO_DEVMEM_DONTNEED’ undeclared (first use in this function) 566 | SO_DEVMEM_DONTNEED, &token, | ^~ ncdevmem.c:573:56: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 573 | token.token_start = dmabuf_cmsg->frag_token; |^~ ncdevmem.c:576:54: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 576 | total_received += dmabuf_cmsg->frag_size; | ^~ ncdevmem.c:579:44: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 579 | dmabuf_cmsg->frag_offset >> PAGE_SHIFT, |^~ ncdevmem.c:580:44: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 580 | dmabuf_cmsg->frag_offset % getpagesize(), |^~ ncdevmem.c:581:44: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 581 | dmabuf_cmsg->frag_offset, |^~ ncdevmem.c:582:44: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 582 | dmabuf_cmsg->frag_size, dmabuf_cmsg->frag_token, |^~ ncdevmem.c:582:68: error: invalid use of undefined type ‘struct dmabuf_cmsg’ 582 | dmabuf_cmsg->frag_size, dmabuf_cmsg->frag_token, |^~ ncdevmem.c:583:60: error: invalid use of undefined type ‘struct dmabuf_cmsg’
Re: [PATCH RFT v11 0/8] fork: Support shadow stacks in clone3()
On Wed, Oct 30, 2024 at 02:08:59PM +, Mark Brown wrote: > On Sat, Oct 05, 2024 at 11:31:27AM +0100, Mark Brown wrote: > > The kernel has recently added support for shadow stacks, currently > > x86 only using their CET feature but both arm64 and RISC-V have > > equivalent features (GCS and Zicfiss respectively), I am actively > > working on GCS[1]. With shadow stacks the hardware maintains an > > additional stack containing only the return addresses for branch > > instructions which is not generally writeable by userspace and ensures > > that any returns are to the recorded addresses. This provides some > > protection against ROP attacks and making it easier to collect call > > stacks. These shadow stacks are allocated in the address space of the > > userspace process. > > Does anyone have any thoughts on this? I reworked things to specify the > address for the shadow stack pointer rather than the extent of the stack > as Rick and Yuri suggested, otherwise the only change from the prior > version was rebasing onto the arm64 GCS support since that's queued in > -next. I think the only substantial question is picking the ABI for > specifying the shadow stack. I will need more time to review this as both my primary and shadow stacks are full with other work. At a glance, I cannot offer any informed opinion for choosing ABI atm. Apologies for the delay. Kind regards, Yury
Re: [PATCH] selftests/lam: Test get_user() LAM pointer handling
On 2024-10-30 at 14:31:51 +0200, Kirill A. Shutemov wrote: >On Tue, Oct 29, 2024 at 03:14:20PM +0100, Maciej Wieczor-Retman wrote: >> Recent change in how get_user() handles pointers [1] has a specific case >> for LAM. It assigns a different bitmask that's later used to check >> whether a pointer comes from userland in get_user(). >> >> While currently commented out (until LASS [2] is merged into the kernel) >> it's worth making changes to the LAM selftest ahead of time. >> >> Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses >> get_user() in its implementation. Execute the syscall with differently >> tagged pointers to verify that valid user pointers are passing through >> and invalid kernel/non-canonical pointers are not. >> >> Code was tested on a Sierra Forest Xeon machine that's LAM capable. The >> test was ran without issues with both the LAM lines from [1] untouched >> and commented out. The test was also ran without issues with LAM_SUP >> both enabled and disabled. >> >> [1] >> https://lore.kernel.org/all/20241024013214.129639-1-torva...@linux-foundation.org/ >> [2] >> https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shish...@linux.intel.com/ >> >> Signed-off-by: Maciej Wieczor-Retman >> --- >> tools/testing/selftests/x86/lam.c | 85 +++ >> 1 file changed, 85 insertions(+) >> >> diff --git a/tools/testing/selftests/x86/lam.c >> b/tools/testing/selftests/x86/lam.c >> index 0ea4f6813930..3c53d4b7aa61 100644 >> --- a/tools/testing/selftests/x86/lam.c >> +++ b/tools/testing/selftests/x86/lam.c >> @@ -4,6 +4,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -43,10 +44,19 @@ >> #define FUNC_INHERITE 0x20 >> #define FUNC_PASID 0x40 >> >> +/* get_user() pointer test cases */ >> +#define GET_USER_USER 0 >> +#define GET_USER_KERNEL_TOP 1 >> +#define GET_USER_KERNEL_BOT 2 >> +#define GET_USER_KERNEL 3 >> + >> #define TEST_MASK 0x7f >> +#define L5_SIGN_EXT_MASK(0xFFUL << 56) >> +#define L4_SIGN_EXT_MASK(0x1UL << 47) >> >> #define LOW_ADDR(0x1UL << 30) >> #define HIGH_ADDR (0x3UL << 48) >> +#define L5_ADDR (0x1UL << 48) >> >> #define MALLOC_LEN 32 >> >> @@ -370,6 +380,54 @@ static int handle_syscall(struct testcases *test) >> return ret; >> } >> >> +static int get_user_syscall(struct testcases *test) >> +{ >> +int ret = 0; >> +int ptr_value = 0; >> +void *ptr = &ptr_value; >> +int fd; >> + >> +uint64_t bitmask = ((uint64_t)ptr & L5_ADDR) ? L5_SIGN_EXT_MASK : >> + L4_SIGN_EXT_MASK; > >Emm. Do you expect stack to be above at the very top of address space on >5-level paging machines? It is not true. We don't allocate any memory >above 46-bit unless asked explicitly. Right, I'm not sure why I thought that would work here. >See tools/testing/selftests/mm/va_high_addr_switch.c Thanks for the tip, I'll use mmap/munmap to determine the enabled pagetable level. > >-- > Kiryl Shutsemau / Kirill A. Shutemov -- Kind regards Maciej Wieczór-Retman
Re: [PATCH RFT v11 0/8] fork: Support shadow stacks in clone3()
On Sat, Oct 05, 2024 at 11:31:27AM +0100, Mark Brown wrote: > The kernel has recently added support for shadow stacks, currently > x86 only using their CET feature but both arm64 and RISC-V have > equivalent features (GCS and Zicfiss respectively), I am actively > working on GCS[1]. With shadow stacks the hardware maintains an > additional stack containing only the return addresses for branch > instructions which is not generally writeable by userspace and ensures > that any returns are to the recorded addresses. This provides some > protection against ROP attacks and making it easier to collect call > stacks. These shadow stacks are allocated in the address space of the > userspace process. Does anyone have any thoughts on this? I reworked things to specify the address for the shadow stack pointer rather than the extent of the stack as Rick and Yuri suggested, otherwise the only change from the prior version was rebasing onto the arm64 GCS support since that's queued in -next. I think the only substantial question is picking the ABI for specifying the shadow stack. signature.asc Description: PGP signature
Re: [PATCH v5 03/13] iommufd: Add iommufd_verify_unfinalized_object
On Tue, Oct 29, 2024 at 09:05:14PM -0700, Nicolin Chen wrote: > On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote: > > > I think we'd need the same change in iommufd_object_abort() too. > > > > Makes sense > > I found xa_cmpxchg() does xas_result to its returning value, which > turns XA_ZERO_ENTRY into NULL failing our intended verifications. Oh.. that is annoying, you can't actually tell if cmpxchg failed :\ NULL means success if it was XA_ZERO_ENTRY and failure of it was not populated! Hmm, I might ask Matthew about this Your version looks OK Jason
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
On Wed, Oct 30, 2024 at 03:55:56AM +, Cheng-Jui Wang (王正睿) wrote: > On Tue, 2024-10-29 at 09:29 -0700, Paul E. McKenney wrote: > > External email : Please do not click links or open attachments until you > > have verified the sender or the content. > > > > > > On Tue, Oct 29, 2024 at 02:20:51AM +, Cheng-Jui Wang (王正睿) wrote: > > > On Mon, 2024-10-28 at 17:22 -0700, Paul E. McKenney wrote: > > > > The result is that the current leaf rcu_node structure's ->lock is > > > > acquired only if a stack backtrace might be needed from the current CPU, > > > > and is held across only that CPU's backtrace. As a result, if there are > > > > > > After upgrading our device to kernel-6.11, we encountered a lockup > > > scenario under stall warning. > > > I had prepared a patch to submit, but I noticed that this series has > > > already addressed some issues, though it hasn't been merged into the > > > mainline yet. So, I decided to reply to this series for discussion on > > > how to fix it before pushing. Here is the lockup scenario We > > > encountered: > > > > > > Devices: arm64 with only 8 cores > > > One CPU holds rnp->lock in rcu_dump_cpu_stack() while trying to dump > > > other CPUs, but it waits for the corresponding CPU to dump backtrace, > > > with a 10-second timeout. > > > > > >__delay() > > >__const_udelay() > > >nmi_trigger_cpumask_backtrace() > > >arch_trigger_cpumask_backtrace() > > >trigger_single_cpu_backtrace() > > >dump_cpu_task() > > >rcu_dump_cpu_stacks() <- holding rnp->lock > > >print_other_cpu_stall() > > >check_cpu_stall() > > >rcu_pending() > > >rcu_sched_clock_irq() > > >update_process_times() > > > > > > However, the other 7 CPUs are waiting for rnp->lock on the path to > > > report qs. > > > > > >queued_spin_lock_slowpath() > > >queued_spin_lock() > > >do_raw_spin_lock() > > >__raw_spin_lock_irqsave() > > >_raw_spin_lock_irqsave() > > >rcu_report_qs_rdp() > > >rcu_check_quiescent_state() > > >rcu_core() > > >rcu_core_si() > > >handle_softirqs() > > >__do_softirq() > > >do_softirq() > > >call_on_irq_stack() > > > > > > Since the arm64 architecture uses IPI instead of true NMI to implement > > > arch_trigger_cpumask_backtrace(), spin_lock_irqsave disables > > > interrupts, which is enough to block this IPI request. > > > Therefore, if other CPUs start waiting for the lock before receiving > > > the IPI, a semi-deadlock scenario like the following occurs: > > > > > > CPU0CPU1CPU2 > > > - - - > > > lock_irqsave(rnp->lock) > > > lock_irqsave(rnp->lock) > > > > > > > > > > > > lock_irqsave(rnp->lock) > > > > > > > > > > > > ... > > > > > > > > > In our scenario, with 7 CPUs to dump, the lockup takes nearly 70 > > > seconds, causing subsequent useful logs to be unable to print, leading > > > to a watchdog timeout and system reboot. > > > > > > This series of changes re-acquires the lock after each dump, > > > significantly reducing lock-holding time. However, since it still holds > > > the lock while dumping CPU backtrace, there's still a chance for two > > > CPUs to wait for each other for 10 seconds, which is still too long. > > > So, I would like to ask if it's necessary to dump backtrace within the > > > spinlock section? > > > If not, especially now that lockless checks are possible, maybe it can > > > be changed as follows? > > > > > > - if (!(data_race(rnp->qsmask) & > > > leaf_node_cpu_bit(rnp, cpu))) > > > - continue; > > > - raw_spin_lock_irqsave_rcu_node(rnp, flags); > > > - if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) { > > > + if (data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, > > > cpu)) { > > >
Re: [PATCH] selftests: netfilter: remove unused parameter
Applied to nf.git, thanks
Re: [PATCH v2 10/13] media: i2c: imx214: Implement vflip/hflip controls
On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > The imx214 sensor supports horizontal and vertical flipping. Add > appropriate controls to the driver. > > Signed-off-by: André Apitzsch Acked-by: Ricardo Ribalda > --- > drivers/media/i2c/imx214.c | 69 > -- > 1 file changed, 61 insertions(+), 8 deletions(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > 87a03e292e19ccd71f1b2dcee3409826b2f5cb6f..f2d21c2e8cf84ed25403c98e280073f32e50e758 > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -30,7 +30,6 @@ > #define IMX214_DEFAULT_LINK_FREQ 6 > #define IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ * 8LL) / 10) > #define IMX214_FPS 30 > -#define IMX214_MBUS_CODE MEDIA_BUS_FMT_SRGGB10_1X10 > > /* V-TIMING internal */ > #define IMX214_REG_FRM_LENGTH_LINESCCI_REG16(0x0340) > @@ -189,6 +188,22 @@ static const char * const imx214_supply_name[] = { > > #define IMX214_NUM_SUPPLIES ARRAY_SIZE(imx214_supply_name) > > +/* > + * The supported formats. > + * This table MUST contain 4 entries per format, to cover the various flip > + * combinations in the order > + * - no flip > + * - h flip > + * - v flip > + * - h&v flips > + */ > +static const u32 imx214_mbus_formats[] = { > + MEDIA_BUS_FMT_SRGGB10_1X10, > + MEDIA_BUS_FMT_SGRBG10_1X10, > + MEDIA_BUS_FMT_SGBRG10_1X10, > + MEDIA_BUS_FMT_SBGGR10_1X10, > +}; > + > struct imx214 { > struct device *dev; > struct clk *xclk; > @@ -204,6 +219,10 @@ struct imx214 { > struct v4l2_ctrl *hblank; > struct v4l2_ctrl *exposure; > struct v4l2_ctrl *unit_size; > + struct { > + struct v4l2_ctrl *hflip; > + struct v4l2_ctrl *vflip; > + }; > > const struct imx214_mode *cur_mode; > > @@ -339,7 +358,6 @@ static const struct cci_reg_sequence mode_table_common[] > = { > > /* global setting */ > /* basic config */ > - { IMX214_REG_ORIENTATION, 0 }, > { IMX214_REG_MASK_CORR_FRAMES, IMX214_CORR_FRAMES_MASK }, > { IMX214_REG_FAST_STANDBY_CTRL, 1 }, > { IMX214_REG_LINE_LENGTH_PCK, IMX214_PPL_DEFAULT }, > @@ -518,11 +536,21 @@ static int __maybe_unused imx214_power_off(struct > device *dev) > return 0; > } > > +/* Get bayer order based on flip setting. */ > +static u32 imx214_get_format_code(struct imx214 *imx214) > +{ > + unsigned int i; > + > + i = (imx214->vflip->val ? 2 : 0) | (imx214->hflip->val ? 1 : 0); > + > + return imx214_mbus_formats[i]; > +} > + > static void imx214_update_pad_format(struct imx214 *imx214, > const struct imx214_mode *mode, > struct v4l2_mbus_framefmt *fmt, u32 code) > { > - fmt->code = IMX214_MBUS_CODE; > + fmt->code = imx214_get_format_code(imx214); > fmt->width = mode->width; > fmt->height = mode->height; > fmt->field = V4L2_FIELD_NONE; > @@ -538,10 +566,12 @@ static int imx214_enum_mbus_code(struct v4l2_subdev *sd, > struct v4l2_subdev_state *sd_state, > struct v4l2_subdev_mbus_code_enum *code) > { > - if (code->index > 0) > + struct imx214 *imx214 = to_imx214(sd); > + > + if (code->index >= (ARRAY_SIZE(imx214_mbus_formats) / 4)) > return -EINVAL; > > - code->code = IMX214_MBUS_CODE; > + code->code = imx214_get_format_code(imx214); > > return 0; > } > @@ -550,7 +580,11 @@ static int imx214_enum_frame_size(struct v4l2_subdev > *subdev, > struct v4l2_subdev_state *sd_state, > struct v4l2_subdev_frame_size_enum *fse) > { > - if (fse->code != IMX214_MBUS_CODE) > + struct imx214 *imx214 = to_imx214(subdev); > + u32 code; > + > + code = imx214_get_format_code(imx214); > + if (fse->code != code) > return -EINVAL; > > if (fse->index >= ARRAY_SIZE(imx214_modes)) > @@ -713,6 +747,7 @@ static int imx214_entity_init_state(struct v4l2_subdev > *subdev, > struct v4l2_subdev_format fmt = { }; > > fmt.which = sd_state ? V4L2_SUBDEV_FORMAT_TRY : > V4L2_SUBDEV_FORMAT_ACTIVE; > + fmt.format.code = MEDIA_BUS_FMT_SRGGB10_1X10; > fmt.format.width = imx214_modes[0].width; > fmt.format.height = imx214_modes[0].height; > > @@ -755,6 +790,11 @@ static int imx214_set_ctrl(struct v4l2_ctrl *ctrl) > case V4L2_CID_EXPOSURE: > cci_write(imx214->regmap, IMX214_REG_EXPOSURE, ctrl->val, > &ret); > break; > + case V4L2_CID_HFLIP: > + case V4L2_CID_VFLIP: > + cci_write(imx214->regmap, IMX214_REG_ORIENTATION, > + imx214->hflip->val | imx214->vflip->val <<
Re: [PATCH v2 09/13] media: i2c: imx214: Extract format and crop settings
Hi Aren't you changing the binning mode for 1920x1080 with this patch? I think that could be considered an ABI change. Also, if we are not letting the user change the value, I do not see much value in setting the cropping programmatically, I'd rather not take this change. On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > Remove format and crop settings from register sequences and set them > programmatically. > > Signed-off-by: André Apitzsch > --- > drivers/media/i2c/imx214.c | 129 > ++--- > 1 file changed, 97 insertions(+), 32 deletions(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > cb443d8bee6fe72dc9378b2c2d3caae09f8642c5..87a03e292e19ccd71f1b2dcee3409826b2f5cb6f > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -96,6 +96,9 @@ > #define IMX214_REG_PREPLLCK_VT_DIV CCI_REG8(0x0305) > #define IMX214_REG_PLL_VT_MPY CCI_REG16(0x0306) > #define IMX214_REG_OPPXCK_DIV CCI_REG8(0x0309) > +#define IMX214_OPPXCK_DIV_COMP66 > +#define IMX214_OPPXCK_DIV_COMP88 > +#define IMX214_OPPXCK_DIV_RAW1010 > #define IMX214_REG_OPSYCK_DIV CCI_REG8(0x030b) > #define IMX214_REG_PLL_MULT_DRIV CCI_REG8(0x0310) > #define IMX214_PLL_SINGLE 0 > @@ -132,6 +135,9 @@ > #define IMX214_BINNING_NONE0 > #define IMX214_BINNING_ENABLE 1 > #define IMX214_REG_BINNING_TYPECCI_REG8(0x0901) > +#define IMX214_BINNING_1X1 0 > +#define IMX214_BINNING_2X2 0x22 > +#define IMX214_BINNING_4X4 0x44 > #define IMX214_REG_BINNING_WEIGHTING CCI_REG8(0x0902) > #define IMX214_BINNING_AVERAGE 0x00 > #define IMX214_BINNING_SUMMED 0x01 > @@ -211,36 +217,22 @@ static const struct cci_reg_sequence mode_4096x2304[] = > { > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > - { IMX214_REG_X_ADD_STA, 56 }, > - { IMX214_REG_Y_ADD_STA, 408 }, > - { IMX214_REG_X_ADD_END, 4151 }, > - { IMX214_REG_Y_ADD_END, 2711 }, > { IMX214_REG_X_EVEN_INC, 1 }, > { IMX214_REG_X_ODD_INC, 1 }, > { IMX214_REG_Y_EVEN_INC, 1 }, > { IMX214_REG_Y_ODD_INC, 1 }, > - { IMX214_REG_BINNING_MODE, IMX214_BINNING_NONE }, > - { IMX214_REG_BINNING_TYPE, 0 }, > { IMX214_REG_BINNING_WEIGHTING, IMX214_BINNING_AVERAGE }, > { CCI_REG8(0x3000), 0x35 }, > { CCI_REG8(0x3054), 0x01 }, > { CCI_REG8(0x305C), 0x11 }, > > - { IMX214_REG_CSI_DATA_FORMAT, IMX214_CSI_DATA_FORMAT_RAW10 }, > - { IMX214_REG_X_OUTPUT_SIZE, 4096 }, > - { IMX214_REG_Y_OUTPUT_SIZE, 2304 }, > { IMX214_REG_SCALE_MODE, IMX214_SCALE_NONE }, > { IMX214_REG_SCALE_M, 2 }, > - { IMX214_REG_DIG_CROP_X_OFFSET, 0 }, > - { IMX214_REG_DIG_CROP_Y_OFFSET, 0 }, > - { IMX214_REG_DIG_CROP_WIDTH, 4096 }, > - { IMX214_REG_DIG_CROP_HEIGHT, 2304 }, > > { IMX214_REG_VTPXCK_DIV, 5 }, > { IMX214_REG_VTSYCK_DIV, 2 }, > { IMX214_REG_PREPLLCK_VT_DIV, 3 }, > { IMX214_REG_PLL_VT_MPY, 150 }, > - { IMX214_REG_OPPXCK_DIV, 10 }, > { IMX214_REG_OPSYCK_DIV, 1 }, > { IMX214_REG_PLL_MULT_DRIV, IMX214_PLL_SINGLE }, > > @@ -281,36 +273,22 @@ static const struct cci_reg_sequence mode_1920x1080[] = > { > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > - { IMX214_REG_X_ADD_STA, 1144 }, > - { IMX214_REG_Y_ADD_STA, 1020 }, > - { IMX214_REG_X_ADD_END, 3063 }, > - { IMX214_REG_Y_ADD_END, 2099 }, > { IMX214_REG_X_EVEN_INC, 1 }, > { IMX214_REG_X_ODD_INC, 1 }, > { IMX214_REG_Y_EVEN_INC, 1 }, > { IMX214_REG_Y_ODD_INC, 1 }, > - { IMX214_REG_BINNING_MODE, IMX214_BINNING_NONE }, > - { IMX214_REG_BINNING_TYPE, 0 }, > { IMX214_REG_BINNING_WEIGHTING, IMX214_BINNING_AVERAGE }, > { CCI_REG8(0x3000), 0x35 }, > { CCI_REG8(0x3054), 0x01 }, > { CCI_REG8(0x305C), 0x11 }, > > - { IMX214_REG_CSI_DATA_FORMAT, IMX214_CSI_DATA_FORMAT_RAW10 }, > - { IMX214_REG_X_OUTPUT_SIZE, 1920 }, > - { IMX214_REG_Y_OUTPUT_SIZE, 1080 }, > { IMX214_REG_SCALE_MODE, IMX214_SCALE_NONE }, > { IMX214_REG_SCALE_M, 2 }, > - { IMX214_REG_DIG_CROP_X_OFFSET, 0 }, > - { IMX214_REG_DIG_CROP_Y_OFFSET, 0 }, > - { IMX214_REG_DIG_CROP_WIDTH, 1920 }, > - { IMX214_REG_DIG_CROP_HEIGHT, 1080 }, > > { IMX214_REG_VTPXCK_DIV, 5 }, > { IMX214_REG_VTSYCK_DIV, 2 }, > { IMX214_REG_PREPLLCK_VT_DIV, 3 }, > { IMX214_REG_PLL_VT_MPY, 150 }, > - { IM
Re: [PATCH v2 08/13] media: i2c: imx214: Add vblank and hblank controls
On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > Add vblank control to allow changing the framerate / > higher exposure values. > > The vblank and hblank controls are needed for libcamera support. > > While at it, fix the minimal exposure time according to the datasheet. > > Signed-off-by: André Apitzsch > --- > drivers/media/i2c/imx214.c | 119 > - > 1 file changed, 97 insertions(+), 22 deletions(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > 497baad616ad7374a92a3da2b7c1096b1d72a0c7..cb443d8bee6fe72dc9378b2c2d3caae09f8642c5 > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -34,11 +34,18 @@ > > /* V-TIMING internal */ > #define IMX214_REG_FRM_LENGTH_LINESCCI_REG16(0x0340) > +#define IMX214_VTS_MAX 0x > + > +#define IMX214_VBLANK_MIN 890 > + > +/* HBLANK control - read only */ > +#define IMX214_PPL_DEFAULT 5008 > > /* Exposure control */ > #define IMX214_REG_EXPOSURECCI_REG16(0x0202) > -#define IMX214_EXPOSURE_MIN0 > -#define IMX214_EXPOSURE_MAX3184 > +#define IMX214_EXPOSURE_OFFSET 10 > +#define IMX214_EXPOSURE_MIN1 > +#define IMX214_EXPOSURE_MAX(IMX214_VTS_MAX - > IMX214_EXPOSURE_OFFSET) this definition is never used > #define IMX214_EXPOSURE_STEP 1 > #define IMX214_EXPOSURE_DEFAULT3184 > #define IMX214_REG_EXPOSURE_RATIO CCI_REG8(0x0222) > @@ -187,6 +194,8 @@ struct imx214 { > struct v4l2_ctrl_handler ctrls; > struct v4l2_ctrl *pixel_rate; > struct v4l2_ctrl *link_freq; > + struct v4l2_ctrl *vblank; > + struct v4l2_ctrl *hblank; > struct v4l2_ctrl *exposure; > struct v4l2_ctrl *unit_size; > > @@ -202,8 +211,6 @@ static const struct cci_reg_sequence mode_4096x2304[] = { > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > - { IMX214_REG_FRM_LENGTH_LINES, 3194 }, > - { IMX214_REG_LINE_LENGTH_PCK, 5008 }, > { IMX214_REG_X_ADD_STA, 56 }, > { IMX214_REG_Y_ADD_STA, 408 }, > { IMX214_REG_X_ADD_END, 4151 }, > @@ -274,8 +281,6 @@ static const struct cci_reg_sequence mode_1920x1080[] = { > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > - { IMX214_REG_FRM_LENGTH_LINES, 3194 }, > - { IMX214_REG_LINE_LENGTH_PCK, 5008 }, > { IMX214_REG_X_ADD_STA, 1144 }, > { IMX214_REG_Y_ADD_STA, 1020 }, > { IMX214_REG_X_ADD_END, 3063 }, > @@ -359,6 +364,7 @@ static const struct cci_reg_sequence mode_table_common[] > = { > { IMX214_REG_ORIENTATION, 0 }, > { IMX214_REG_MASK_CORR_FRAMES, IMX214_CORR_FRAMES_MASK }, > { IMX214_REG_FAST_STANDBY_CTRL, 1 }, > + { IMX214_REG_LINE_LENGTH_PCK, IMX214_PPL_DEFAULT }, > { CCI_REG8(0x4550), 0x02 }, > { CCI_REG8(0x4601), 0x00 }, > { CCI_REG8(0x4642), 0x05 }, > @@ -462,18 +468,24 @@ static const struct cci_reg_sequence > mode_table_common[] = { > static const struct imx214_mode { > u32 width; > u32 height; > + > + /* V-timing */ > + unsigned int vts_def; > + > unsigned int num_of_regs; > const struct cci_reg_sequence *reg_table; > } imx214_modes[] = { > { > .width = 4096, > .height = 2304, > + .vts_def = 3194, > .num_of_regs = ARRAY_SIZE(mode_4096x2304), > .reg_table = mode_4096x2304, > }, > { > .width = 1920, > .height = 1080, > + .vts_def = 3194, > .num_of_regs = ARRAY_SIZE(mode_1920x1080), > .reg_table = mode_1920x1080, > }, > @@ -629,9 +641,39 @@ static int imx214_set_format(struct v4l2_subdev *sd, > __crop->width = mode->width; > __crop->height = mode->height; > > - if (format->which == V4L2_SUBDEV_FORMAT_ACTIVE) > + if (format->which == V4L2_SUBDEV_FORMAT_ACTIVE) { > + int exposure_max; > + int exposure_def; > + int hblank; > + > imx214->cur_mode = mode; > > + /* Update limits and set FPS to default */ > + __v4l2_ctrl_modify_range(imx214->vblank, IMX214_VBLANK_MIN, > +IMX214_VTS_MAX - mode->height, 2, > +mode->vts_def - mode->height); > + __v4l2_ctrl_s_ctrl(imx214->vblank, > + mode->vts_def - mode->height); Do we need to set FPS to default? > + > + /* Update max exposure
Re: [PATCH v2 07/13] media: i2c: imx214: Check number of lanes from device tree
On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > The imx214 camera is capable of either two-lane or four-lane operation. > > Currently only the four-lane mode is supported, as proper pixel rates > and link frequences for the two-lane mode are unknown. > > Signed-off-by: André Apitzsch > --- > drivers/media/i2c/imx214.c | 26 +++--- > 1 file changed, 19 insertions(+), 7 deletions(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > 0c83149bcc3e3b833a087d26104eb7dfaafdf904..497baad616ad7374a92a3da2b7c1096b1d72a0c7 > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -199,7 +199,6 @@ struct imx214 { > > /*From imx214_mode_tbls.h*/ > static const struct cci_reg_sequence mode_4096x2304[] = { > - { IMX214_REG_CSI_LANE_MODE, IMX214_CSI_4_LANE_MODE }, > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > @@ -272,7 +271,6 @@ static const struct cci_reg_sequence mode_4096x2304[] = { > }; > > static const struct cci_reg_sequence mode_1920x1080[] = { > - { IMX214_REG_CSI_LANE_MODE, IMX214_CSI_4_LANE_MODE }, > { IMX214_REG_HDR_MODE, IMX214_HDR_MODE_OFF }, > { IMX214_REG_HDR_RES_REDUCTION, IMX214_HDR_RES_REDU_THROUGH }, > { IMX214_REG_EXPOSURE_RATIO, 1 }, > @@ -791,6 +789,13 @@ static int imx214_start_streaming(struct imx214 *imx214) > return ret; > } > > + ret = cci_write(imx214->regmap, IMX214_REG_CSI_LANE_MODE, > + IMX214_CSI_4_LANE_MODE, NULL); > + if (ret) { > + dev_err(imx214->dev, "%s failed to configure lanes\n", > __func__); > + return ret; > + } > + > ret = cci_multi_reg_write(imx214->regmap, imx214->cur_mode->reg_table, > imx214->cur_mode->num_of_regs, NULL); > if (ret < 0) { > @@ -932,7 +937,7 @@ static int imx214_get_regulators(struct device *dev, > struct imx214 *imx214) >imx214->supplies); > } > > -static int imx214_parse_fwnode(struct device *dev) > +static int imx214_parse_fwnode(struct device *dev, struct imx214 *imx214) We don't seem to use imx214 in the function. You probably do not want to add this change. > { > struct fwnode_handle *endpoint; > struct v4l2_fwnode_endpoint bus_cfg = { > @@ -951,6 +956,13 @@ static int imx214_parse_fwnode(struct device *dev) > goto done; > } > > + /* Check the number of MIPI CSI2 data lanes */ > + if (bus_cfg.bus.mipi_csi2.num_data_lanes != 4) { > + dev_err_probe(dev, -EINVAL, > + "only 4 data lanes are currently supported\n"); > + goto done; > + } > + > for (i = 0; i < bus_cfg.nr_of_link_frequencies; i++) > if (bus_cfg.link_frequencies[i] == IMX214_DEFAULT_LINK_FREQ) > break; > @@ -975,14 +987,14 @@ static int imx214_probe(struct i2c_client *client) > struct imx214 *imx214; > int ret; > > - ret = imx214_parse_fwnode(dev); > - if (ret) > - return ret; > - > imx214 = devm_kzalloc(dev, sizeof(*imx214), GFP_KERNEL); > if (!imx214) > return -ENOMEM; > > + ret = imx214_parse_fwnode(dev, imx214); > + if (ret) > + return ret; I am not against changing the order... but the commit message does not mention it. > + > imx214->dev = dev; > > imx214->xclk = devm_clk_get(dev, NULL); > > -- > 2.47.0 > >
Re: [PATCH v2 01/13] media: i2c: imx214: Fix link frequency
Hi Andre On Mon, Oct 21, 2024 at 12:14 AM André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > The driver defines IMX214_DEFAULT_LINK_FREQ 48000, and then > IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ * 8LL) / 10), > which works out as 384MPix/s. (The 8 is 4 lanes and DDR.) > > Parsing the PLL registers with the defined 24MHz input. We're in single > PLL mode, so MIPI frequency is directly linked to pixel rate. VTCK ends > up being 1200MHz, and VTPXCK and OPPXCK both are 120MHz. Section 5.3 > "Frame rate calculation formula" says "Pixel rate > [pixels/s] = VTPXCK [MHz] * 4", so 120 * 4 = 480MPix/s, which basically > agrees with my number above. > > 3.1.4. MIPI global timing setting says "Output bitrate = OPPXCK * reg > 0x113[7:0]", so 120MHz * 10, or 1200Mbit/s. That would be a link > frequency of 600MHz due to DDR. > That also matches to 480MPix/s * 10bpp / 4 lanes / 2 for DDR. > I think your calculations are correct and the value should be 600M... but if we land this change, there will be boards that will stop working until they update their dts. Not sure if we allow that. Can we move this change to the last one of the series and add something like: diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c index 2aca3d88a0a7..8b4ded4cb5ce 100644 --- a/drivers/media/i2c/imx214.c +++ b/drivers/media/i2c/imx214.c @@ -1281,13 +1281,9 @@ static int imx214_parse_fwnode(struct device *dev, struct imx214 *imx214) if (bus_cfg.link_frequencies[i] == IMX214_DEFAULT_LINK_FREQ) break; - if (i == bus_cfg.nr_of_link_frequencies) { - dev_err_probe(dev, -EINVAL, - "link-frequencies %d not supported, Please review your DT\n", + if (i == bus_cfg.nr_of_link_frequencies) + dev_warn(dev, "link-frequencies %d not supported, Please review your DT. Continuing anyway\n", IMX214_DEFAULT_LINK_FREQ); - ret = -EINVAL; - goto done; - } > Signed-off-by: André Apitzsch > --- > drivers/media/i2c/imx214.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > index > 4962cfe7c83d62425aeccb46a400fa93146f14ea..5d411452d342fdb177619cd1c9fd9650d31089bb > 100644 > --- a/drivers/media/i2c/imx214.c > +++ b/drivers/media/i2c/imx214.c > @@ -24,7 +24,7 @@ > #define IMX214_MODE_STREAMING 0x01 > > #define IMX214_DEFAULT_CLK_FREQ2400 > -#define IMX214_DEFAULT_LINK_FREQ 48000 > +#define IMX214_DEFAULT_LINK_FREQ 6 > #define IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ * 8LL) / 10) > #define IMX214_FPS 30 > #define IMX214_MBUS_CODE MEDIA_BUS_FMT_SRGGB10_1X10 > > -- > 2.47.0 > >
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
在 2024/10/30 18:49, Matthieu Baerts 写道: Hi Tao Chen, Thank you for having shared this patch. On 30/10/2024 11:01, Tao Chen wrote: Fix compile error when MPTCP feature not support, though eBPF core check already done which seems invalid in this situation, the error info like: progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct tcp_sock' 49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? The filed created in new definitions with eBPF core feature to solve this build problem, and test case result still ok in MPTCP kernel. 176/1 mptcp/base:OK 176/2 mptcp/mptcpify:OK 176 mptcp:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") The commit you mentioned here is more than 2 years old, and as far as I can see, nobody else reported this compilation issue. I guess that's because people used tools/testing/selftests/bpf/config file as expected to populate the kernel config, and I suppose you didn't, right? Hi Matt, thank you for your reply, as you said, i did not use tools/testing/selftests/bpf/config to compile kernel, i will use this helpful feature. I don't think other BPF selftests check for missing kernel config if they are specified in the 'config' file, but even if it is the case, I think it would be better to skip all the MPTCP tests, and not try to have them checking something that doesn't exist: no need to validate these tests if the expected kernel config has not been enabled. If i use the kernel not support MPTCP, the compile error still exists, and i can not build the bpf test successfully. Maybe skill the test case seems better when kernel not support. Now that bpf_core_field_exists check already used in the code, i think it is better to use new definition mode. But again, please correct me if I'm wrong, but I don't think there is anything to change here to fix your compilation issue: simply make sure to use this tools/testing/selftests/bpf/config file to generate your kernel config, no? Cheers, Matt -- Best Regards Dylane Chen
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
Hi Tao Chen, Thank you for having shared this patch. On 30/10/2024 11:01, Tao Chen wrote: > Fix compile error when MPTCP feature not support, though eBPF core check > already done which seems invalid in this situation, the error info like: > progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct > tcp_sock' >49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? > > The filed created in new definitions with eBPF core feature to solve > this build problem, and test case result still ok in MPTCP kernel. > > 176/1 mptcp/base:OK > 176/2 mptcp/mptcpify:OK > 176 mptcp:OK > Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED > > Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") The commit you mentioned here is more than 2 years old, and as far as I can see, nobody else reported this compilation issue. I guess that's because people used tools/testing/selftests/bpf/config file as expected to populate the kernel config, and I suppose you didn't, right? I don't think other BPF selftests check for missing kernel config if they are specified in the 'config' file, but even if it is the case, I think it would be better to skip all the MPTCP tests, and not try to have them checking something that doesn't exist: no need to validate these tests if the expected kernel config has not been enabled. But again, please correct me if I'm wrong, but I don't think there is anything to change here to fix your compilation issue: simply make sure to use this tools/testing/selftests/bpf/config file to generate your kernel config, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH v2 3/3] softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
On 2024-10-29 22:47:21 [+0100], Frederic Weisbecker wrote: > Le Tue, Oct 29, 2024 at 02:52:31PM +0100, Sebastian Andrzej Siewior a écrit : > > On 2024-10-28 15:01:55 [+0100], Frederic Weisbecker wrote: > > > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h > > > > index 457151f9f263d..9637af78087f3 100644 > > > > --- a/include/linux/interrupt.h > > > > +++ b/include/linux/interrupt.h > > > > @@ -616,6 +616,50 @@ extern void __raise_softirq_irqoff(unsigned int > > > > nr); > > > > extern void raise_softirq_irqoff(unsigned int nr); > > > > extern void raise_softirq(unsigned int nr); > > > > > > > > +/* > > > > + * Handle timers in a dedicated thread at a low SCHED_FIFO priority > > > > instead in > > > > + * ksoftirqd as to be prefred over SCHED_NORMAL tasks. > > > > + */ > > > > > > This doesn't parse. How about, inspired by your changelog: > > … > > > > What about this essay instead: > > > > | With forced-threaded interrupts enabled a raised softirq is deferred to > > | ksoftirqd unless it can be handled within the threaded interrupt. This > > | affects timer_list timers and hrtimers which are explicitly marked with > > | HRTIMER_MODE_SOFT. > > | With PREEMPT_RT enabled more hrtimers are moved to softirq for processing > > | which includes all timers which are not explicitly marked > > HRTIMER_MODE_HARD. > > | Userspace controlled timers (like the clock_nanosleep() interface) is > > divided > > | into two categories: Tasks with elevated scheduling policy including > > | SCHED_{FIFO|RR|DL} and the remaining scheduling policy. The tasks with the > > | elevated scheduling policy are woken up directly from the HARDIRQ while > > all > > | other wake ups are delayed to so softirq and so to ksoftirqd. > > First "so" is intended? No, it needs to go. > > | > > | The ksoftirqd runs at SCHED_OTHER policy at which it should remain since > > it > > | handles the softirq in an overloaded situation (not handled everything > > | within its last run). > > | If the timers are handled at SCHED_OTHER priority then they competes with > > all > > | other SCHED_OTHER tasks for CPU resources are possibly delayed. > > | Moving timers softirqs to a low priority SCHED_FIFO thread instead ensures > > | that timer are performed before scheduling any SCHED_OTHER thread. > > Works for me! Great. > > And with this piece of text I convinced myself to also enable this in > > the forced-threaded case. > > Yes, that makes sense. Good. Then I'm pick to up your tag for that patch. Thank you. > Thanks. > Sebastian
Re: [RFC PATCH 15/39] KVM: guest_memfd: hugetlb: allocate and truncate from hugetlb
Hi Ackerley, Due to actual customer requirements(such as ByteDance), I have added support for NUMA policy based on your foundation. Standing on the shoulders of giants, please correct me if there is anyting wrong. --- Thanks Jun.miao On 2024/9/11 07:43, Ackerley Tng wrote: If HugeTLB is requested at guest_memfd creation time, HugeTLB pages will be used to back guest_memfd. Signed-off-by: Ackerley Tng --- virt/kvm/guest_memfd.c | 252 ++--- 1 file changed, 239 insertions(+), 13 deletions(-) diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 31e1115273e1..2e6f12e2bac8 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -8,6 +8,8 @@ #include #include #include +#include +#include #include "kvm_mm.h" @@ -29,6 +31,13 @@ static struct kvm_gmem_hugetlb *kvm_gmem_hgmem(struct inode *inode) return inode->i_mapping->i_private_data; } +static bool is_kvm_gmem_hugetlb(struct inode *inode) +{ + u64 flags = (u64)inode->i_private; + + return flags & KVM_GUEST_MEMFD_HUGETLB; +} + /** * folio_file_pfn - like folio_file_page, but return a pfn. * @folio: The folio which contains this index. @@ -58,6 +67,9 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo return 0; } +/** + * Use the uptodate flag to indicate that the folio is prepared for KVM's usage. + */ static inline void kvm_gmem_mark_prepared(struct folio *folio) { folio_mark_uptodate(folio); @@ -72,13 +84,18 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio) static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, struct folio *folio) { - unsigned long nr_pages, i; pgoff_t index; int r; - nr_pages = folio_nr_pages(folio); - for (i = 0; i < nr_pages; i++) - clear_highpage(folio_page(folio, i)); + if (folio_test_hugetlb(folio)) { + folio_zero_user(folio, folio->index << PAGE_SHIFT); + } else { + unsigned long nr_pages, i; + + nr_pages = folio_nr_pages(folio); + for (i = 0; i < nr_pages; i++) + clear_highpage(folio_page(folio, i)); + } /* * Preparing huge folios should always be safe, since it should @@ -103,6 +120,174 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot, return r; } +static int kvm_gmem_get_mpol_node_nodemask(gfp_t gfp_mask, + struct mempolicy **mpol, + nodemask_t **nodemask) +{ + /* +* TODO: mempolicy would probably have to be stored on the inode, use +* task policy for now. +*/ + *mpol = get_task_policy(current); commit bbb0b86af11574516fe78bc1340f49c9e6b7e588 (HEAD -> my-gmem-hugetlb-rfc-v2) Author: Jun Miao Date: Wed Oct 30 11:07:16 2024 -0400 KVM: guest_memfd: add TDX numa policy in hugetlb support Support the numa policy in the gmem hugetlb. This function need the corresponding QEMU patch cooperate to work, and set the numa policy like this in qemu: "--object host-nodes=0,policy=bind". If no set in the Qemu, the policy uses current task policy for now. Signed-off-by: Jun Miao diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index a49631e47421..cf569fe0740d 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -91,6 +91,17 @@ static inline struct mempolicy *mpol_dup(struct mempolicy *pol) return pol; } +struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, + nodemask_t *nodes); + +int mpol_set_nodemask(struct mempolicy *pol, + const nodemask_t *nodes, struct nodemask_scratch *nsc); + +int sanitize_mpol_flags(int *mode, unsigned short *flags); + +int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask, + unsigned long maxnode); + static inline void mpol_get(struct mempolicy *pol) { if (pol) @@ -202,6 +213,25 @@ static inline void mpol_cond_put(struct mempolicy *pol) { } +struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, + nodemask_t *nodes); +{ +} + +int mpol_set_nodemask(struct mempolicy *pol, + const nodemask_t *nodes, struct nodemask_scratch *nsc); +{ +} + +int sanitize_mpol_flags(int *mode, unsigned short *flags); +{ +} + +int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask, + unsigned long maxnode); +{ +} + static inline void mpol_get(struct mempolicy *pol) { } diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h index 1f9bb10d1a47..6ba4eb0935de 100644 --- a/include/uapi/linux/mempolicy.h +++ b/include/uapi/linux/mempolicy.h @@ -24,6 +24,7 @@ enum {
Re: [PATCH v2 1/3] rust: kunit: add KUnit case and suite macros
On Wed, Oct 30, 2024 at 5:59 AM David Gow wrote: > > On Tue, 29 Oct 2024 at 20:08, Alice Ryhl wrote: > > > > On Tue, Oct 29, 2024 at 10:24 AM David Gow wrote: > > > > > > From: José Expósito > > > > > > Add a couple of Rust const functions and macros to allow to develop > > > KUnit tests without relying on generated C code: > > > > > > - The `kunit_unsafe_test_suite!` Rust macro is similar to the > > >`kunit_test_suite` C macro. It requires a NULL-terminated array of > > >test cases (see below). > > > - The `kunit_case` Rust function is similar to the `KUNIT_CASE` C macro. > > >It generates as case from the name and function. > > > - The `kunit_case_null` Rust function generates a NULL test case, which > > >is to be used as delimiter in `kunit_test_suite!`. > > > > > > While these functions and macros can be used on their own, a future > > > patch will introduce another macro to create KUnit tests using a > > > user-space like syntax. > > > > > > Signed-off-by: José Expósito > > > Co-developed-by: Matt Gilbride > > > Signed-off-by: Matt Gilbride > > > Co-developed-by: David Gow > > > Signed-off-by: David Gow > > > --- > > > > > > Changes since v1: > > > https://lore.kernel.org/lkml/20230720-rustbind-v1-1-c80db349e...@google.com/ > > > - Rebase on top of rust-next > > > - As a result, KUnit attributes are new set. These are hardcoded to the > > > defaults of "normal" speed and no module name. > > > - Split the kunit_case!() macro into two const functions, kunit_case() > > > and kunit_case_null() (for the NULL terminator). > > > > > > --- > > > rust/kernel/kunit.rs | 108 +++ > > > rust/kernel/lib.rs | 1 + > > > 2 files changed, 109 insertions(+) > > > > > > diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs > > > index 824da0e9738a..fc2d259db458 100644 > > > --- a/rust/kernel/kunit.rs > > > +++ b/rust/kernel/kunit.rs > > > @@ -161,3 +161,111 @@ macro_rules! kunit_assert_eq { > > > $crate::kunit_assert!($name, $file, $diff, $left == $right); > > > }}; > > > } > > > + > > > +/// Represents an individual test case. > > > +/// > > > +/// The test case should have the signature > > > +/// `unsafe extern "C" fn test_case(test: *mut crate::bindings::kunit)`. > > > +/// > > > +/// The `kunit_unsafe_test_suite!` macro expects a NULL-terminated list > > > of test cases. > > > +/// Use `kunit_case_null` to generate such a delimeter. > > > +const fn kunit_case(name: &kernel::str::CStr, run_case: unsafe extern > > > "C" fn(*mut kernel::bindings::kunit)) -> kernel::bindings::kunit_case { > > > > This should probably say `name: &'static CStr` to require that the > > name lives forever. > > Fixed in v3, thanks. > > > > +/// Registers a KUnit test suite. > > > +/// > > > +/// # Safety > > > +/// > > > +/// `test_cases` must be a NULL terminated array of test cases. > > > +/// > > > +/// # Examples > > > +/// > > > +/// ```ignore > > > +/// unsafe extern "C" fn test_fn(_test: *mut crate::bindings::kunit) { > > > +/// let actual = 1 + 1; > > > +/// let expected = 2; > > > +/// assert_eq!(actual, expected); > > > +/// } > > > +/// > > > +/// static mut KUNIT_TEST_CASE: crate::bindings::kunit_case = > > > crate::kunit_case(name, test_fn); > > > +/// static mut KUNIT_NULL_CASE: crate::bindings::kunit_case = > > > crate::kunit_case_null(); > > > +/// static mut KUNIT_TEST_CASES: &mut[crate::bindings::kunit_case] = > > > unsafe { > > > +/// &mut[KUNIT_TEST_CASE, KUNIT_NULL_CASE] > > > +/// }; > > > +/// crate::kunit_unsafe_test_suite!(suite_name, KUNIT_TEST_CASES); > > > +/// ``` > > > +#[macro_export] > > > +macro_rules! kunit_unsafe_test_suite { > > > +($name:ident, $test_cases:ident) => { > > > +const _: () = { > > > +static KUNIT_TEST_SUITE_NAME: [i8; 256] = { > > > +let name_u8 = core::stringify!($name).as_bytes(); > > > +let mut ret = [0; 256]; > > > + > > > +let mut i = 0; > > > +while i < name_u8.len() { > > > +ret[i] = name_u8[i] as i8; > > > +i += 1; > > > +} > > > > I assume the name must be zero-terminated? If so, you probably need to > > enforce that somehow, e.g. by failing if `name_u8` is longer than 255 > > bytes. > > Nice catch. I'm not sure how to nicely throw a compile time error in > this function, so I'm truncating it here and doing a compile error in > the macro in patch #2. This isn't ideal, but seems to work. You should be able to just panic! if it happens. Also, I believe it should be i < 255 instead of 256? Alice
Re: [PATCH] scripts: Remove export_report.pl
On Tue, Oct 29, 2024 at 10:12 PM Matthew Maurer wrote: > > This script has been broken for 5 years with no user complaints. > > It first had its .mod.c parser broken in commit a3d0cb04f7df ("modpost: > use __section in the output to *.mod.c"). Later, it had its object file > enumeration broken in commit f65a486821cf ("kbuild: change module.order > to list *.o instead of *.ko"). Both of these changes sat for years with > no reports. > > Rather than reviving this script as we make further changes to `.mod.c`, > this patch gets rid of it because it is clearly unused. > > Signed-off-by: Matthew Maurer > --- > scripts/export_report.pl | 186 > --- > 1 file changed, 186 deletions(-) The top-level Makefile should be cleaned up. $ git grep export_report -- Makefile Makefile: @echo ' export_report - List the usages of all exported symbols' Makefile:PHONY += includecheck versioncheck coccicheck export_report Makefile:export_report: Makefile: $(PERL) $(srctree)/scripts/export_report.pl -- Best Regards Masahiro Yamada
Re: [PATCH] selftest/hid: increase timeout to 10 min
On Oct 29 2024, Shuah Khan wrote: > On 10/29/24 03:07, Yun Lu wrote: > > If running hid testcases with command "./run_kselftest.sh -c hid", > > NIT - When inestead of "If" > > the following tests will take longer than the kselftest framework > > timeout (now 200 seconds) to run and thus got terminated with TIMEOUT > > error: > > > >hid-multitouch.sh - took about 6min41s > >hid-tablet.sh - took about 6min30s > > > > Increase the timeout setting to 10 minutes to allow them have a chance > > to finish. > > > > Cc: sta...@vger.kernel.org > > Signed-off-by: Yun Lu > > --- > > tools/testing/selftests/hid/settings | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/testing/selftests/hid/settings > > b/tools/testing/selftests/hid/settings > > index b3cbfc521b10..dff0d947f9c2 100644 > > --- a/tools/testing/selftests/hid/settings > > +++ b/tools/testing/selftests/hid/settings > > @@ -1,3 +1,3 @@ > > # HID tests can be long, so give a little bit more time > > # to them > > -timeout=200 > > +timeout=600 > > Okay - maybe this test shouldn't be part of the default run if it needs > 600 seconds to run? Agree, but is it possible to be more granular? FWIW, there are some tests that are quick in the hid subtree that should probably be run regularly. For instance, the hid_bpf program is interesting to be run when there are bpf changes, because it's relying on the bpf architecture. I think hid-core.sh, hid-mouse,sh and hid-keyboard.sh are fast enough and should also be integrated in the default runs. hid-multitouch, hid-wacom, hid-sony are definitely taking a lot of time, so they should probbaly be run only when there are changes in those areas, i.e., not by default. Cheers, Benjamin > > thanks, > -- Shuah
Re: [PATCH v5 12/13] Documentation: userspace-api: iommufd: Update vIOMMU
On Fri, Oct 25, 2024 at 04:49:52PM -0700, Nicolin Chen wrote: > With the introduction of the new object and its infrastructure, update the > doc to reflect that and add a new graph. > > Reviewed-by: Jason Gunthorpe > Reviewed-by: Kevin Tian > Signed-off-by: Nicolin Chen > --- > Documentation/userspace-api/iommufd.rst | 69 - > 1 file changed, 68 insertions(+), 1 deletion(-) > > diff --git a/Documentation/userspace-api/iommufd.rst > b/Documentation/userspace-api/iommufd.rst > index 2deba93bf159..92d16efad5b0 100644 > --- a/Documentation/userspace-api/iommufd.rst > +++ b/Documentation/userspace-api/iommufd.rst > @@ -63,6 +63,37 @@ Following IOMMUFD objects are exposed to userspace: >space usually has mappings from guest-level I/O virtual addresses to guest- >level physical addresses. > > + - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, I just found an extra space at this entire paragraph, resulting in the VIOMMU/VDEVICE chunks right-shifted by that one space... Will fix this in v6 and confirm with htmldocs. Also, I think I forgot to CC doc folks/maillist.. will do that too. Thanks Nicolin
Re: [PATCH v7 1/3] modules: Support extended MODVERSIONS info
Matthew Maurer writes: >> Sorry I realise it's version 7, but although the above looks correct it's >> kind of dense. >> >> I think the below would also work and is (I think) easier to follow, and >> is more obviously similar to the existing code. I'm sure your version is >> faster, but I don't think it's that performance critical. >> >> static void dedotify_ext_version_names(char *str_seq, unsigned long size) >> { >> char *end = str_seq + size; >> char *p = str_seq; >> >> while (p < end) { >> if (*p == '.') >> memmove(p, p + 1, end - p - 1); >> >> p += strlen(p) + 1; >> } >> } >> >> The tail of str_seq will be filled with nulls as long as the last string >> was null terminated. > > As you alluded to, what you're providing is potentially O(n^2) in the > number of symbols a module depends on - the existing code is O(n). > If leading dots on names are rare, this is probably fine. If they're > common, this will potentially make loading modules with a large number > of imported symbols actually take a measurable amount of additional > time. It should only be a single symbol these days, .TOC., for both big and little endian builds. But maybe someone out there is still building their kernel ELFv1, in which case every function will begin with '.'. I still don't think it will be measurable, but n^2 is asking for trouble. So forget it, just use your version, you've already written it anyway. Sorry for the noise. cheers
Re: [PATCH v3 2/3] rust: macros: add macro to easily run KUnit tests
On Tue, Oct 29, 2024 at 10:11:38PM -0700, Boqun Feng wrote: [...] > > + > > +let new_body: TokenStream = vec![body.stream(), > > kunit_macros.parse().unwrap()] > > +.into_iter() > > +.collect(); > > + > > +// Remove the `#[test]` macros. > > +let new_body = new_body.to_string().replace("#[test]", ""); > > Yeah, I want to see how you do it this time ;-) So if you do a > `.to_string()` on a `TokenStream`, you lose all the span [1] information > ("span information" is a term invited by me, hope I get it right ;-)) Not important: I meant I am not a procdure macro expert, hope "span information" is what is used when discussing span preservation ;-) Regards, Boqun > e.g. if there is a compile error in the test code, the compiler cannot > report the exact line of the error, it can only report there is an > error. > [...]
Re: [PATCH v3 2/3] rust: macros: add macro to easily run KUnit tests
On Wed, Oct 30, 2024 at 12:57:14PM +0800, David Gow wrote: > From: José Expósito > Add a new procedural macro (`#[kunit_tests(kunit_test_suit_name)]`) to > run KUnit tests using a user-space like syntax. > > The macro, that should be used on modules, transforms every `#[test]` > in a `kunit_case!` and adds a `kunit_unsafe_test_suite!` registering > all of them. > > The only difference with user-space tests is that instead of using > `#[cfg(test)]`, `#[kunit_tests(kunit_test_suit_name)]` is used. > > Note that `#[cfg(CONFIG_KUNIT)]` is added so the test module is not > compiled when `CONFIG_KUNIT` is set to `n`. > > Reviewed-by: David Gow > Signed-off-by: José Expósito > [Updated to use new const fn.] > Signed-off-by: David Gow > --- > > Changes since v2: > https://lore.kernel.org/linux-kselftest/20241029092422.2884505-3-david...@google.com/ > - Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!) > - The proc macro now emits an error if the suite name is too long. > > Changes since v1: > https://lore.kernel.org/lkml/20230720-rustbind-v1-2-c80db349e...@google.com/ > - Rebased on top of rust-next > - Make use of the new const functions, rather than the kunit_case!() > macro. > > --- > MAINTAINERS | 1 + > rust/kernel/kunit.rs | 11 > rust/macros/kunit.rs | 153 +++ > rust/macros/lib.rs | 29 > 4 files changed, 194 insertions(+) > create mode 100644 rust/macros/kunit.rs > > diff --git a/MAINTAINERS b/MAINTAINERS > index b77f4495dcf4..b65035ede675 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -12433,6 +12433,7 @@ F:Documentation/dev-tools/kunit/ > F: include/kunit/ > F: lib/kunit/ > F: rust/kernel/kunit.rs > +F: rust/macros/kunit.rs > F: scripts/rustdoc_test_* > F: tools/testing/kunit/ > > diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs > index 27bc4139d352..ac296467a552 100644 > --- a/rust/kernel/kunit.rs > +++ b/rust/kernel/kunit.rs > @@ -40,6 +40,8 @@ pub fn info(args: fmt::Arguments<'_>) { > } > } > > +use macros::kunit_tests; > + > /// Asserts that a boolean expression is `true` at runtime. > /// > /// Public but hidden since it should only be used from generated tests. > @@ -269,3 +271,12 @@ macro_rules! kunit_unsafe_test_suite { > }; > }; > } > + > +#[kunit_tests(rust_kernel_kunit)] > +mod tests { > +#[test] > +fn rust_test_kunit_kunit_tests() { > +let running = true; > +assert_eq!(running, true); > +} > +} > diff --git a/rust/macros/kunit.rs b/rust/macros/kunit.rs > new file mode 100644 > index ..850d268cc96a > --- /dev/null > +++ b/rust/macros/kunit.rs > @@ -0,0 +1,153 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +//! Procedural macro to run KUnit tests using a user-space like syntax. > +//! > +//! Copyright (c) 2023 José Expósito > + > +use proc_macro::{Delimiter, Group, TokenStream, TokenTree}; > +use std::fmt::Write; > + > +pub(crate) fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream > { > +if attr.to_string().is_empty() { > +panic!("Missing test name in #[kunit_tests(test_name)] macro") > +} > + > +if attr.to_string().as_bytes().len() > 255 { > +panic!("The test suite name `{}` exceeds the maximum length of 255 > bytes.", attr) > +} > + > +let mut tokens: Vec<_> = ts.into_iter().collect(); > + > +// Scan for the "mod" keyword. > +tokens > +.iter() > +.find_map(|token| match token { > +TokenTree::Ident(ident) => match ident.to_string().as_str() { > +"mod" => Some(true), > +_ => None, > +}, > +_ => None, > +}) > +.expect("#[kunit_tests(test_name)] attribute should only be applied > to modules"); > + > +// Retrieve the main body. The main body should be the last token tree. > +let body = match tokens.pop() { > +Some(TokenTree::Group(group)) if group.delimiter() == > Delimiter::Brace => group, > +_ => panic!("cannot locate main body of module"), > +}; > + > +// Get the functions set as tests. Search for `[test]` -> `fn`. > +let mut body_it = body.stream().into_iter(); > +let mut tests = Vec::new(); > +while let Some(token) = body_it.next() { > +match token { > +TokenTree::Group(ident) if ident.to_string() == "[test]" => > match body_it.next() { > +Some(TokenTree::Ident(ident)) if ident.to_string() == "fn" > => { > +let test_name = match body_it.next() { > +Some(TokenTree::Ident(ident)) => ident.to_string(), > +_ => continue, > +}; > +tests.push(test_name); > +} > +_ => continue, > +}, > +_ => (), > +} > +} > + > +// Add `#[cfg(CONFIG_KUNIT)]` before the module declaration. > +
Re: [PATCH v2 2/3] rust: macros: add macro to easily run KUnit tests
On Tue, 29 Oct 2024 at 20:28, Boqun Feng wrote: > > Hi David, > > On Tue, Oct 29, 2024 at 05:24:18PM +0800, David Gow wrote: > > From: José Expósito > > > > Add a new procedural macro (`#[kunit_tests(kunit_test_suit_name)]`) to > > run KUnit tests using a user-space like syntax. > > > > The macro, that should be used on modules, transforms every `#[test]` > > in a `kunit_case!` and adds a `kunit_unsafe_test_suite!` registering > > all of them. > > > > The only difference with user-space tests is that instead of using > > `#[cfg(test)]`, `#[kunit_tests(kunit_test_suit_name)]` is used. > > > > Note that `#[cfg(CONFIG_KUNIT)]` is added so the test module is not > > compiled when `CONFIG_KUNIT` is set to `n`. > > > > Reviewed-by: David Gow > > Signed-off-by: José Expósito > > [Updated to use new const fn.] > > Signed-off-by: David Gow > > --- > > > > Changes since v1: > > https://lore.kernel.org/lkml/20230720-rustbind-v1-2-c80db349e...@google.com/ > > - Rebased on top of rust-next > > - Make use of the new const functions, rather than the kunit_case!() > > macro. > > > > --- > > MAINTAINERS | 1 + > > rust/kernel/kunit.rs | 11 +++ > > rust/macros/lib.rs | 29 + > > 3 files changed, 41 insertions(+) > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index b77f4495dcf4..b65035ede675 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -12433,6 +12433,7 @@ F:Documentation/dev-tools/kunit/ > > F: include/kunit/ > > F: lib/kunit/ > > F: rust/kernel/kunit.rs > > +F: rust/macros/kunit.rs > > I cannot find this file in this series, and it's not in the current > rust-next either. > > (Did you do the same thing as I sometimes did: forget to `git add` a new > file?) > Whoops! Thanks for catching this: I somehow managed to unstage it while editing things. Fixed in v3: https://lore.kernel.org/linux-kselftest/20241030045719.3085147-6-david...@google.com/ Cheers, -- David smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH v2 1/3] rust: kunit: add KUnit case and suite macros
On Tue, 29 Oct 2024 at 20:08, Alice Ryhl wrote: > > On Tue, Oct 29, 2024 at 10:24 AM David Gow wrote: > > > > From: José Expósito > > > > Add a couple of Rust const functions and macros to allow to develop > > KUnit tests without relying on generated C code: > > > > - The `kunit_unsafe_test_suite!` Rust macro is similar to the > >`kunit_test_suite` C macro. It requires a NULL-terminated array of > >test cases (see below). > > - The `kunit_case` Rust function is similar to the `KUNIT_CASE` C macro. > >It generates as case from the name and function. > > - The `kunit_case_null` Rust function generates a NULL test case, which > >is to be used as delimiter in `kunit_test_suite!`. > > > > While these functions and macros can be used on their own, a future > > patch will introduce another macro to create KUnit tests using a > > user-space like syntax. > > > > Signed-off-by: José Expósito > > Co-developed-by: Matt Gilbride > > Signed-off-by: Matt Gilbride > > Co-developed-by: David Gow > > Signed-off-by: David Gow > > --- > > > > Changes since v1: > > https://lore.kernel.org/lkml/20230720-rustbind-v1-1-c80db349e...@google.com/ > > - Rebase on top of rust-next > > - As a result, KUnit attributes are new set. These are hardcoded to the > > defaults of "normal" speed and no module name. > > - Split the kunit_case!() macro into two const functions, kunit_case() > > and kunit_case_null() (for the NULL terminator). > > > > --- > > rust/kernel/kunit.rs | 108 +++ > > rust/kernel/lib.rs | 1 + > > 2 files changed, 109 insertions(+) > > > > diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs > > index 824da0e9738a..fc2d259db458 100644 > > --- a/rust/kernel/kunit.rs > > +++ b/rust/kernel/kunit.rs > > @@ -161,3 +161,111 @@ macro_rules! kunit_assert_eq { > > $crate::kunit_assert!($name, $file, $diff, $left == $right); > > }}; > > } > > + > > +/// Represents an individual test case. > > +/// > > +/// The test case should have the signature > > +/// `unsafe extern "C" fn test_case(test: *mut crate::bindings::kunit)`. > > +/// > > +/// The `kunit_unsafe_test_suite!` macro expects a NULL-terminated list of > > test cases. > > +/// Use `kunit_case_null` to generate such a delimeter. > > +const fn kunit_case(name: &kernel::str::CStr, run_case: unsafe extern "C" > > fn(*mut kernel::bindings::kunit)) -> kernel::bindings::kunit_case { > > This should probably say `name: &'static CStr` to require that the > name lives forever. Fixed in v3, thanks. > > +/// Registers a KUnit test suite. > > +/// > > +/// # Safety > > +/// > > +/// `test_cases` must be a NULL terminated array of test cases. > > +/// > > +/// # Examples > > +/// > > +/// ```ignore > > +/// unsafe extern "C" fn test_fn(_test: *mut crate::bindings::kunit) { > > +/// let actual = 1 + 1; > > +/// let expected = 2; > > +/// assert_eq!(actual, expected); > > +/// } > > +/// > > +/// static mut KUNIT_TEST_CASE: crate::bindings::kunit_case = > > crate::kunit_case(name, test_fn); > > +/// static mut KUNIT_NULL_CASE: crate::bindings::kunit_case = > > crate::kunit_case_null(); > > +/// static mut KUNIT_TEST_CASES: &mut[crate::bindings::kunit_case] = > > unsafe { > > +/// &mut[KUNIT_TEST_CASE, KUNIT_NULL_CASE] > > +/// }; > > +/// crate::kunit_unsafe_test_suite!(suite_name, KUNIT_TEST_CASES); > > +/// ``` > > +#[macro_export] > > +macro_rules! kunit_unsafe_test_suite { > > +($name:ident, $test_cases:ident) => { > > +const _: () = { > > +static KUNIT_TEST_SUITE_NAME: [i8; 256] = { > > +let name_u8 = core::stringify!($name).as_bytes(); > > +let mut ret = [0; 256]; > > + > > +let mut i = 0; > > +while i < name_u8.len() { > > +ret[i] = name_u8[i] as i8; > > +i += 1; > > +} > > I assume the name must be zero-terminated? If so, you probably need to > enforce that somehow, e.g. by failing if `name_u8` is longer than 255 > bytes. Nice catch. I'm not sure how to nicely throw a compile time error in this function, so I'm truncating it here and doing a compile error in the macro in patch #2. This isn't ideal, but seems to work. > > + > > +static mut KUNIT_TEST_SUITE: > > core::cell::UnsafeCell<$crate::bindings::kunit_suite> = > > You don't actually need UnsafeCell here since it's already `static mut`. > Thanks, this works. > > +core::cell::UnsafeCell::new($crate::bindings::kunit_suite { > > +name: KUNIT_TEST_SUITE_NAME, > > +// SAFETY: User is expected to pass a correct > > `test_cases`, hence this macro > > +// named 'unsafe'. > > +test_cases: unsafe { $test_cases.as_mut_ptr() }, > > +suite_init: None, > > +suite_exit: None, > > +init: None, > > +
Re: [PATCH v5 06/13] iommu: Add iommu_copy_struct_from_full_user_array helper
On Tue, Oct 29, 2024 at 08:24:52AM +, Tian, Kevin wrote: > > From: Nicolin Chen > > Sent: Saturday, October 26, 2024 7:51 AM > > > > From: Jason Gunthorpe > > > > The iommu_copy_struct_from_user_array helper can be used to copy a > > single > > entry from a user array which might not be efficient if the array is big. > > > > Add a new iommu_copy_struct_from_full_user_array to copy the entire user > > array at once. Update the existing iommu_copy_struct_from_user_array > > kdoc > > accordingly. > > what about: > > iommu_copy_struct_from_user_array_single() > iommu_copy_struct_from_user_array_full() I am okay with that, yet might prefer that in a separate patch? As I am trying to reduce the number of changes in the series since we are likely merging three series at once :) Thanks Nicolin > > Signed-off-by: Jason Gunthorpe > > Signed-off-by: Nicolin Chen > > Reviewed-by: Kevin Tian
Re: [PATCH v5 03/13] iommufd: Add iommufd_verify_unfinalized_object
On Tue, Oct 29, 2024 at 03:55:58PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 29, 2024 at 09:18:05AM -0700, Nicolin Chen wrote: > > I think we'd need the same change in iommufd_object_abort() too. > > Makes sense I found xa_cmpxchg() does xas_result to its returning value, which turns XA_ZERO_ENTRY into NULL failing our intended verifications. So, I replaced that further with xas_store: - @@ -41,20 +41,26 @@ static struct miscdevice vfio_misc_dev; void iommufd_object_finalize(struct iommufd_ctx *ictx, struct iommufd_object *obj) { + XA_STATE(xas, &ictx->objects, obj->id); void *old; - old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL); - /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */ - WARN_ON(old); + xa_lock(&ictx->objects); + old = xas_store(&xas, obj); + xa_unlock(&ictx->objects); + /* obj->id was returned from xa_alloc() so the xas_store() cannot fail */ + WARN_ON(old != XA_ZERO_ENTRY); } /* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj) { + XA_STATE(xas, &ictx->objects, obj->id); void *old; - old = xa_erase(&ictx->objects, obj->id); - WARN_ON(old); + xa_lock(&ictx->objects); + old = xas_store(&xas, NULL); + xa_unlock(&ictx->objects); + WARN_ON(old != XA_ZERO_ENTRY); kfree(obj); } - Thanks Nicolin
Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()
On Tue, 2024-10-29 at 09:29 -0700, Paul E. McKenney wrote: > External email : Please do not click links or open attachments until you have > verified the sender or the content. > > > On Tue, Oct 29, 2024 at 02:20:51AM +, Cheng-Jui Wang (王正睿) wrote: > > On Mon, 2024-10-28 at 17:22 -0700, Paul E. McKenney wrote: > > > The result is that the current leaf rcu_node structure's ->lock is > > > acquired only if a stack backtrace might be needed from the current CPU, > > > and is held across only that CPU's backtrace. As a result, if there are > > > > After upgrading our device to kernel-6.11, we encountered a lockup > > scenario under stall warning. > > I had prepared a patch to submit, but I noticed that this series has > > already addressed some issues, though it hasn't been merged into the > > mainline yet. So, I decided to reply to this series for discussion on > > how to fix it before pushing. Here is the lockup scenario We > > encountered: > > > > Devices: arm64 with only 8 cores > > One CPU holds rnp->lock in rcu_dump_cpu_stack() while trying to dump > > other CPUs, but it waits for the corresponding CPU to dump backtrace, > > with a 10-second timeout. > > > >__delay() > >__const_udelay() > >nmi_trigger_cpumask_backtrace() > >arch_trigger_cpumask_backtrace() > >trigger_single_cpu_backtrace() > >dump_cpu_task() > >rcu_dump_cpu_stacks() <- holding rnp->lock > >print_other_cpu_stall() > >check_cpu_stall() > >rcu_pending() > >rcu_sched_clock_irq() > >update_process_times() > > > > However, the other 7 CPUs are waiting for rnp->lock on the path to > > report qs. > > > >queued_spin_lock_slowpath() > >queued_spin_lock() > >do_raw_spin_lock() > >__raw_spin_lock_irqsave() > >_raw_spin_lock_irqsave() > >rcu_report_qs_rdp() > >rcu_check_quiescent_state() > >rcu_core() > >rcu_core_si() > >handle_softirqs() > >__do_softirq() > >do_softirq() > >call_on_irq_stack() > > > > Since the arm64 architecture uses IPI instead of true NMI to implement > > arch_trigger_cpumask_backtrace(), spin_lock_irqsave disables > > interrupts, which is enough to block this IPI request. > > Therefore, if other CPUs start waiting for the lock before receiving > > the IPI, a semi-deadlock scenario like the following occurs: > > > > CPU0CPU1CPU2 > > - - - > > lock_irqsave(rnp->lock) > > lock_irqsave(rnp->lock) > > > > > > > > lock_irqsave(rnp->lock) > > > > > > > > ... > > > > > > In our scenario, with 7 CPUs to dump, the lockup takes nearly 70 > > seconds, causing subsequent useful logs to be unable to print, leading > > to a watchdog timeout and system reboot. > > > > This series of changes re-acquires the lock after each dump, > > significantly reducing lock-holding time. However, since it still holds > > the lock while dumping CPU backtrace, there's still a chance for two > > CPUs to wait for each other for 10 seconds, which is still too long. > > So, I would like to ask if it's necessary to dump backtrace within the > > spinlock section? > > If not, especially now that lockless checks are possible, maybe it can > > be changed as follows? > > > > - if (!(data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, > > cpu))) > > - continue; > > - raw_spin_lock_irqsave_rcu_node(rnp, flags); > > - if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) { > > + if (data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, > > cpu)) { > > if (cpu_is_offline(cpu)) > > pr_err("Offline CPU %d blocking > > current GP.\n", cpu); > > else > > dump_cpu_task(cpu); > > } > > } > > - raw_spin_unlock_irqrestore_rcu_node(rnp, > > flags); > > > > Or should this be considered an arm64 issue, and they should switch to > > true N
Re: [PATCH net-next v5 03/12] selftests: ncdevmem: Unify error handling
On 10/29, Jakub Kicinski wrote: > On Wed, 23 Oct 2024 08:43:53 -0700 Stanislav Fomichev wrote: > > ret = inet_pton(server_sin.sin_family, server_ip, &server_sin.sin_addr); > > - if (socket < 0) > > - error(79, 0, "%s: [FAIL, create socket]\n", TEST_PREFIX); > > + if (ret < 0) > > + error(1, pton, "%s: [FAIL, create socket]\n", TEST_PREFIX); > > Looks like sched_ext broke our build_tools check, I think I pushed a > fix, but I also see here: > > ncdevmem.c: In function ‘do_server’: > ncdevmem.c:343:26: error: ‘pton’ undeclared (first use in this function) > 343 | error(1, pton, "%s: [FAIL, create socket]\n", > TEST_PREFIX); > | ^~~~ > -- > pw-bot: cr Oh, thanks, will double check. This looks silly.
RE: [PATCH v5 04/13] iommufd/hw_pagetable: Enforce invalidation op on vIOMMU-based hwpt_nested
> From: Jason Gunthorpe > Sent: Wednesday, October 30, 2024 12:05 AM > > On Tue, Oct 29, 2024 at 08:22:43AM +, Tian, Kevin wrote: > > > From: Nicolin Chen > > > Sent: Saturday, October 26, 2024 7:51 AM > > > > > > @@ -302,7 +302,9 @@ iommufd_viommu_alloc_hwpt_nested(struct > > > iommufd_viommu *viommu, u32 flags, > > > } > > > hwpt->domain->owner = viommu->iommu_dev->ops; > > > > > > - if (WARN_ON_ONCE(hwpt->domain->type != > > > IOMMU_DOMAIN_NESTED)) { > > > + if (WARN_ON_ONCE(hwpt->domain->type != > > > IOMMU_DOMAIN_NESTED || > > > + (!viommu->ops->cache_invalidate && > > > + !hwpt->domain->ops->cache_invalidate_user))) { > > > rc = -EINVAL; > > > goto out_abort; > > > } > > > > According to patch5, cache invalidate in viommu only uses > > viommu->ops->cache_invalidate. Is here intended to allow > > nested hwpt created via viommu to still support the old > > method? > > I think that is reasonable? > Yes, just want to confirm. Reviewed-by: Kevin Tian
Re: [PATCH] selftest/hid: increase timeout to 10 min
On 10/29/24 03:07, Yun Lu wrote: If running hid testcases with command "./run_kselftest.sh -c hid", NIT - When inestead of "If" the following tests will take longer than the kselftest framework timeout (now 200 seconds) to run and thus got terminated with TIMEOUT error: hid-multitouch.sh - took about 6min41s hid-tablet.sh - took about 6min30s Increase the timeout setting to 10 minutes to allow them have a chance to finish. Cc: sta...@vger.kernel.org Signed-off-by: Yun Lu --- tools/testing/selftests/hid/settings | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/hid/settings b/tools/testing/selftests/hid/settings index b3cbfc521b10..dff0d947f9c2 100644 --- a/tools/testing/selftests/hid/settings +++ b/tools/testing/selftests/hid/settings @@ -1,3 +1,3 @@ # HID tests can be long, so give a little bit more time # to them -timeout=200 +timeout=600 Okay - maybe this test shouldn't be part of the default run if it needs 600 seconds to run? thanks, -- Shuah
Re: [PATCH for-next 7/7] selftests/net: Fix ./ns-XXXXXX not cleanup
On 30/10/2024 07:43, Jakub Kicinski wrote: > On Fri, 25 Oct 2024 09:40:10 +0800 Li Zhijian wrote: >> ``` >> readonly STATS="$(mktemp -p /tmp ns-XX)" >> readonly BASE=`basename $STATS` >> ``` >> It could be a mistake to write to $BASE rather than $STATS, where $STATS >> is used to save the NSTAT_HISTORY and it will be cleaned up before exit. > > Agreed, although since we've been creating the wrong file this whole > time and everything worked >-- should we just just delete those two lines completely? Yes, it also works. > > Similarly to patch 6 - please repost as a standalone patch so that our > CI will test it. If you only CC a mailing list on subset of patches > they are likely to be ignored by automation.. Got it Thanks Zhijian