Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
Hi Dmitry, On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov wrote: > > Protection domain mapper is a QMI service providing mapping between > 'protection domains' and services supported / allowed in these domains. > For example such mapping is required for loading of the WiFi firmware or > for properly starting up the UCSI / altmode / battery manager support. > > The existing userspace implementation has several issue. It doesn't play > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the > firmware location is changed (or if the firmware was not available at > the time pd-mapper was started but the corresponding directory is > mounted later), etc. > > However this configuration is largely static and common between > different platforms. Provide in-kernel service implementing static > per-platform data. > > Unlike previous revisions of the patchset, this iteration uses static > configuration per platform, rather than building it dynamically from the > list of DSPs being started. > > To: Bjorn Andersson > To: Konrad Dybcio > To: Sibi Sankar > To: Mathieu Poirier > Cc: linux-arm-...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-remotep...@vger.kernel.org > Cc: Johan Hovold > Cc: Xilin Wu > Cc: "Bryan O'Donoghue" > -- > > Changes in v7: > - Fixed modular build (Steev) > - Link to v6: > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org > > Changes in v6: > - Reworked mutex to fix lockdep issue on deregistration > - Fixed dependencies between PD-mapper and remoteproc to fix modular > builds (Krzysztof) > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) > - Fixed kerneldocs (Krzysztof) > - Removed extra pr_debug messages (Krzysztof) > - Fixed wcss build (Krzysztof) > - Added platforms which do not require protection domain mapping to > silence the notice on those platforms > - Link to v5: > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org > > Changes in v5: > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris > Lew) > - pd_mapper: reworked to provide static configuration per platform > (Bjorn) > - Link to v4: > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org > > Changes in v4: > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) > - Added configuration for sm6350 (Thanks to Luca) > - Removed RFC tag (Konrad) > - Link to v3: > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org > > Changes in RFC v3: > - Send start / stop notifications when PD-mapper domain list is changed > - Reworked the way PD-mapper treats protection domains, register all of > them in a single batch > - Added SC7180 domains configuration based on TCL Book 14 GO > - Link to v2: > https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org > > Changes in RFC v2: > - Swapped num_domains / domains (Konrad) > - Fixed an issue with battery not working on sc8280xp > - Added missing configuration for QCS404 > > --- > Dmitry Baryshkov (6): > soc: qcom: pdr: protect locator_addr with the main mutex > soc: qcom: pdr: fix parsing of domains lists > soc: qcom: pdr: extract PDR message marshalling data > soc: qcom: qmi: add a way to remove running service > soc: qcom: add pd-mapper implementation > remoteproc: qcom: enable in-kernel PD mapper > > drivers/remoteproc/Kconfig | 4 + > drivers/remoteproc/qcom_q6v5_adsp.c | 11 +- > drivers/remoteproc/qcom_q6v5_mss.c | 10 +- > drivers/remoteproc/qcom_q6v5_pas.c | 12 +- > drivers/remoteproc/qcom_q6v5_wcss.c | 12 +- > drivers/soc/qcom/Kconfig| 14 + > drivers/soc/qcom/Makefile | 2 + > drivers/soc/qcom/pdr_interface.c| 6 +- > drivers/soc/qcom/pdr_internal.h | 318 ++--- > drivers/soc/qcom/qcom_pd_mapper.c | 656 > > drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ > drivers/soc/qcom/qmi_interface.c| 67 > include/linux/soc/qcom/pd_mapper.h | 28 ++ > include/linux/soc/qcom/qmi.h| 2 + > 14 files changed, 1193 insertions(+), 302 deletions(-) > --- > base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8 > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 > > Best regards, > -- > Dmitry Baryshkov > > I've tested this series over a large number of reboots, and the p-d devices(?) do always seem to come up (with the pd-mapper service disabled) on my Thinkpad X13s. One less service to run in userland! Tested-by: Steev Klimaszewski
Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
On Thu, 25 Apr 2024 at 10:08, Steev Klimaszewski wrote: > > Hi Dmitry, > > On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov > wrote: > > > > Protection domain mapper is a QMI service providing mapping between > > 'protection domains' and services supported / allowed in these domains. > > For example such mapping is required for loading of the WiFi firmware or > > for properly starting up the UCSI / altmode / battery manager support. > > > > The existing userspace implementation has several issue. It doesn't play > > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the > > firmware location is changed (or if the firmware was not available at > > the time pd-mapper was started but the corresponding directory is > > mounted later), etc. > > > > However this configuration is largely static and common between > > different platforms. Provide in-kernel service implementing static > > per-platform data. > > > > Unlike previous revisions of the patchset, this iteration uses static > > configuration per platform, rather than building it dynamically from the > > list of DSPs being started. > > > > To: Bjorn Andersson > > To: Konrad Dybcio > > To: Sibi Sankar > > To: Mathieu Poirier > > Cc: linux-arm-...@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Cc: linux-remotep...@vger.kernel.org > > Cc: Johan Hovold > > Cc: Xilin Wu > > Cc: "Bryan O'Donoghue" > > -- > > > > Changes in v7: > > - Fixed modular build (Steev) > > - Link to v6: > > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org > > > > Changes in v6: > > - Reworked mutex to fix lockdep issue on deregistration > > - Fixed dependencies between PD-mapper and remoteproc to fix modular > > builds (Krzysztof) > > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) > > - Fixed kerneldocs (Krzysztof) > > - Removed extra pr_debug messages (Krzysztof) > > - Fixed wcss build (Krzysztof) > > - Added platforms which do not require protection domain mapping to > > silence the notice on those platforms > > - Link to v5: > > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org > > > > Changes in v5: > > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris > > Lew) > > - pd_mapper: reworked to provide static configuration per platform > > (Bjorn) > > - Link to v4: > > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org > > > > Changes in v4: > > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) > > - Added configuration for sm6350 (Thanks to Luca) > > - Removed RFC tag (Konrad) > > - Link to v3: > > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org > > > > Changes in RFC v3: > > - Send start / stop notifications when PD-mapper domain list is changed > > - Reworked the way PD-mapper treats protection domains, register all of > > them in a single batch > > - Added SC7180 domains configuration based on TCL Book 14 GO > > - Link to v2: > > https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org > > > > Changes in RFC v2: > > - Swapped num_domains / domains (Konrad) > > - Fixed an issue with battery not working on sc8280xp > > - Added missing configuration for QCS404 > > > > --- > > Dmitry Baryshkov (6): > > soc: qcom: pdr: protect locator_addr with the main mutex > > soc: qcom: pdr: fix parsing of domains lists > > soc: qcom: pdr: extract PDR message marshalling data > > soc: qcom: qmi: add a way to remove running service > > soc: qcom: add pd-mapper implementation > > remoteproc: qcom: enable in-kernel PD mapper > > > > drivers/remoteproc/Kconfig | 4 + > > drivers/remoteproc/qcom_q6v5_adsp.c | 11 +- > > drivers/remoteproc/qcom_q6v5_mss.c | 10 +- > > drivers/remoteproc/qcom_q6v5_pas.c | 12 +- > > drivers/remoteproc/qcom_q6v5_wcss.c | 12 +- > > drivers/soc/qcom/Kconfig| 14 + > > drivers/soc/qcom/Makefile | 2 + > > drivers/soc/qcom/pdr_interface.c| 6 +- > > drivers/soc/qcom/pdr_internal.h | 318 ++--- > > drivers/soc/qcom/qcom_pd_mapper.c | 656 > > > > drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ > > drivers/soc/qcom/qmi_interface.c| 67 > > include/linux/soc/qcom/pd_mapper.h | 28 ++ > > include/linux/soc/qcom/qmi.h| 2 + > > 14 files changed, 1193 insertions(+), 302 deletions(-) > > --- > > base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8 > > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 > > > > Best regards, > > -- > > Dmitry Baryshkov > > > > > I've tested this series over a large number of reboots, and the p-d > devices(?) do always seem to come up (with the pd-mapper service > disabled) on my Thinkpad X13s. One less service to run in userland! > Tested-by: Steev Klimaszewski Thank you! -- With best wishes Dmitry
[syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
Hello, syzbot found the following issue on: HEAD commit:977b1ef51866 Merge tag 'block-6.9-20240420' of git://git.k.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=17080d2098 kernel config: https://syzkaller.appspot.com/x/.config?x=f47e5e015c177e57 dashboard link: https://syzkaller.appspot.com/bug?extid=83e7f982ca045ab4405c compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/549d1add1da9/disk-977b1ef5.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/3e8e501c8aa2/vmlinux-977b1ef5.xz kernel image: https://storage.googleapis.com/syzbot-assets/d02f7cb905b8/bzImage-977b1ef5.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+83e7f982ca045ab44...@syzkaller.appspotmail.com == WARNING: possible circular locking dependency detected 6.9.0-rc4-syzkaller-00266-g977b1ef51866 #0 Not tainted -- syz-executor.0/11241 is trying to acquire lock: 888020a2c0d8 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334 but task is already holding lock: 8880b943e658 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&rq->__lock){-.-.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378 raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559 raw_spin_rq_lock kernel/sched/sched.h:1385 [inline] _raw_spin_rq_lock_irqsave kernel/sched/sched.h:1404 [inline] rq_lock_irqsave kernel/sched/sched.h:1683 [inline] class_rq_lock_irqsave_constructor kernel/sched/sched.h:1737 [inline] sched_mm_cid_exit_signals+0x17b/0x4b0 kernel/sched/core.c:12005 exit_signals+0x2a1/0x5c0 kernel/signal.c:3016 do_exit+0x6a8/0x27e0 kernel/exit.c:837 __do_sys_exit kernel/exit.c:994 [inline] __se_sys_exit kernel/exit.c:992 [inline] __pfx___ia32_sys_exit+0x0/0x10 kernel/exit.c:992 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sighand->siglock){-.-.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334 force_sig_fault_to_task kernel/signal.c:1733 [inline] force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738 __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814 handle_page_fault arch/x86/mm/fault.c:1505 [inline] exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 strncpy_from_user+0x2c6/0x2f0 lib/strncpy_from_user.c:138 strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186 bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:216 [inline] bpf_probe_read_compat_str kernel/trace/bpf_trace.c:311 [inline] bpf_probe_read_compat_str+0xe9/0x180 kernel/trace/bpf_trace.c:307 bpf_prog_e42f6260c1b72fb3+0x3d/0x3f bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline] __bpf_prog_run include/linux/filter.h:657 [inline] bpf_prog_run include/linux/filter.h:664 [inline] __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline] bpf_trace_run4+0x25a/0x490 kernel/trace/bpf_trace.c:2422 __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222 trace_sched_switch include/trace/events/sched.h:222 [inline] __schedule+0x2535/0x4a00 kernel/sched/core.c:6743 preempt_schedule_irq+0xfb/0x1c0 kernel/sched/core.c:7068 irqentry_exit+0x5e/0x90 kernel/entry/common.c:354 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 force_sig_fault+0x0/0x1d0 __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814 handle_page_fault arch/x86/mm/fault.c:1505 [inline] exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 __put_user_handle_exception+0x0/0x10 __do_sys_gettimeofday kernel/time/time.c:147 [inline
Re: [PATCH v5 3/5] vduse: Add function to get/free the pages for reconnection
On Thu, Apr 25, 2024 at 09:35:58AM +0800, Jason Wang wrote: > On Wed, Apr 24, 2024 at 5:51 PM Michael S. Tsirkin wrote: > > > > On Wed, Apr 24, 2024 at 08:44:10AM +0800, Jason Wang wrote: > > > On Tue, Apr 23, 2024 at 4:42 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Tue, Apr 23, 2024 at 11:09:59AM +0800, Jason Wang wrote: > > > > > On Tue, Apr 23, 2024 at 4:05 AM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Thu, Apr 18, 2024 at 08:57:51AM +0800, Jason Wang wrote: > > > > > > > On Wed, Apr 17, 2024 at 5:29 PM Michael S. Tsirkin > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Apr 12, 2024 at 09:28:23PM +0800, Cindy Lu wrote: > > > > > > > > > Add the function vduse_alloc_reconnnect_info_mem > > > > > > > > > and vduse_alloc_reconnnect_info_mem > > > > > > > > > These functions allow vduse to allocate and free memory for > > > > > > > > > reconnection > > > > > > > > > information. The amount of memory allocated is vq_num pages. > > > > > > > > > Each VQS will map its own page where the reconnection > > > > > > > > > information will be saved > > > > > > > > > > > > > > > > > > Signed-off-by: Cindy Lu > > > > > > > > > --- > > > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 40 > > > > > > > > > ++ > > > > > > > > > 1 file changed, 40 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > > > > b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > > > > index ef3c9681941e..2da659d5f4a8 100644 > > > > > > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > > > > > > > @@ -65,6 +65,7 @@ struct vduse_virtqueue { > > > > > > > > > int irq_effective_cpu; > > > > > > > > > struct cpumask irq_affinity; > > > > > > > > > struct kobject kobj; > > > > > > > > > + unsigned long vdpa_reconnect_vaddr; > > > > > > > > > }; > > > > > > > > > > > > > > > > > > struct vduse_dev; > > > > > > > > > @@ -1105,6 +1106,38 @@ static void > > > > > > > > > vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq) > > > > > > > > > > > > > > > > > > vq->irq_effective_cpu = curr_cpu; > > > > > > > > > } > > > > > > > > > +static int vduse_alloc_reconnnect_info_mem(struct vduse_dev > > > > > > > > > *dev) > > > > > > > > > +{ > > > > > > > > > + unsigned long vaddr = 0; > > > > > > > > > + struct vduse_virtqueue *vq; > > > > > > > > > + > > > > > > > > > + for (int i = 0; i < dev->vq_num; i++) { > > > > > > > > > + /*page 0~ vq_num save the reconnect info for > > > > > > > > > vq*/ > > > > > > > > > + vq = dev->vqs[i]; > > > > > > > > > + vaddr = get_zeroed_page(GFP_KERNEL); > > > > > > > > > > > > > > > > > > > > > > > > I don't get why you insist on stealing kernel memory for > > > > > > > > something > > > > > > > > that is just used by userspace to store data for its own use. > > > > > > > > Userspace does not lack ways to persist data, for example, > > > > > > > > create a regular file anywhere in the filesystem. > > > > > > > > > > > > > > Good point. So the motivation here is to: > > > > > > > > > > > > > > 1) be self contained, no dependency for high speed persist data > > > > > > > storage like tmpfs > > > > > > > > > > > > No idea what this means. > > > > > > > > > > I mean a regular file may slow down the datapath performance, so > > > > > usually the application will try to use tmpfs and other which is a > > > > > dependency for implementing the reconnection. > > > > > > > > Are we worried about systems without tmpfs now? > > > > > > Yes. > > > > Why? Who ships these? > > Not sure, but it could be disabled or unmounted. I'm not sure make > VDUSE depends on TMPFS is a good idea. > > Thanks Don't disable or unmount it then? The use-case needs to be much clearer if we are adding a way for userspace to pin kernel memory for unlimited time. -- MST
Re: [PATCH v10 00/12] Initial Marvell PXA1908 support
On Wed, 24 Apr 2024 20:42:27 +0200, Duje Mihanović wrote: > Hello, > > This series adds initial support for the Marvell PXA1908 SoC and > "samsung,coreprimevelte", a smartphone using the SoC. > > USB works and the phone can boot a rootfs from an SD card, but there are > some warnings in the dmesg: > > During SMP initialization: > [0.006519] CPU features: SANITY CHECK: Unexpected variation in > SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU1: 0x00 > [0.006542] CPU features: Unsupported CPU feature variation detected. > [0.006589] CPU1: Booted secondary processor 0x01 [0x410fd032] > [0.010710] Detected VIPT I-cache on CPU2 > [0.010716] CPU features: SANITY CHECK: Unexpected variation in > SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU2: 0x00 > [0.010758] CPU2: Booted secondary processor 0x02 [0x410fd032] > [0.014849] Detected VIPT I-cache on CPU3 > [0.014855] CPU features: SANITY CHECK: Unexpected variation in > SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU3: 0x00 > [0.014895] CPU3: Booted secondary processor 0x03 [0x410fd032] > > SMMU probing fails: > [0.101798] arm-smmu c001.iommu: probing hardware configuration... > [0.101809] arm-smmu c001.iommu: SMMUv1 with: > [0.101816] arm-smmu c001.iommu: no translation support! > > A 3.14 based Marvell tree is available on GitHub > acorn-marvell/brillo_pxa_kernel, and a Samsung one on GitHub > CoderCharmander/g361f-kernel. > > Andreas Färber attempted to upstream support for this SoC in 2017: > https://lore.kernel.org/lkml/20170222022929.10540-1-afaer...@suse.de/ > > Signed-off-by: Duje Mihanović > > Changes in v10: > - Update trailers > - Rebase on v6.9-rc5 > - Clock driver changes: > - Add a couple of forgotten clocks in APBC > - The clocks are thermal_clk, ipc_clk, ssp0_clk, ssp2_clk and swjtag > - The IDs and register offsets were already present, but I forgot to > actually register them > - Split each controller block into own file > - Drop unneeded -of in clock driver filenames > - Simplify struct pxa1908_clk_unit > - Convert to platform driver > - Add module metadata > - DTS changes: > - Properly name pinctrl nodes > - Drop pinctrl #size-cells, #address-cells, ranges and #gpio-size-cells > - Fix pinctrl input-schmitt configuration > - Link to v9: > https://lore.kernel.org/20240402-pxa1908-lkml-v9-0-25a003e83...@skole.hr > > Changes in v9: > - Update trailers and rebase on v6.9-rc2, no changes > - Link to v8: > https://lore.kernel.org/20240110-pxa1908-lkml-v8-0-fea768a59...@skole.hr > > Changes in v8: > - Drop SSPA patch > - Drop broken-cd from eMMC node > - Specify S-Boot hardcoded initramfs location in device tree > - Add ARM PMU node > - Correct inverted modem memory base and size > - Update trailers > - Rebase on next-20240110 > - Link to v7: > https://lore.kernel.org/20231102-pxa1908-lkml-v7-0-cabb1a0cb...@skole.hr > and https://lore.kernel.org/20231102152033.5511-1-duje.mihano...@skole.hr > > Changes in v7: > - Suppress SND_MMP_SOC_SSPA on ARM64 > - Update trailers > - Rebase on v6.6-rc7 > - Link to v6: > https://lore.kernel.org/r/20231010-pxa1908-lkml-v6-0-b2fe09240...@skole.hr > > Changes in v6: > - Address maintainer comments: > - Add "marvell,pxa1908-padconf" binding to pinctrl-single driver > - Drop GPIO patch as it's been pulled > - Update trailers > - Rebase on v6.6-rc5 > - Link to v5: > https://lore.kernel.org/r/20230812-pxa1908-lkml-v5-0-a5d51937e...@skole.hr > > Changes in v5: > - Address maintainer comments: > - Move *_NR_CLKS to clock driver from dt binding file > - Allocate correct number of clocks for each block instead of blindly > allocating 50 for each > - Link to v4: > https://lore.kernel.org/r/20230807-pxa1908-lkml-v4-0-cb387d73b...@skole.hr > > Changes in v4: > - Address maintainer comments: > - Relicense clock binding file to BSD-2 > - Add pinctrl-names to SD card node > - Add vgic registers to GIC node > - Rebase on v6.5-rc5 > - Link to v3: > https://lore.kernel.org/r/20230804-pxa1908-lkml-v3-0-8e48fca37...@skole.hr > > Changes in v3: > - Address maintainer comments: > - Drop GPIO dynamic allocation patch > - Move clock register offsets into driver (instead of bindings file) > - Add missing Tested-by trailer to u32_fract patch > - Move SoC binding to arm/mrvl/mrvl.yaml > - Add serial0 alias and stdout-path to board dts to enable UART > debugging > - Rebase on v6.5-rc4 > - Link to v2: > https://lore.kernel.org/r/20230727162909.6031-1-duje.mihano...@skole.hr > > Changes in v2: > - Remove earlycon patch as it's been merged into tty-next > - Address maintainer comments: > - Clarify GPIO regressions on older PXA platforms > - Add Fixes tag to commit disabling GPIO pinctrl calls for this SoC > - Add missing includes to clock driver > - Clock driver uses HZ_PER_MHZ, u32_fract and GENMASK > - Dual license clock bindings
Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph
On Wed, 24 Apr 2024 15:35:15 +0200 Florent Revest wrote: > Neat! :) I had a look at mostly the "high level" part (fprobe and > arm64 specific bits) and this seems to be in a good state to me. > Thanks for the review this long series! > Thanks for all that work, that is quite a refactoring :) > > On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google) > wrote: > > > > Hi, > > > > Here is the 9th version of the series to re-implement the fprobe on > > function-graph tracer. The previous version is; > > > > https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/ > > > > This version is ported on the latest kernel (v6.9-rc3 + probes/for-next) > > and fixed some bugs + performance optimization patch[36/36]. > > - [12/36] Fix to clear fgraph_array entry in registration failure, also > >return -ENOSPC when fgraph_array is full. > > - [28/36] Add new store_fprobe_entry_data() for fprobe. > > - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation. > > - [36/36] Add new flag to skip timestamp recording. > > > > Overview > > > > This series does major 2 changes, enable multiple function-graphs on > > the ftrace (e.g. allow function-graph on sub instances) and rewrite the > > fprobe on this function-graph. > > > > The former changes had been sent from Steven Rostedt 4 years ago (*), > > which allows users to set different setting function-graph tracer (and > > other tracers based on function-graph) in each trace-instances at the > > same time. > > > > (*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/ > > > > The purpose of latter change are; > > > > 1) Remove dependency of the rethook from fprobe so that we can reduce > >the return hook code and shadow stack. > > > > 2) Make 'ftrace_regs' the common trace interface for the function > >boundary. > > > > 1) Currently we have 2(or 3) different function return hook codes, > > the function-graph tracer and rethook (and legacy kretprobe). > > But since this is redundant and needs double maintenance cost, > > I would like to unify those. From the user's viewpoint, function- > > graph tracer is very useful to grasp the execution path. For this > > purpose, it is hard to use the rethook in the function-graph > > tracer, but the opposite is possible. (Strictly speaking, kretprobe > > can not use it because it requires 'pt_regs' for historical reasons.) > > > > 2) Now the fprobe provides the 'pt_regs' for its handler, but that is > > wrong for the function entry and exit. Moreover, depending on the > > architecture, there is no way to accurately reproduce 'pt_regs' > > outside of interrupt or exception handlers. This means fprobe should > > not use 'pt_regs' because it does not use such exceptions. > > (Conversely, kprobe should use 'pt_regs' because it is an abstract > > interface of the software breakpoint exception.) > > > > This series changes fprobe to use function-graph tracer for tracing > > function entry and exit, instead of mixture of ftrace and rethook. > > Unlike the rethook which is a per-task list of system-wide allocated > > nodes, the function graph's ret_stack is a per-task shadow stack. > > Thus it does not need to set 'nr_maxactive' (which is the number of > > pre-allocated nodes). > > Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'. > > Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as > > their register interface, this changes it to convert 'ftrace_regs' to > > 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs', > > so users must access only registers for function parameters or > > return value. > > > > Design > > -- > > Instead of using ftrace's function entry hook directly, the new fprobe > > is built on top of the function-graph's entry and return callbacks > > with 'ftrace_regs'. > > > > Since the fprobe requires access to 'ftrace_regs', the architecture > > must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and > > CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph > > entry callback with 'ftrace_regs', and also > > CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to > > return_to_handler. > > > > All fprobes share a single function-graph ops (means shares a common > > ftrace filter) similar to the kprobe-on-ftrace. This needs another > > layer to find corresponding fprobe in the common function-graph > > callbacks, but has much better scalability, since the number of > > registered function-graph ops is limited. > > > > In the entry callback, the fprobe runs its entry_handler and saves the > > address of 'fprobe' on the function-graph's shadow stack as data. The > > return callback decodes the data to get the 'fprobe' address, and runs > > the exit_handler. > > > > The fprobe introduces two hash-tables, one is for entry callback which > > searches fprobes related to the given function address passed by entry > > callback. The other is for a retu
Re: [PATCH net-next v6] net/ipv4: add tracepoint for icmp_send
On Tue, 23 Apr 2024 17:23:39 +0800 (CST) xu.xi...@zte.com.cn wrote: > +#include > \ No newline at end of file Please fix. -- pw-bot: cr
Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
On 24 Apr 23:00, David Hildenbrand wrote: > > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for > > hugetlb mappings. However this was also on my TODO and I have a draft > > patch that implements it. > > Yes, I documented it back then and added sanity checks in GUP code to fence > it off. Shouldn't be too hard to implement (famous last words) and would be > the cleaner thing to use here once I manage to switch over to > FOLL_WRITE|FOLL_FORCE to break COW. Yes, my patch seems to be working. The hugetlb code is pretty simple. And it allows ptrace and the proc pid mem file to work on the executable private hugetlb mappings. There is one thing I am unclear about though. hugetlb enforces that huge_pte_write() is true on FOLL_WRITE in both the fault and follow_page_mask paths. I am not sure if we can simply assume in the hugetlb code that if the pte is not writable and this is a write fault then we're in the FOLL_FORCE|FOLL_WRITE case. Or do we want to keep the checks simply not enforce it for FOLL_FORCE|FOLL_WRITE? The latter is more complicated in the fault path because there is no FAULT_FLAG_FORCE flag. -- Guillaume Morin
Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
On 25.04.24 17:19, Guillaume Morin wrote: On 24 Apr 23:00, David Hildenbrand wrote: One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for hugetlb mappings. However this was also on my TODO and I have a draft patch that implements it. Yes, I documented it back then and added sanity checks in GUP code to fence it off. Shouldn't be too hard to implement (famous last words) and would be the cleaner thing to use here once I manage to switch over to FOLL_WRITE|FOLL_FORCE to break COW. Yes, my patch seems to be working. The hugetlb code is pretty simple. And it allows ptrace and the proc pid mem file to work on the executable private hugetlb mappings. There is one thing I am unclear about though. hugetlb enforces that huge_pte_write() is true on FOLL_WRITE in both the fault and follow_page_mask paths. I am not sure if we can simply assume in the hugetlb code that if the pte is not writable and this is a write fault then we're in the FOLL_FORCE|FOLL_WRITE case. Or do we want to keep the checks simply not enforce it for FOLL_FORCE|FOLL_WRITE? The latter is more complicated in the fault path because there is no FAULT_FLAG_FORCE flag. handle_mm_fault()->sanitize_fault_flags() makes sure that we'll only proceed with a fault either if * we have VM_WRITE set * we are in a COW mapping (MAP_PRIVATE with at least VM_MAYWRITE) Once you see FAULT_FLAG_WRITE and you do have VM_WRITE, you don't care about FOLL_FORCE, it's simply a write fault. Once you see FAULT_FLAG_WRITE and you *don't* have VM_WRITE, you must have VM_MAYWRITE and are essentially in FOLL_FORCE. In a VMA without VM_WRITE, you must never map a PTE writable. In ordinary COW code, that's done in wp_page_copy(), where we *always* use maybe_mkwrite(), to do exactly what a write fault would do, but without mapping the PTE writable. That's what the whole can_follow_write_pmd()/can_follow_write_pte() is about: writing to PTEs that are not writable. You'll have to follow the exact same model in hugetlb (can_follow_write_pmd(), hugetlb_maybe_mkwrite(), ...). -- Cheers, David / dhildenb
Re: [PATCH RFC 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc
On Wed, Apr 24, 2024 at 07:21:51PM +0200, Luca Weiss wrote: > The qcom,ipc-N properties are essentially providing a reference to a > mailbox, so allow using the mboxes property to do the same in a more > structured way. Can we mark qcom,ipc-N as deprecated then? > Since multiple SMSM hosts are supported, we need to be able to provide > the correct mailbox for each host. The old qcom,ipc-N properties map to > the mboxes property by index, starting at 0 since that's a valid SMSM > host also. > > The new example shows how an smsm node with just qcom,ipc-3 should be > specified with the mboxes property. > > Signed-off-by: Luca Weiss > --- > .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 48 > ++ > 1 file changed, 40 insertions(+), 8 deletions(-) > > diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > index db67cf043256..b12589171169 100644 > --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > @@ -33,6 +33,13 @@ properties: >specifier of the column in the subscription matrix representing the > local >processor. > > + mboxes: > +minItems: 1 > +maxItems: 5 Need to define what each entry is. > +description: > + Reference to the mailbox representing the outgoing doorbell in APCS for > + this client. > + >'#size-cells': > const: 0 > > @@ -98,15 +105,18 @@ required: >- '#address-cells' >- '#size-cells' > > -anyOf: > +oneOf: >- required: > - - qcom,ipc-1 > - - required: > - - qcom,ipc-2 > - - required: > - - qcom,ipc-3 > - - required: > - - qcom,ipc-4 > + - mboxes > + - anyOf: > + - required: > + - qcom,ipc-1 > + - required: > + - qcom,ipc-2 > + - required: > + - qcom,ipc-3 > + - required: > + - qcom,ipc-4 > > additionalProperties: false > > @@ -136,3 +146,25 @@ examples: > #interrupt-cells = <2>; > }; > }; > + # Example using mboxes property > + - | > +#include > + > +shared-memory { > +compatible = "qcom,smsm"; > +#address-cells = <1>; > +#size-cells = <0>; > +mboxes = <0>, <0>, <0>, <&apcs 19>; > + > +apps@0 { > +reg = <0>; > +#qcom,smem-state-cells = <1>; > +}; > + > +wcnss@7 { > +reg = <7>; > +interrupts = ; > +interrupt-controller; > +#interrupt-cells = <2>; > +}; > +}; > > -- > 2.44.0 >
[PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets
It was observed in the wild that pairs of consecutive packets would leave the IPVS with the same wrong checksum, and the issue only went away when disabling GSO. IPVS needs to avoid computing the SCTP checksum when using GSO. Fixes: 90017accff61 ("sctp: Add GSO support", 2016-06-02) Co-developed-by: Firo Yang Signed-off-by: Ismael Luceno Tested-by: Andreas Taschner CC: Michal Kubeček CC: Simon Horman CC: Julian Anastasov CC: lvs-de...@vger.kernel.org CC: netfilter-de...@vger.kernel.org CC: net...@vger.kernel.org CC: coret...@netfilter.org --- Notes: Changes since v2: * Use only skb_is_gso, no need to check for GSO type Changes since v1: * Added skb_is_gso before skb_is_gso_sctp. * Added "Fixes" tag. net/netfilter/ipvs/ip_vs_proto_sctp.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c index a0921adc31a9..83e452916403 100644 --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c @@ -126,7 +126,8 @@ sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, if (sctph->source != cp->vport || payload_csum || skb->ip_summed == CHECKSUM_PARTIAL) { sctph->source = cp->vport; - sctp_nat_csum(skb, sctph, sctphoff); + if (!skb_is_gso(skb)) + sctp_nat_csum(skb, sctph, sctphoff); } else { skb->ip_summed = CHECKSUM_UNNECESSARY; } @@ -174,7 +175,8 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, (skb->ip_summed == CHECKSUM_PARTIAL && !(skb_dst(skb)->dev->features & NETIF_F_SCTP_CRC))) { sctph->dest = cp->dport; - sctp_nat_csum(skb, sctph, sctphoff); + if (!skb_is_gso(skb)) + sctp_nat_csum(skb, sctph, sctphoff); } else if (skb->ip_summed != CHECKSUM_PARTIAL) { skb->ip_summed = CHECKSUM_UNNECESSARY; } -- 2.43.0
Re: [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions
On Wed, Apr 24, 2024 at 10:55:54PM +0200, David Hildenbrand wrote: > On 24.04.24 22:31, Vincent Donnefort wrote: > > Hi David, > > > > Thanks for your quick response. > > > > On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote: > > > > > > I gave it some more thought, and I think we are still missing something (I > > > wish PFNMAP/MIXEDMAP wouldn't be that hard). > > > > > > > + > > > > +/* > > > > + * +--+ pgoff == 0 > > > > + * | meta page | > > > > + * +--+ pgoff == 1 > > > > + * | subbuffer 0 | > > > > + * | | > > > > + * +--+ pgoff == (1 + (1 << subbuf_order)) > > > > + * | subbuffer 1 | > > > > + * | | > > > > + * ... > > > > + */ > > > > +#ifdef CONFIG_MMU > > > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer, > > > > + struct vm_area_struct *vma) > > > > +{ > > > > + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = > > > > vma->vm_pgoff; > > > > + unsigned int subbuf_pages, subbuf_order; > > > > + struct page **pages; > > > > + int p = 0, s = 0; > > > > + int err; > > > > + > > > > > > I'd add some comments here like > > > > > > /* Refuse any MAP_PRIVATE or writable mappings. */ > > > > + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC || > > > > + !(vma->vm_flags & VM_MAYSHARE)) > > > > + return -EPERM; > > > > + > > > > > > /* > > > * Make sure the mapping cannot become writable later. Also, tell the VM > > > * to not touch these pages pages (VM_DONTCOPY | VM_DONTDUMP) and tell > > > * GUP to leave them alone as well (VM_IO). > > > */ > > > > + vm_flags_mod(vma, > > > > +VM_MIXEDMAP | VM_PFNMAP | > > > > +VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO, > > > > +VM_MAYWRITE); > > > > > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and, > > > as stated, vm_insert_pages() even complains quite a lot when it would have > > > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good > > > reason. > > > > > > Can't we limit ourselves to VM_IO? > > > > > > But then, I wonder if it really helps much regarding GUP: yes, it blocks > > > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked() > > > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not > > > reject it. > > > > > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends > > > are > > > the way to go, that will set pte_special() such that also GUP-fast will > > > leave it alone, just like vm_normal_page() would. > > > > > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone > > > won't stop all of GUP. We really have to mark the PTE as special, which > > > vm_insert_page() must not do (because it is refcounted!). > > > > Hum, apologies, I am not sure to follow the connection here. Why do you > > think > > the recommendation was to prevent GUP? > > Ah, I'm hallucinating! :) "not let people play games with the mapping" to me > implied "make sure nobody touches it". If GUP is acceptable that makes stuff > a lot easier. VM_IO will block some GUP, but not all of it. > > > > > > > > > Which means: do we really have to stop GUP from grabbing that page? > > > > > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO) > > > would be better. > > > > Under the assumption we do not want to stop all GUP, why not using VM_IO > > over > > VM_MIXEDMAP which is I believe more restrictive? > > VM_MIXEDMAP will be implicitly set by vm_insert_page(). There is a lengthy > comment > for vm_normal_page() that explains all this madness. VM_MIXEDMAP is primarily > relevant for COW mappings, which you just forbid completely. > > remap_pfn_range_notrack() documents the semantics of some of the other flags: > >* VM_IO tells people not to look at these pages >* (accesses can have side effects). >* VM_PFNMAP tells the core MM that the base pages are just >* raw PFN mappings, and do not have a "struct page" associated >* with them. >* VM_DONTEXPAND >* Disable vma merging and expanding with mremap(). >* VM_DONTDUMP >* Omit vma from core dump, even when VM_IO turned off. > > VM_PFNMAP is very likely really not what we want, unless we really perform raw > PFN mappings ... VM_IO we can set without doing much harm. > > So I would suggest dropping VM_PFNMAP when using vm_insert_pages(), using > only VM_IO > and likely just letting vm_insert_pages() set VM_MIXEDMAP for you. Sounds good, I will do that in v22. > > [...] > > > > > > > vm_insert_pages() documents: "In case of error, we may have mapped a > > > subset > > > of the provided pages. It is the caller's responsibility to account for > > > this > > > case." >
Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
On Wed, Apr 24, 2024 at 03:36:39PM -0500, Rob Herring wrote: > > On Wed, 24 Apr 2024 14:06:09 -0500, Andrew Davis wrote: > > From: Hari Nagalla > > > > K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain. > > The remote processor's life cycle management and IPC mechanisms are > > similar across the R5F and M4F cores from remote processor driver > > point of view. However, there are subtle differences in image loading > > and starting the M4F subsystems. > > > > The YAML binding document provides the various node properties to be > > configured by the consumers of the M4F subsystem. > > > > Signed-off-by: Martyn Welch > > Signed-off-by: Hari Nagalla > > Signed-off-by: Andrew Davis > > --- > > .../bindings/remoteproc/ti,k3-m4f-rproc.yaml | 126 ++ > > 1 file changed, 126 insertions(+) > > create mode 100644 > > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml > > > > My bot found errors running 'make dt_binding_check' on your patch: > > yamllint warnings/errors: > > dtschema/dtc warnings/errors: > > > doc reference errors (make refcheckdocs): > Warning: Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml > references a file that doesn't exist: > Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml: > Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml The file is now in dt-schema: https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml signature.asc Description: PGP signature
Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
On Wed, Apr 24, 2024 at 02:06:09PM -0500, Andrew Davis wrote: > From: Hari Nagalla > > K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain. > The remote processor's life cycle management and IPC mechanisms are > similar across the R5F and M4F cores from remote processor driver > point of view. However, there are subtle differences in image loading > and starting the M4F subsystems. > > The YAML binding document provides the various node properties to be > configured by the consumers of the M4F subsystem. > > Signed-off-by: Martyn Welch > Signed-off-by: Hari Nagalla > Signed-off-by: Andrew Davis > + mboxes: > +description: | > + memory-region: > +description: | Not sure that either chomping operator is needed here, but that's a nit. With the incorrect link fixed up, Reviewed-by: Conor Dooley Cheers, Conor. signature.asc Description: PGP signature
Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets
Hello, On Thu, 25 Apr 2024, Ismael Luceno wrote: > It was observed in the wild that pairs of consecutive packets would leave > the IPVS with the same wrong checksum, and the issue only went away when > disabling GSO. > > IPVS needs to avoid computing the SCTP checksum when using GSO. > > Fixes: 90017accff61 ("sctp: Add GSO support", 2016-06-02) > Co-developed-by: Firo Yang > Signed-off-by: Ismael Luceno > Tested-by: Andreas Taschner > CC: Michal Kubeček > CC: Simon Horman > CC: Julian Anastasov > CC: lvs-de...@vger.kernel.org > CC: netfilter-de...@vger.kernel.org > CC: net...@vger.kernel.org > CC: coret...@netfilter.org > --- > > Notes: > Changes since v2: > * Use only skb_is_gso, no need to check for GSO type v2 is already applied. I acked it because sctp_gso_segment() checks for skb_is_gso_sctp(). If v3 is just an optimization better to live with v2? Is it possible to see skb_is_gso() but not skb_is_gso_sctp() while working with SCTP packet? > Changes since v1: > * Added skb_is_gso before skb_is_gso_sctp. > * Added "Fixes" tag. > > net/netfilter/ipvs/ip_vs_proto_sctp.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c > b/net/netfilter/ipvs/ip_vs_proto_sctp.c > index a0921adc31a9..83e452916403 100644 > --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c > +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c > @@ -126,7 +126,8 @@ sctp_snat_handler(struct sk_buff *skb, struct > ip_vs_protocol *pp, > if (sctph->source != cp->vport || payload_csum || > skb->ip_summed == CHECKSUM_PARTIAL) { > sctph->source = cp->vport; > - sctp_nat_csum(skb, sctph, sctphoff); > + if (!skb_is_gso(skb)) > + sctp_nat_csum(skb, sctph, sctphoff); > } else { > skb->ip_summed = CHECKSUM_UNNECESSARY; > } > @@ -174,7 +175,8 @@ sctp_dnat_handler(struct sk_buff *skb, struct > ip_vs_protocol *pp, > (skb->ip_summed == CHECKSUM_PARTIAL && >!(skb_dst(skb)->dev->features & NETIF_F_SCTP_CRC))) { > sctph->dest = cp->dport; > - sctp_nat_csum(skb, sctph, sctphoff); > + if (!skb_is_gso(skb)) > + sctp_nat_csum(skb, sctph, sctphoff); > } else if (skb->ip_summed != CHECKSUM_PARTIAL) { > skb->ip_summed = CHECKSUM_UNNECESSARY; > } > -- > 2.43.0 Regards -- Julian Anastasov
Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets
On Thu, 25 Apr 2024 18:28:40 +0200 Ismael Luceno wrote: > Changes since v2: > * Use only skb_is_gso, no need to check for GSO type v2 is already in the tree, if the change is important you need to send an incremental fix.
Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
On Thu, Apr 25, 2024 at 02:05:31AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:977b1ef51866 Merge tag 'block-6.9-20240420' of git://git.k.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=17080d2098 > kernel config: https://syzkaller.appspot.com/x/.config?x=f47e5e015c177e57 > dashboard link: https://syzkaller.appspot.com/bug?extid=83e7f982ca045ab4405c > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) > 2.40 > > Unfortunately, I don't have any reproducer for this issue yet. > > Downloadable assets: > disk image: > https://storage.googleapis.com/syzbot-assets/549d1add1da9/disk-977b1ef5.raw.xz > vmlinux: > https://storage.googleapis.com/syzbot-assets/3e8e501c8aa2/vmlinux-977b1ef5.xz > kernel image: > https://storage.googleapis.com/syzbot-assets/d02f7cb905b8/bzImage-977b1ef5.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+83e7f982ca045ab44...@syzkaller.appspotmail.com > > == > WARNING: possible circular locking dependency detected > 6.9.0-rc4-syzkaller-00266-g977b1ef51866 #0 Not tainted > -- > syz-executor.0/11241 is trying to acquire lock: > 888020a2c0d8 (&sighand->siglock){-.-.}-{2:2}, at: > force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334 > > but task is already holding lock: > 8880b943e658 (&rq->__lock){-.-.}-{2:2}, at: > raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (&rq->__lock){-.-.}-{2:2}: >lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 >_raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378 >raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559 >raw_spin_rq_lock kernel/sched/sched.h:1385 [inline] >_raw_spin_rq_lock_irqsave kernel/sched/sched.h:1404 [inline] >rq_lock_irqsave kernel/sched/sched.h:1683 [inline] >class_rq_lock_irqsave_constructor kernel/sched/sched.h:1737 [inline] >sched_mm_cid_exit_signals+0x17b/0x4b0 kernel/sched/core.c:12005 >exit_signals+0x2a1/0x5c0 kernel/signal.c:3016 >do_exit+0x6a8/0x27e0 kernel/exit.c:837 >__do_sys_exit kernel/exit.c:994 [inline] >__se_sys_exit kernel/exit.c:992 [inline] >__pfx___ia32_sys_exit+0x0/0x10 kernel/exit.c:992 >do_syscall_x64 arch/x86/entry/common.c:52 [inline] >do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 >entry_SYSCALL_64_after_hwframe+0x77/0x7f > > -> #0 (&sighand->siglock){-.-.}-{2:2}: >check_prev_add kernel/locking/lockdep.c:3134 [inline] >check_prevs_add kernel/locking/lockdep.c:3253 [inline] >validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 >__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 >lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 >__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 >force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334 >force_sig_fault_to_task kernel/signal.c:1733 [inline] >force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738 >__bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814 >handle_page_fault arch/x86/mm/fault.c:1505 [inline] >exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563 >asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 >strncpy_from_user+0x2c6/0x2f0 lib/strncpy_from_user.c:138 >strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186 >bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:216 [inline] >bpf_probe_read_compat_str kernel/trace/bpf_trace.c:311 [inline] >bpf_probe_read_compat_str+0xe9/0x180 kernel/trace/bpf_trace.c:307 >bpf_prog_e42f6260c1b72fb3+0x3d/0x3f >bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline] >__bpf_prog_run include/linux/filter.h:657 [inline] >bpf_prog_run include/linux/filter.h:664 [inline] >__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline] >bpf_trace_run4+0x25a/0x490 kernel/trace/bpf_trace.c:2422 >__traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222 >trace_sched_switch include/trace/events/sched.h:222 [inline] >__schedule+0x2535/0x4a00 kernel/sched/core.c:6743 >preempt_schedule_irq+0xfb/0x1c0 kernel/sched/core.c:7068 >irqentry_exit+0x5e/0x90 kernel/entry/common.c:354 >asm_sysvec_apic_timer_interrupt+0x1a/0x20 > arch/x86/include/asm/idtentry.h:702 >force_sig_fault+0x0/0x1d0 >__bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814 >handle_page_fault arch/x86/mm/fault.c:
Re: [PATCH RFC 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc
On Donnerstag, 25. April 2024 18:17:15 MESZ Rob Herring wrote: > On Wed, Apr 24, 2024 at 07:21:51PM +0200, Luca Weiss wrote: > > The qcom,ipc-N properties are essentially providing a reference to a > > mailbox, so allow using the mboxes property to do the same in a more > > structured way. > > Can we mark qcom,ipc-N as deprecated then? Yes, that should be ok. Will also send a similar change to the other bindings that support both qcom,ipc and mboxes. > > > Since multiple SMSM hosts are supported, we need to be able to provide > > the correct mailbox for each host. The old qcom,ipc-N properties map to > > the mboxes property by index, starting at 0 since that's a valid SMSM > > host also. > > > > The new example shows how an smsm node with just qcom,ipc-3 should be > > specified with the mboxes property. > > > > Signed-off-by: Luca Weiss > > --- > > .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 48 > > ++ > > 1 file changed, 40 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > > b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > > index db67cf043256..b12589171169 100644 > > --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > > +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml > > @@ -33,6 +33,13 @@ properties: > >specifier of the column in the subscription matrix representing the > > local > >processor. > > > > + mboxes: > > +minItems: 1 > > +maxItems: 5 > > Need to define what each entry is. The entry is (description from qcom,ipc-N) "the outgoing ipc bit used for signaling the N:th remote processor." So you want me to add 5 times e.g. - the IPC mailbox used for signaling the 0th remote processor - the IPC mailbox used for signaling the 1st remote processor etc? I don't really have any extra knowledge on smsm to be able to write something better there.. Also what are your thoughts on this binding vs the alternative I wrote in the cover letter? I'm not really happy about how the properties are represented. Regards Luca > > > +description: > > + Reference to the mailbox representing the outgoing doorbell in APCS > > for > > + this client. > > + > >'#size-cells': > > const: 0 > > > > @@ -98,15 +105,18 @@ required: > >- '#address-cells' > >- '#size-cells' > > > > -anyOf: > > +oneOf: > >- required: > > - - qcom,ipc-1 > > - - required: > > - - qcom,ipc-2 > > - - required: > > - - qcom,ipc-3 > > - - required: > > - - qcom,ipc-4 > > + - mboxes > > + - anyOf: > > + - required: > > + - qcom,ipc-1 > > + - required: > > + - qcom,ipc-2 > > + - required: > > + - qcom,ipc-3 > > + - required: > > + - qcom,ipc-4 > > > > additionalProperties: false > > > > @@ -136,3 +146,25 @@ examples: > > #interrupt-cells = <2>; > > }; > > }; > > + # Example using mboxes property > > + - | > > +#include > > + > > +shared-memory { > > +compatible = "qcom,smsm"; > > +#address-cells = <1>; > > +#size-cells = <0>; > > +mboxes = <0>, <0>, <0>, <&apcs 19>; > > + > > +apps@0 { > > +reg = <0>; > > +#qcom,smem-state-cells = <1>; > > +}; > > + > > +wcnss@7 { > > +reg = <7>; > > +interrupts = ; > > +interrupt-controller; > > +#interrupt-cells = <2>; > > +}; > > +}; > > >
Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
On 4/25/24 12:15 PM, Conor Dooley wrote: On Wed, Apr 24, 2024 at 03:36:39PM -0500, Rob Herring wrote: On Wed, 24 Apr 2024 14:06:09 -0500, Andrew Davis wrote: From: Hari Nagalla K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain. The remote processor's life cycle management and IPC mechanisms are similar across the R5F and M4F cores from remote processor driver point of view. However, there are subtle differences in image loading and starting the M4F subsystems. The YAML binding document provides the various node properties to be configured by the consumers of the M4F subsystem. Signed-off-by: Martyn Welch Signed-off-by: Hari Nagalla Signed-off-by: Andrew Davis --- .../bindings/remoteproc/ti,k3-m4f-rproc.yaml | 126 ++ 1 file changed, 126 insertions(+) create mode 100644 Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml My bot found errors running 'make dt_binding_check' on your patch: yamllint warnings/errors: dtschema/dtc warnings/errors: doc reference errors (make refcheckdocs): Warning: Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml references a file that doesn't exist: Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml: Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml The file is now in dt-schema: https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml So should I use "reserved-memory/reserved-memory.yaml" here, or just drop this line completely? Andrew
[PATCH 0/2] Mark qcom,ipc as deprecated in two schemas
The mboxes property has been supported in those bindings since a while and was always meant to the replace qcom,ipc properties, so let's mark qcom,ipc as deprecated - and update the examples to use mboxes. Related: https://lore.kernel.org/linux-arm-msm/20240424-apcs-mboxes-v1-0-6556c47cb...@z3ntu.xyz/ Signed-off-by: Luca Weiss --- Luca Weiss (2): dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated dt-bindings: soc: qcom,smp2p: Mark qcom,ipc as deprecated Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml | 3 ++- Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) --- base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8 change-id: 20240425-qcom-ipc-deprecate-37afbe33cadc Best regards, -- Luca Weiss
[PATCH 1/2] dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated
Deprecate the qcom,ipc way of accessing the mailbox in favor of the 'mboxes' property. Update the example to use mboxes. Signed-off-by: Luca Weiss --- Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml index 02c85b420c1a..63500b1a0f6f 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,smd-edge.yaml @@ -61,6 +61,7 @@ properties: description: Three entries specifying the outgoing ipc bit used for signaling the remote processor. +deprecated: true qcom,smd-edge: $ref: /schemas/types.yaml#/definitions/uint32 @@ -111,7 +112,7 @@ examples: smd-edge { interrupts = ; -qcom,ipc = <&apcs 8 8>; +mboxes = <&apcs 8>; qcom,smd-edge = <1>; }; }; -- 2.44.0
[PATCH 2/2] dt-bindings: soc: qcom,smp2p: Mark qcom,ipc as deprecated
Deprecate the qcom,ipc way of accessing the mailbox in favor of the 'mboxes' property. Update the example to use mboxes. Signed-off-by: Luca Weiss --- Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml index 58500529b90f..141d666dc3f7 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smp2p.yaml @@ -41,6 +41,7 @@ properties: description: Three entries specifying the outgoing ipc bit used for signaling the remote end of the smp2p edge. +deprecated: true qcom,local-pid: $ref: /schemas/types.yaml#/definitions/uint32 @@ -128,7 +129,7 @@ examples: compatible = "qcom,smp2p"; qcom,smem = <431>, <451>; interrupts = ; -qcom,ipc = <&apcs 8 18>; +mboxes = <&apcs 18>; qcom,local-pid = <0>; qcom,remote-pid = <4>; -- 2.44.0
Re: [PATCH v7 1/6] soc: qcom: pdr: protect locator_addr with the main mutex
On 4/24/2024 2:27 AM, Dmitry Baryshkov wrote: If the service locator server is restarted fast enough, the PDR can rewrite locator_addr fields concurrently. Protect them by placing modification of those fields under the main pdr->lock. Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers") Tested-by: Neil Armstrong # on SM8550-QRD Signed-off-by: Dmitry Baryshkov --- drivers/soc/qcom/pdr_interface.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c index a1b6a4081dea..19cfe4b41235 100644 --- a/drivers/soc/qcom/pdr_interface.c +++ b/drivers/soc/qcom/pdr_interface.c @@ -76,12 +76,12 @@ static int pdr_locator_new_server(struct qmi_handle *qmi, locator_hdl); struct pdr_service *pds; + mutex_lock(&pdr->lock); /* Create a local client port for QMI communication */ pdr->locator_addr.sq_family = AF_QIPCRTR; pdr->locator_addr.sq_node = svc->node; pdr->locator_addr.sq_port = svc->port; - mutex_lock(&pdr->lock); pdr->locator_init_complete = true; mutex_unlock(&pdr->lock); @@ -104,10 +104,10 @@ static void pdr_locator_del_server(struct qmi_handle *qmi, mutex_lock(&pdr->lock); pdr->locator_init_complete = false; - mutex_unlock(&pdr->lock); pdr->locator_addr.sq_node = 0; pdr->locator_addr.sq_port = 0; + mutex_unlock(&pdr->lock); } static const struct qmi_ops pdr_locator_ops = { These two functions are provided as qmi_ops handlers in pdr_locator_ops. Aren't they serialized in the qmi handle's workqueue since it as an ordered_workqueue? Even in a fast pdr scenario I don't think we would see a race condition between these two functions. The other access these two functions do race against is in the pdr_notifier_work. I think you would need to protect locator_addr in pdr_get_domain_list since the qmi_send_request there uses 'pdr->locator_addr'. Thanks! Chris
Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
On 25.04.24 17:19, Guillaume Morin wrote: On 24 Apr 23:00, David Hildenbrand wrote: One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for hugetlb mappings. However this was also on my TODO and I have a draft patch that implements it. Yes, I documented it back then and added sanity checks in GUP code to fence it off. Shouldn't be too hard to implement (famous last words) and would be the cleaner thing to use here once I manage to switch over to FOLL_WRITE|FOLL_FORCE to break COW. Yes, my patch seems to be working. The hugetlb code is pretty simple. And it allows ptrace and the proc pid mem file to work on the executable private hugetlb mappings. There is one thing I am unclear about though. hugetlb enforces that huge_pte_write() is true on FOLL_WRITE in both the fault and follow_page_mask paths. I am not sure if we can simply assume in the hugetlb code that if the pte is not writable and this is a write fault then we're in the FOLL_FORCE|FOLL_WRITE case. Or do we want to keep the checks simply not enforce it for FOLL_FORCE|FOLL_WRITE? The latter is more complicated in the fault path because there is no FAULT_FLAG_FORCE flag. I just pushed something to https://github.com/davidhildenbrand/linux/tree/uprobes_cow Only very lightly tested so far. Expect the worst :) I still detest having the zapping logic there, but to get it all right I don't see a clean way around that. For hugetlb, we'd primarily have to implement the mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE). Likely vaddr and PAGE_SIZE in uprobe_write_opcode() would have to be expanded to cover the full hugetlb page. -- Cheers, David / dhildenb
Re: [PATCH v9 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
On Mon, Apr 15, 2024 at 6:22 AM Masami Hiramatsu (Google) wrote: > > From: Masami Hiramatsu (Google) > > Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is > converted from ftrace_regs by ftrace_partial_regs(), thus some registers > may always returns 0. But it should be enough for function entry (access > arguments) and exit (access return value). > > Signed-off-by: Masami Hiramatsu (Google) > Acked-by: Florent Revest > --- > Changes from previous series: NOTHING, Update against the new series. > --- > kernel/trace/bpf_trace.c | 22 +- > 1 file changed, 9 insertions(+), 13 deletions(-) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index e51a6ef87167..57b1174030c9 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void) > fs_initcall(bpf_event_init); > #endif /* CONFIG_MODULES */ > > -#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) > +#ifdef CONFIG_FPROBE > struct bpf_kprobe_multi_link { > struct bpf_link link; > struct fprobe fp; > @@ -2600,6 +2600,8 @@ struct user_syms { > char *buf; > }; > > +static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs); this is a waste if CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, right? Can we guard it? > + > static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, > u32 cnt) > { > unsigned long __user usymbol; > @@ -2792,13 +2794,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct > bpf_run_ctx *ctx) > > static int > kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, > - unsigned long entry_ip, struct pt_regs *regs) > + unsigned long entry_ip, struct ftrace_regs *fregs) > { > struct bpf_kprobe_multi_run_ctx run_ctx = { > .link = link, > .entry_ip = entry_ip, > }; > struct bpf_run_ctx *old_run_ctx; > + struct pt_regs *regs; > int err; > > if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { > @@ -2809,6 +2812,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link > *link, > > migrate_disable(); > rcu_read_lock(); > + regs = ftrace_partial_regs(fregs, > this_cpu_ptr(&bpf_kprobe_multi_pt_regs)); and then pass NULL if defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)? > old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); > err = bpf_prog_run(link->link.prog, regs); > bpf_reset_run_ctx(old_run_ctx); > @@ -2826,13 +2830,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned > long fentry_ip, > void *data) > { > struct bpf_kprobe_multi_link *link; > - struct pt_regs *regs = ftrace_get_regs(fregs); > - > - if (!regs) > - return 0; > > link = container_of(fp, struct bpf_kprobe_multi_link, fp); > - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs); > + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs); > return 0; > } > > @@ -2842,13 +2842,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, > unsigned long fentry_ip, >void *data) > { > struct bpf_kprobe_multi_link *link; > - struct pt_regs *regs = ftrace_get_regs(fregs); > - > - if (!regs) > - return; > > link = container_of(fp, struct bpf_kprobe_multi_link, fp); > - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs); > + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs); > } > > static int symbols_cmp_r(const void *a, const void *b, const void *priv) > @@ -3107,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr > *attr, struct bpf_prog *pr > kvfree(cookies); > return err; > } > -#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */ > +#else /* !CONFIG_FPROBE */ > int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog > *prog) > { > return -EOPNOTSUPP; > >
Re: [PATCH v9 36/36] fgraph: Skip recording calltime/rettime if it is not nneeded
On Mon, Apr 15, 2024 at 6:25 AM Masami Hiramatsu (Google) wrote: > > From: Masami Hiramatsu (Google) > > Skip recording calltime and rettime if the fgraph_ops does not need it. > This is a kind of performance optimization for fprobe. Since the fprobe > user does not use these entries, recording timestamp in fgraph is just > a overhead (e.g. eBPF, ftrace). So introduce the skip_timestamp flag, > and all fgraph_ops sets this flag, skip recording calltime and rettime. > > Suggested-by: Jiri Olsa > Signed-off-by: Masami Hiramatsu (Google) > --- > Changes in v9: > - Newly added. > --- > include/linux/ftrace.h |2 ++ > kernel/trace/fgraph.c | 46 +++--- > kernel/trace/fprobe.c |1 + > 3 files changed, 42 insertions(+), 7 deletions(-) > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index d845a80a3d56..06fc7cbef897 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -1156,6 +1156,8 @@ struct fgraph_ops { > struct ftrace_ops ops; /* for the hash lists */ > void*private; > int idx; > + /* If skip_timestamp is true, this does not record timestamps. */ > + boolskip_timestamp; > }; > > void *fgraph_reserve_data(int idx, int size_bytes); > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c > index 7556fbbae323..a5722537bb79 100644 > --- a/kernel/trace/fgraph.c > +++ b/kernel/trace/fgraph.c > @@ -131,6 +131,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph); > int ftrace_graph_active; > > static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE]; > +static bool fgraph_skip_timestamp; > > /* LRU index table for fgraph_array */ > static int fgraph_lru_table[FGRAPH_ARRAY_SIZE]; > @@ -475,7 +476,7 @@ void ftrace_graph_stop(void) > static int > ftrace_push_return_trace(unsigned long ret, unsigned long func, > unsigned long frame_pointer, unsigned long *retp, > -int fgraph_idx) > +int fgraph_idx, bool skip_ts) > { > struct ftrace_ret_stack *ret_stack; > unsigned long long calltime; > @@ -498,8 +499,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned > long func, > ret_stack = get_ret_stack(current, current->curr_ret_stack, &index); > if (ret_stack && ret_stack->func == func && > get_fgraph_type(current, index + FGRAPH_RET_INDEX) == > FGRAPH_TYPE_BITMAP && > - !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, > fgraph_idx)) > + !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, > fgraph_idx)) { > + /* If previous one skips calltime, update it. */ > + if (!skip_ts && !ret_stack->calltime) > + ret_stack->calltime = trace_clock_local(); > return index + FGRAPH_RET_INDEX; > + } > > val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX; > > @@ -517,7 +522,10 @@ ftrace_push_return_trace(unsigned long ret, unsigned > long func, > return -EBUSY; > } > > - calltime = trace_clock_local(); > + if (skip_ts) would it be ok to add likely() here to keep the least-overhead code path linear? > + calltime = 0LL; > + else > + calltime = trace_clock_local(); > > index = READ_ONCE(current->curr_ret_stack); > ret_stack = RET_STACK(current, index); > @@ -601,7 +609,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned > long func, > trace.func = func; > trace.depth = ++current->curr_ret_depth; > > - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0); > + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0, > +fgraph_skip_timestamp); > if (index < 0) > goto out; > > @@ -654,7 +663,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned > long func, > return -ENODEV; > > /* Use start for the distance to ret_stack (skipping over reserve) */ > - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, > gops->idx); > + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, > gops->idx, > +gops->skip_timestamp); > if (index < 0) > return index; > type = get_fgraph_type(current, index); > @@ -732,6 +742,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, > unsigned long *ret, > *ret = ret_stack->ret; > trace->func = ret_stack->func; > trace->calltime = ret_stack->calltime; > + trace->rettime = 0; > trace->overrun = atomic_read(¤t->trace_overrun); > trace->depth = current->curr_ret_depth; > /* > @@ -792,7 +803,6 @@ __ftrace_return_to_handler(struct f
Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph
On Mon, Apr 15, 2024 at 5:49 AM Masami Hiramatsu (Google) wrote: > > Hi, > > Here is the 9th version of the series to re-implement the fprobe on > function-graph tracer. The previous version is; > > https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/ > > This version is ported on the latest kernel (v6.9-rc3 + probes/for-next) > and fixed some bugs + performance optimization patch[36/36]. > - [12/36] Fix to clear fgraph_array entry in registration failure, also >return -ENOSPC when fgraph_array is full. > - [28/36] Add new store_fprobe_entry_data() for fprobe. > - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation. > - [36/36] Add new flag to skip timestamp recording. > > Overview > > This series does major 2 changes, enable multiple function-graphs on > the ftrace (e.g. allow function-graph on sub instances) and rewrite the > fprobe on this function-graph. > > The former changes had been sent from Steven Rostedt 4 years ago (*), > which allows users to set different setting function-graph tracer (and > other tracers based on function-graph) in each trace-instances at the > same time. > > (*) https://lore.kernel.org/all/20190525031633.811342...@goodmis.org/ > > The purpose of latter change are; > > 1) Remove dependency of the rethook from fprobe so that we can reduce >the return hook code and shadow stack. > > 2) Make 'ftrace_regs' the common trace interface for the function >boundary. > > 1) Currently we have 2(or 3) different function return hook codes, > the function-graph tracer and rethook (and legacy kretprobe). > But since this is redundant and needs double maintenance cost, > I would like to unify those. From the user's viewpoint, function- > graph tracer is very useful to grasp the execution path. For this > purpose, it is hard to use the rethook in the function-graph > tracer, but the opposite is possible. (Strictly speaking, kretprobe > can not use it because it requires 'pt_regs' for historical reasons.) > > 2) Now the fprobe provides the 'pt_regs' for its handler, but that is > wrong for the function entry and exit. Moreover, depending on the > architecture, there is no way to accurately reproduce 'pt_regs' > outside of interrupt or exception handlers. This means fprobe should > not use 'pt_regs' because it does not use such exceptions. > (Conversely, kprobe should use 'pt_regs' because it is an abstract > interface of the software breakpoint exception.) > > This series changes fprobe to use function-graph tracer for tracing > function entry and exit, instead of mixture of ftrace and rethook. > Unlike the rethook which is a per-task list of system-wide allocated > nodes, the function graph's ret_stack is a per-task shadow stack. > Thus it does not need to set 'nr_maxactive' (which is the number of > pre-allocated nodes). > Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'. > Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as > their register interface, this changes it to convert 'ftrace_regs' to > 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs', > so users must access only registers for function parameters or > return value. > > Design > -- > Instead of using ftrace's function entry hook directly, the new fprobe > is built on top of the function-graph's entry and return callbacks > with 'ftrace_regs'. > > Since the fprobe requires access to 'ftrace_regs', the architecture > must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and > CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph > entry callback with 'ftrace_regs', and also > CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to > return_to_handler. > > All fprobes share a single function-graph ops (means shares a common > ftrace filter) similar to the kprobe-on-ftrace. This needs another > layer to find corresponding fprobe in the common function-graph > callbacks, but has much better scalability, since the number of > registered function-graph ops is limited. > > In the entry callback, the fprobe runs its entry_handler and saves the > address of 'fprobe' on the function-graph's shadow stack as data. The > return callback decodes the data to get the 'fprobe' address, and runs > the exit_handler. > > The fprobe introduces two hash-tables, one is for entry callback which > searches fprobes related to the given function address passed by entry > callback. The other is for a return callback which checks if the given > 'fprobe' data structure pointer is still valid. Note that it is > possible to unregister fprobe before the return callback runs. Thus > the address validation must be done before using it in the return > callback. > > This series can be applied against the probes/for-next branch, which > is based on v6.9-rc3. > > This series can also be found below branch. > > https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-
Re: [PATCH 3/4] vhost: Improve vhost_get_avail_head()
Hi Gavin, kernel test robot noticed the following build warnings: [auto build test WARNING on mst-vhost/linux-next] [also build test WARNING on linus/master v6.9-rc5 next-20240424] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Gavin-Shan/vhost-Drop-variable-last_avail_idx-in-vhost_get_vq_desc/20240423-112803 base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next patch link: https://lore.kernel.org/r/20240423032407.262329-4-gshan%40redhat.com patch subject: [PATCH 3/4] vhost: Improve vhost_get_avail_head() config: i386-randconfig-141-20240426 (https://download.01.org/0day-ci/archive/20240426/202404260448.g7f06v7m-...@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202404260448.g7f06v7m-...@intel.com/ smatch warnings: drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: unsigned 'head' is never less than zero. drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: error code type promoted to positive: 'head' vim +/head +2614 drivers/vhost/vhost.c 2581 2582 /* This looks in the virtqueue and for the first available buffer, and converts 2583 * it to an iovec for convenient access. Since descriptors consist of some 2584 * number of output then some number of input descriptors, it's actually two 2585 * iovecs, but we pack them into one and note how many of each there were. 2586 * 2587 * This function returns the descriptor number found, or vq->num (which is 2588 * never a valid descriptor number) if none was found. A negative code is 2589 * returned on error. */ 2590 int vhost_get_vq_desc(struct vhost_virtqueue *vq, 2591struct iovec iov[], unsigned int iov_size, 2592unsigned int *out_num, unsigned int *in_num, 2593struct vhost_log *log, unsigned int *log_num) 2594 { 2595 struct vring_desc desc; 2596 unsigned int i, head, found = 0; 2597 int ret, access; 2598 2599 if (vq->avail_idx == vq->last_avail_idx) { 2600 ret = vhost_get_avail_idx(vq); 2601 if (unlikely(ret)) 2602 return ret; 2603 2604 /* If there's nothing new since last we looked, return 2605 * invalid. 2606 */ 2607 if (vq->avail_idx == vq->last_avail_idx) 2608 return vq->num; 2609 } 2610 2611 /* Grab the next descriptor number they're advertising, and increment 2612 * the index we've seen. */ 2613 head = vhost_get_avail_head(vq); > 2614 if (unlikely(head < 0)) 2615 return head; 2616 2617 /* When we start there are none of either input nor output. */ 2618 *out_num = *in_num = 0; 2619 if (unlikely(log)) 2620 *log_num = 0; 2621 2622 i = head; 2623 do { 2624 unsigned iov_count = *in_num + *out_num; 2625 if (unlikely(i >= vq->num)) { 2626 vq_err(vq, "Desc index is %u > %u, head = %u", 2627 i, vq->num, head); 2628 return -EINVAL; 2629 } 2630 if (unlikely(++found > vq->num)) { 2631 vq_err(vq, "Loop detected: last one at %u " 2632 "vq size %u head %u\n", 2633 i, vq->num, head); 2634 return -EINVAL; 2635 } 2636 ret = vhost_get_desc(vq, &desc, i); 2637 if (unlikely(ret)) { 2638 vq_err(vq, "Failed to get descriptor: idx %d addr %p\n", 2639 i, vq->desc + i); 2640 return -EFAULT; 2641 } 2642 if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) { 2643 ret = get_indirect(vq, iov, iov_size, 2644 out_num, in_num, 2645 log, log_num, &desc); 2646 if (unlikely(ret < 0)) { 2647 if (ret != -EAGAIN) 2648 vq_err(vq, "Failure detected " 2649
Re: [PATCH v3] ipvs: Fix checksumming on GSO of SCTP packets
Hi, On Thu, Apr 25, 2024 at 06:28:40PM +0200, Ismael Luceno wrote: > It was observed in the wild that pairs of consecutive packets would leave > the IPVS with the same wrong checksum, and the issue only went away when > disabling GSO. > > IPVS needs to avoid computing the SCTP checksum when using GSO. This patch went already upstream. It was no clear to me that a v3 was needed. You will have to post a follow up.
Re: [PATCH v7 4/6] soc: qcom: qmi: add a way to remove running service
On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote: Add qmi_del_server(), a pair to qmi_add_server(), a way to remove running server from the QMI socket. This is e.g. necessary for pd-mapper, which needs to readd a server each time the DSP is started or s/readd/read/ stopped. Tested-by: Neil Armstrong # on SM8550-QRD Signed-off-by: Dmitry Baryshkov --- +/** + * qmi_del_server() - register a service with the name service Update comment to describe removal of service instead of 'register'.
[GIT PULL] virtio: bugfix
The following changes since commit 0bbac3facb5d6cc0171c45c9873a2dc96bea9680: Linux 6.9-rc4 (2024-04-14 13:38:39 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to 98a821546b3919a10a58faa12ebe5e9a55cd638e: vDPA: code clean for vhost_vdpa uapi (2024-04-22 17:07:13 -0400) virtio: bugfix enum renames for vdpa uapi - we better do this now before the names have been in any releases. Signed-off-by: Michael S. Tsirkin Zhu Lingshan (1): vDPA: code clean for vhost_vdpa uapi drivers/vdpa/vdpa.c | 6 +++--- include/uapi/linux/vdpa.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix vectors
On Tue, Apr 23, 2024 at 08:42:57AM +, Angus Chen wrote: > Hi mst. > > > -Original Message- > > From: Michael S. Tsirkin > > Sent: Tuesday, April 23, 2024 4:35 PM > > To: Gavin Liu > > Cc: jasow...@redhat.com; Angus Chen ; > > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; > > linux-kernel@vger.kernel.org; Heng Qi > > Subject: Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix vectors > > > > On Tue, Apr 23, 2024 at 01:39:17AM +, Gavin Liu wrote: > > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote: > > > > From: Yuxue Liu > > > > > > > > When there is a ctlq and it doesn't require interrupt callbacks,the > > > > original method of calculating vectors wastes hardware msi or msix > > > > resources as well as system IRQ resources. > > > > > > > > When conducting performance testing using testpmd in the guest os, it > > > > was found that the performance was lower compared to directly using > > > > vfio-pci to passthrough the device > > > > > > > > In scenarios where the virtio device in the guest os does not utilize > > > > interrupts, the vdpa driver still configures the hardware's msix > > > > vector. Therefore, the hardware still sends interrupts to the host os. > > > > > > >I just have a question on this part. How come hardware sends interrupts > > > >does > > not guest driver disable them? > > > > > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU sets > > the call fd to -1 > > >2:On the host side, the vhost_vdpa program will set > > vp_vdpa->vring[i].cb.callback to invalid > > >3:Before the modification, the vp_vdpa_request_irq function does not > > check whether > > > vp_vdpa->vring[i].cb.callback is valid. Instead, it enables the > > hardware's MSIX > > > interrupts based on the number of queues of the device > > > > > > > So MSIX is enabled but why would it trigger? virtio PMD in poll mode > > presumably suppresses interrupts after all. > Virtio pmd is in the guest,but in host side,the msix is enabled,then the > device will triger > Interrupt normally. I analysed this bug before,and I think gavin is right. > Did I make it clear? Not really. Guest disables interrupts presumably (it's polling) why does device still send them? > > > > > > > > > > > - Original Message - > > > From: Michael S. Tsirkin m...@redhat.com > > > Sent: April 22, 2024 20:09 > > > To: Gavin Liu gavin@jaguarmicro.com > > > Cc: jasow...@redhat.com; Angus Chen angus.c...@jaguarmicro.com; > > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; > > linux-kernel@vger.kernel.org; Heng Qi hen...@linux.alibaba.com > > > Subject: Re: [PATCH v5] vp_vdpa: don't allocate unused msix vectors > > > > > > > > > > > > External Mail: This email originated from OUTSIDE of the organization! > > > Do not click links, open attachments or provide ANY information unless you > > recognize the sender and know the content is safe. > > > > > > > > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote: > > > > From: Yuxue Liu > > > > > > > > When there is a ctlq and it doesn't require interrupt callbacks,the > > > > original method of calculating vectors wastes hardware msi or msix > > > > resources as well as system IRQ resources. > > > > > > > > When conducting performance testing using testpmd in the guest os, it > > > > was found that the performance was lower compared to directly using > > > > vfio-pci to passthrough the device > > > > > > > > In scenarios where the virtio device in the guest os does not utilize > > > > interrupts, the vdpa driver still configures the hardware's msix > > > > vector. Therefore, the hardware still sends interrupts to the host os. > > > > > > I just have a question on this part. How come hardware sends interrupts > > > does > > not guest driver disable them? > > > > > > > Because of this unnecessary > > > > action by the hardware, hardware performance decreases, and it also > > > > affects the performance of the host os. > > > > > > > > Before modification:(interrupt mode) > > > > 32: 0 0 0 0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0 > > > > 33: 0 0 0 0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1 > > > > 34: 0 0 0 0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2 > > > > 35: 0 0 0 0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config > > > > > > > > After modification:(interrupt mode) > > > > 32: 0 0 1 7 PCI-MSI 32768-edge vp-vdpa[:00:02.0]-0 > > > > 33: 36 0 3 0 PCI-MSI 32769-edge vp-vdpa[:00:02.0]-1 > > > > 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[:00:02.0]-config > > > > > > > > Before modification:(virtio pmd mode for guest os) > > > > 32: 0 0 0 0 PCI-MSI 32768-edgevp-vdpa[:00:02.0]-0 > > > > 33: 0 0 0 0 PCI-MSI 32769-edgevp-vdpa[:00:02.0]-1 > > > > 34: 0 0 0 0 PCI-MSI 32770-edgevp-vdpa[:00:02.0]-2 > > > > 35: 0 0 0 0 PCI-MSI 32771-edgevp-vdpa[:00:02.0]-config > > > > > > > > Af
Re: [GIT PULL] virtio: bugfix
The pull request you sent on Thu, 25 Apr 2024 18:01:06 -0400: > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/c942a0cd3603e34dd2d7237e064d9318cb7f9654 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH 3/4] vhost: Improve vhost_get_avail_head()
On 4/26/24 06:42, kernel test robot wrote:> kernel test robot noticed the following build warnings: [auto build test WARNING on mst-vhost/linux-next] [also build test WARNING on linus/master v6.9-rc5 next-20240424] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Gavin-Shan/vhost-Drop-variable-last_avail_idx-in-vhost_get_vq_desc/20240423-112803 base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next patch link: https://lore.kernel.org/r/20240423032407.262329-4-gshan%40redhat.com patch subject: [PATCH 3/4] vhost: Improve vhost_get_avail_head() config: i386-randconfig-141-20240426 (https://download.01.org/0day-ci/archive/20240426/202404260448.g7f06v7m-...@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202404260448.g7f06v7m-...@intel.com/ smatch warnings: drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: unsigned 'head' is never less than zero. drivers/vhost/vhost.c:2614 vhost_get_vq_desc() warn: error code type promoted to positive: 'head' vim +/head +2614 drivers/vhost/vhost.c 2581 2582 /* This looks in the virtqueue and for the first available buffer, and converts 2583 * it to an iovec for convenient access. Since descriptors consist of some 2584 * number of output then some number of input descriptors, it's actually two 2585 * iovecs, but we pack them into one and note how many of each there were. 2586 * 2587 * This function returns the descriptor number found, or vq->num (which is 2588 * never a valid descriptor number) if none was found. A negative code is 2589 * returned on error. */ 2590 int vhost_get_vq_desc(struct vhost_virtqueue *vq, 2591 struct iovec iov[], unsigned int iov_size, 2592 unsigned int *out_num, unsigned int *in_num, 2593 struct vhost_log *log, unsigned int *log_num) 2594 { 2595 struct vring_desc desc; 2596 unsigned int i, head, found = 0; 2597 int ret, access; 2598 2599 if (vq->avail_idx == vq->last_avail_idx) { 2600 ret = vhost_get_avail_idx(vq); 2601 if (unlikely(ret)) 2602 return ret; 2603 2604 /* If there's nothing new since last we looked, return 2605 * invalid. 2606 */ 2607 if (vq->avail_idx == vq->last_avail_idx) 2608 return vq->num; 2609 } 2610 2611 /* Grab the next descriptor number they're advertising, and increment 2612 * the index we've seen. */ 2613 head = vhost_get_avail_head(vq); 2614 if (unlikely(head < 0)) 2615 return head; Thanks for the report. @head needs to be 'int' instead of 'unsigned int' so that it can hold the error number from vhost_get_avail_head(). I would give it more time to see if there are other review comments before I revise it to fix it up. Thanks, Gavin
Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
On 25 Apr 21:56, David Hildenbrand wrote: > > On 25.04.24 17:19, Guillaume Morin wrote: > > On 24 Apr 23:00, David Hildenbrand wrote: > > > > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for > > > > hugetlb mappings. However this was also on my TODO and I have a draft > > > > patch that implements it. > > > > > > Yes, I documented it back then and added sanity checks in GUP code to > > > fence > > > it off. Shouldn't be too hard to implement (famous last words) and would > > > be > > > the cleaner thing to use here once I manage to switch over to > > > FOLL_WRITE|FOLL_FORCE to break COW. > > > > Yes, my patch seems to be working. The hugetlb code is pretty simple. > > And it allows ptrace and the proc pid mem file to work on the executable > > private hugetlb mappings. > > > > There is one thing I am unclear about though. hugetlb enforces that > > huge_pte_write() is true on FOLL_WRITE in both the fault and > > follow_page_mask paths. I am not sure if we can simply assume in the > > hugetlb code that if the pte is not writable and this is a write fault > > then we're in the FOLL_FORCE|FOLL_WRITE case. Or do we want to keep the > > checks simply not enforce it for FOLL_FORCE|FOLL_WRITE? > > > > The latter is more complicated in the fault path because there is no > > FAULT_FLAG_FORCE flag. > > > > I just pushed something to > https://github.com/davidhildenbrand/linux/tree/uprobes_cow > > Only very lightly tested so far. Expect the worst :) I'll try it out and send you the hugetlb bits > > I still detest having the zapping logic there, but to get it all right I > don't see a clean way around that. > > > For hugetlb, we'd primarily have to implement the > mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE). For FOLL_FORCE, heer is my draft. Let me know if this is what you had in mind. diff --git a/mm/gup.c b/mm/gup.c index 1611e73b1121..ac60e0ae64e8 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; - /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */ - if (is_vm_hugetlb_page(vma)) - return -EFAULT; /* * We used to let the write,force case do COW in a * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3548eae42cf9..73f86eddf888 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl, struct vm_fault *vmf) { - const bool unshare = flags & FAULT_FLAG_UNSHARE; + const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) && + (vma->vm_flags & VM_WRITE); pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct folio *old_folio; @@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * can trigger this, because hugetlb_fault() will always resolve * uffd-wp bit first. */ - if (!unshare && huge_pte_uffd_wp(pte)) + if (make_writable && huge_pte_uffd_wp(pte)) return 0; - /* -* hugetlb does not support FOLL_FORCE-style write faults that keep the -* PTE mapped R/O such as maybe_mkwrite() would do. -*/ - if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE))) - return VM_FAULT_SIGSEGV; - /* Let's take out MAP_SHARED mappings first. */ if (vma->vm_flags & VM_MAYSHARE) { set_huge_ptep_writable(vma, haddr, ptep); @@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, folio_move_anon_rmap(old_folio, vma); SetPageAnonExclusive(&old_folio->page); } - if (likely(!unshare)) + if (likely(make_writable)) set_huge_ptep_writable(vma, haddr, ptep); delayacct_wpcopy_end(); @@ -6094,7 +6088,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, spin_lock(ptl); ptep = hugetlb_walk(vma, haddr, huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) { - pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare); + pte_t newpte = make_huge_pte(vma, &new_folio->page, +make_writable); /* Break COW or unshare */ huge_ptep_clear_flush(vma,
Re: [PATCH v5] vp_vdpa: don't allocate unused msix vectors
Hi mst. > > > >I just have a question on this part. How come hardware sends > > > >interrupts does > > not guest driver disable them? > > > > > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU > > > sets > > the call fd to -1 > > >2:On the host side, the vhost_vdpa program will set > > vp_vdpa->vring[i].cb.callback to invalid > > >3:Before the modification, the vp_vdpa_request_irq function > > > does not > > check whether > > > vp_vdpa->vring[i].cb.callback is valid. Instead, it enables > > > the > > hardware's MSIX > > > interrupts based on the number of queues of the device > > > > > > > So MSIX is enabled but why would it trigger? virtio PMD in poll mode > > presumably suppresses interrupts after all. >> Virtio pmd is in the guest,but in host side,the msix is enabled,then > >the device will triger Interrupt normally. I analysed this bug before,and I > >think gavin is right. > >Did I make it clear? >Not really. Guest disables interrupts presumably (it's polling) why does >device still send them? The testing model is as follows: testpmd testpmd--- ^^ || || || vv vfio pci--- ---vfio pci--- pci device -----pci device-- guest os - guest os -- ---virtio device-----vfio device-- --qemu-----qemu--- ^ ^ | | | | | | v v vhost_vdpa vfio pci host os-- --hw- VF1 VF2 1:The guest os uses PMD mode, so the guest os won't receive interrupts. We can reach a consensus on this point. 2:Note that the MSIX interrupts mentioned here refer to the interrupts received by PCI devices on the host from the hardware. 3:In the guest OS, Virtio devices use PMD mode. The host does not need to enable the MSIX capability of the PCI device, nor does it need to register interrupt callbacks. Do you agree with this? 4: Note that the patch is proposed to ensure that PCI devices on the host are not disturbed by interrupts. - Forwarded Message - From: Michael S. Tsirkin m...@redhat.com Sent: April 26, 2024 6:10 AM To: Angus Chen angus.c...@jaguarmicro.com Cc: Gavin Liu gavin@jaguarmicro.com; jasow...@redhat.com; virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; linux-kernel@vger.kernel.org; Heng Qi hen...@linux.alibaba.com Subject: Re: Reply: [PATCH v5] vp_vdpa: don't allocate unused msix vectors External Mail: This email originated from OUTSIDE of the organization! Do not click links, open attachments or provide ANY information unless you recognize the sender and know the content is safe. On Tue, Apr 23, 2024 at 08:42:57AM +, Angus Chen wrote: > Hi mst. > > > -Original Message- > > From: Michael S. Tsirkin > > Sent: Tuesday, April 23, 2024 4:35 PM > > To: Gavin Liu > > Cc: jasow...@redhat.com; Angus Chen ; > > virtualizat...@lists.linux.dev; xuanz...@linux.alibaba.com; > > linux-kernel@vger.kernel.org; Heng Qi > > Subject: Re: 回复: [PATCH v5] vp_vdpa: don't allocate unused msix > > vectors > > > > On Tue, Apr 23, 2024 at 01:39:17AM +, Gavin Liu wrote: > > > On Wed, Apr 10, 2024 at 11:30:20AM +0800, lyx634449800 wrote: > > > > From: Yuxue Liu > > > > > > > > When there is a ctlq and it doesn't require interrupt > > > > callbacks,the original method of calculating vectors wastes > > > > hardware msi or msix resources as well as system IRQ resources. > > > > > > > > When conducting performance testing using testpmd in the guest > > > > os, it was found that the performance was lower compared to > > > > directly using vfio-pci to passthrough the device > > > > > > > > In scenarios where the virtio device in the guest os does not > > > > utilize interrupts, the vdpa driver still configures the > > > > hardware's msix vector. Therefore, the hardware still sends interrupts > > > > to the host os. > > > > > > >I just have a question on this part. How come hardware sends > > > >interrupts does > > not guest driver disable them? > > > > > >1:Assuming the guest OS's Virtio device is using PMD mode, QEMU > > > sets > > the call fd to -1 > > >2:On the host side, the vhost_vdpa program will set > > vp_vdpa->vring[i].cb.callback to invalid > > >3:Before the modification, the vp_vdpa_request_irq func