Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On 11/05/2024 23:56, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at the
time pd-mapper was started but the corresponding directory is mounted
later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

To: Bjorn Andersson
To: Konrad Dybcio
To: Sibi Sankar
To: Mathieu Poirier
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold
Cc: Xilin Wu
Cc: "Bryan O'Donoghue"
Cc: Steev Klimaszewski
Cc: Alexey Minnekhanov

--
Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held
  (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform (Bjorn)
- Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c    |  87 +
 drivers/remoteproc/qcom_common.h    |  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig            |  15 +
 drivers/soc/qcom/Makefile           |   2 +
 drivers/soc/qcom/pdr_interface.c    |  17 +-
 drivers/soc/qcom/pdr_internal.h     | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676
 drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
 12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,

Tested-by: Neil Armstrong # on SM8550-QRD
Tested-by: Neil Armstrong # on SM8550-HDK
Tested-by: Neil Armstrong # on SM8650-QRD

Thanks,
Neil
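For readers unfamiliar with the idea of "static per-platform data", here is a minimal userspace sketch of what such a table could look like. It is not taken from the series: the structure names, the "vendor,example-board" compatible string and the domain/service strings are all made up for illustration, and the real driver may organise its data quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: none of these names come from the patches. */
struct pd_entry {
	const char *domain;	/* protection domain name */
	const char *service;	/* service allowed in that domain */
};

struct pd_platform {
	const char *compatible;	/* platform identifier used as lookup key */
	const struct pd_entry *entries;
	size_t num_entries;
};

/* Static, read-only mapping for one made-up platform. */
static const struct pd_entry example_board_entries[] = {
	{ "example/adsp/root_pd",    "example/service_a" },
	{ "example/adsp/charger_pd", "example/service_b" },
	{ "example/mpss/root_pd",    "example/service_a" },
};

static const struct pd_platform pd_platforms[] = {
	{ "vendor,example-board", example_board_entries,
	  sizeof(example_board_entries) / sizeof(example_board_entries[0]) },
};

/* Find the table for a platform, or NULL if the platform is unknown. */
static const struct pd_platform *pd_find_platform(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(pd_platforms) / sizeof(pd_platforms[0]); i++)
		if (strcmp(pd_platforms[i].compatible, compatible) == 0)
			return &pd_platforms[i];
	return NULL;
}

/* Collect up to 'max' domains that provide 'service'; returns the count. */
static size_t pd_lookup_domains(const struct pd_platform *p, const char *service,
				const char **out, size_t max)
{
	size_t i, n = 0;

	if (!p)
		return 0;
	for (i = 0; i < p->num_entries && n < max; i++)
		if (strcmp(p->entries[i].service, service) == 0)
			out[n++] = p->entries[i].domain;
	return n;
}
```

Because the table is const and compiled in, nothing has to be re-read when a firmware directory shows up late, which is the userspace problem the cover letter describes.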
Re: [PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot
On Wed, 05 Jun 2024 16:26:44 -0400 Steven Rostedt wrote:

>
> Received some minor bug reports from the kernel test robot. First I started
> cleaning up some of the sparse warnings. There's many more, but most changes
> are not really helping anything, but just quieting the warnings.
>
> But the reports from kernel test robot need to be fixed.

All looks good to me.

Acked-by: Masami Hiramatsu (Google)

Thank you!

>
> Steven Rostedt (Google) (6):
>   ftrace: Declare function_trace_op in header to quiet sparse warning
>   ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
>   ftrace: Assign RCU list variable with rcu_assign_ptr()
>   ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
>   function_graph: Make fgraph_do_direct static key static
>   function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
>
>
>  include/linux/ftrace.h         | 3 +++
>  kernel/trace/fgraph.c          | 4 +++-
>  kernel/trace/ftrace.c          | 4 ++--
>  kernel/trace/ftrace_internal.h | 9 +
>  kernel/trace/trace.h           | 1 -
>  5 files changed, 17 insertions(+), 4 deletions(-)

--
Masami Hiramatsu (Google)
[PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot
Received some minor bug reports from the kernel test robot. First I started
cleaning up some of the sparse warnings. There's many more, but most changes
are not really helping anything, but just quieting the warnings.

But the reports from kernel test robot need to be fixed.

Steven Rostedt (Google) (6):
  ftrace: Declare function_trace_op in header to quiet sparse warning
  ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
  ftrace: Assign RCU list variable with rcu_assign_ptr()
  ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
  function_graph: Make fgraph_do_direct static key static
  function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled

 include/linux/ftrace.h         | 3 +++
 kernel/trace/fgraph.c          | 4 +++-
 kernel/trace/ftrace.c          | 4 ++--
 kernel/trace/ftrace_internal.h | 9 +
 kernel/trace/trace.h           | 1 -
 5 files changed, 17 insertions(+), 4 deletions(-)
[PATCH v4 07/11] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } -- 2.43.0
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
I've tested this applied on top of kernel 6.8.11 on an X13s over the past week and it's been working well. -- classabbyamp
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Thu, 30 May 2024 16:02:37 +0300 Ilkka Naulapää wrote:

> applied your patch and here's the output.
> Unfortunately, it doesn't give me any new information.

I added one more BUG on, want to try this? Otherwise, I'm pretty much at
a loss. :-/

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
 		return NULL;
 	ti->flags = 0;
+	ti->magic = 20240823;
 	return &ti->vfs_inode;
 }
 static void tracefs_free_inode(struct inode *inode)
 {
-	kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+	struct tracefs_inode *ti = get_tracefs(inode);
+
+	BUG_ON(ti->magic != 20240823);
+	kmem_cache_free(tracefs_inode_cachep, ti);
 }
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = {
 	.rmdir = tracefs_syscall_rmdir,
 };
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-	struct inode *inode = new_inode(sb);
-	if (inode) {
-		inode->i_ino = get_next_ino();
-		inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
-	}
-	return inode;
-}
-
 struct tracefs_mount_opts {
 	kuid_t uid;
 	kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
 		return;
 	ti = get_tracefs(inode);
+	BUG_ON(ti->magic != 20240823);
 	if (ti && ti->flags & TRACEFS_EVENT_INODE)
 		eventfs_set_ef_status_free(dentry);
 	iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
 	return dentry;
 }
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+	struct inode *inode = new_inode(sb);
+
+	BUG_ON(sb->s_op != &tracefs_super_operations);
+	if (inode) {
+		inode->i_ino = get_next_ino();
+		inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
+	}
+	return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9059b8b11bb6 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,12 +9,15 @@ enum {
 struct tracefs_inode {
 	unsigned long flags;
 	void *private;
+	unsigned long magic;
 	struct inode vfs_inode;
 };
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
 {
-	return container_of(inode, struct tracefs_inode, vfs_inode);
+	struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, vfs_inode);
+	BUG_ON(ti->magic != 20240823);
+	return ti;
 }
 struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
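The debugging trick in this patch is a magic-number canary: stamp a known value into the wrapper structure when it is allocated, then check the stamp whenever an embedded inode is converted back to its wrapper. A rough userspace sketch of the same idea follows. The structure and function names are hypothetical, container_of is re-defined locally, and the check returns a bool instead of calling BUG_ON().

```c
#include <stddef.h>
#include <stdbool.h>

/* Local copy of the usual container_of idiom, for userspace use. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define WRAPPER_MAGIC 20240823UL	/* same constant style as the patch */

struct inner {
	int ino;
};

/* Hypothetical wrapper, playing the role of struct tracefs_inode. */
struct wrapper {
	unsigned long flags;
	unsigned long magic;	/* stamped at init, checked on every lookup */
	struct inner vfs;
};

/* Allocation-time setup: this is where the canary is written. */
static void wrapper_init(struct wrapper *w)
{
	w->flags = 0;
	w->magic = WRAPPER_MAGIC;
	w->vfs.ino = 0;
}

/* Returns true if 'in' sits inside a wrapper that went through wrapper_init(). */
static bool inner_is_wrapped(struct inner *in)
{
	struct wrapper *w = container_of(in, struct wrapper, vfs);

	return w->magic == WRAPPER_MAGIC;
}
```

Note that for an object that was never embedded in a wrapper at all, the container_of arithmetic reads memory in front of the object, so this is a best-effort debugging aid rather than a safe type check; that is acceptable for a throwaway debug patch but not something to rely on in normal code.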
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Wed, 29 May 2024 14:47:57 -0400 Steven Rostedt wrote:

> Let me make a debug patch (that crashes on this issue) for that kernel,
> and perhaps you could bisect it?

Can you try this on 6.6-rc1 and see if it gives you any other splats?

Hmm, you can switch it to WARN_ON and that way it may not crash the
machine, and you can use dmesg to get the output.

Thanks,

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
 		return NULL;
 	ti->flags = 0;
+	ti->magic = 20240823;
 	return &ti->vfs_inode;
 }
 static void tracefs_free_inode(struct inode *inode)
 {
-	kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+	struct tracefs_inode *ti = get_tracefs(inode);
+
+	BUG_ON(ti->magic != 20240823);
+	kmem_cache_free(tracefs_inode_cachep, ti);
 }
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = {
 	.rmdir = tracefs_syscall_rmdir,
 };
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-	struct inode *inode = new_inode(sb);
-	if (inode) {
-		inode->i_ino = get_next_ino();
-		inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
-	}
-	return inode;
-}
-
 struct tracefs_mount_opts {
 	kuid_t uid;
 	kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
 		return;
 	ti = get_tracefs(inode);
+	BUG_ON(ti->magic != 20240823);
 	if (ti && ti->flags & TRACEFS_EVENT_INODE)
 		eventfs_set_ef_status_free(dentry);
 	iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
 	return dentry;
 }
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+	struct inode *inode = new_inode(sb);
+
+	BUG_ON(sb->s_op != &tracefs_super_operations);
+	if (inode) {
+		inode->i_ino = get_next_ino();
+		inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode);
+	}
+	return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9f6f303a9e58 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,6 +9,7 @@ enum {
 struct tracefs_inode {
 	unsigned long flags;
 	void *private;
+	unsigned long magic;
 	struct inode vfs_inode;
 };
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Wed, 29 May 2024 21:36:08 +0300 Ilkka Naulapää wrote: > applied your patch without others, so trace and panic there. > Screenshot attached. Also tested kernels backward and found out that Bah, it's still in an RCU callback, which doesn't tell us why a normal inode is being sent to the trace inode free list. > this trace bug first triggered on 6.6-rc1. Hmm, that's when eventfs was added. > > Let me know if you need more assistance with this. Let me make a debug patch (that crashes on this issue) for that kernel, and perhaps you could bisect it? Thanks! -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Tue, 28 May 2024 07:51:30 +0300 Ilkka Naulapää wrote:

> yeah, the cache_from_obj tracing bug (without panic) has been
> displayed quite some time now - maybe even since 6.7.x or so. I could
> try checking a few versions back for this and try bisecting it if I
> can find when this started.
>

OK, so I don't think the commit your last bisect hit is the cause of the
bug. It added a delay (via RCU) and is causing the real bug to blow up
more.

Can you add this patch to v6.9.2? Hopefully it crashes in a better
location, so that we can find where the mixup happened. You may need to
add the other commit too if this doesn't trigger.

Thanks,

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 417c840e6403..7af3f696696d 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -50,6 +50,7 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
 	list_add_rcu(&ti->list, &tracefs_inodes);
 	spin_unlock_irqrestore(&tracefs_inode_lock, flags);
+	ti->magic = 20240823;
 	return &ti->vfs_inode;
 }
@@ -66,6 +67,7 @@ static void tracefs_free_inode(struct inode *inode)
 	struct tracefs_inode *ti = get_tracefs(inode);
 	unsigned long flags;
+	BUG_ON(ti->magic != 20240823);
 	spin_lock_irqsave(&tracefs_inode_lock, flags);
 	list_del_rcu(&ti->list);
 	spin_unlock_irqrestore(&tracefs_inode_lock, flags);
@@ -271,16 +273,6 @@ static const struct inode_operations tracefs_file_inode_operations = {
 	.setattr = tracefs_setattr,
 };
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-	struct inode *inode = new_inode(sb);
-	if (inode) {
-		inode->i_ino = get_next_ino();
-		simple_inode_init_ts(inode);
-	}
-	return inode;
-}
-
 struct tracefs_mount_opts {
 	kuid_t uid;
 	kgid_t gid;
@@ -448,6 +440,17 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options = tracefs_show_options,
 };
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+	struct inode *inode = new_inode(sb);
+	BUG_ON(sb->s_op != &tracefs_super_operations);
+	if (inode) {
+		inode->i_ino = get_next_ino();
+		simple_inode_init_ts(inode);
+	}
+	return inode;
+}
+
 /*
  * It would be cleaner if eventfs had its own dentry ops.
  *
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index f704d8348357..dda7d2708e30 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -16,6 +16,7 @@ struct tracefs_inode {
 	};
 	/* The below gets initialized with memset_after(ti, 0, vfs_inode) */
 	struct list_head list;
+	unsigned long magic;
 	unsigned long flags;
 	void *private;
 };
Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
On 5/20/24 07:42, Yang Li wrote: > The patch updates the function documentation comment for > rv_en(dis)able_monitor to adhere to the kernel-doc specification. > > Signed-off-by: Yang Li Acked-by: Daniel Bristot de Oliveira Thanks -- Daniel
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
yeah, the cache_from_obj tracing bug (without panic) has been displayed quite some time now - maybe even since 6.7.x or so. I could try checking a few versions back for this and try bisecting it if I can find when this started. --Ilkka On Tue, May 28, 2024 at 1:31 AM Steven Rostedt wrote: > > On Fri, 24 May 2024 12:50:08 +0200 > "Linux regression tracking (Thorsten Leemhuis)" > wrote: > > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > > quick display of a kernel trace dump before the shutdown/reboot > > > completed. Starting from version 6.8.10 and continuing into version > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > > preventing the shutdown or reboot from completing and leaving the > > > machine stuck. > > You state "Before kernel version 6.8.10, the bug caused ...". Does that > mean that a bug was happening before v6.8.10? But did not cause a panic? > > I just noticed your second screen shot from your report, and it has: > > "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from > inode_cache" > > So somehow an tracefs_inode was allocated from the inode_cache and is > now being freed by the tracefs_inode logic? Did this happen before > 6.8.10? If so, this code could just be triggering an issue from an > unrelated bug. > > Thanks, > > -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
I tried 6.10-rc1 and it still ends in a panic.

--Ilkka

On Tue, May 28, 2024 at 12:44 AM Steven Rostedt wrote:
>
> On Mon, 27 May 2024 20:14:42 +0200
> Greg KH wrote:
>
> > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> > > Hi Steven,
> > >
> > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> > > panic inducing commit:
> > >
> > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are
> > > options)
> > >
> > > I reverted that commit to 6.9.2 and now it only serves the trace but
> > > the panic is gone. But I can live with it.
> >
> > Steven, should we revert that?
> >
> > Or is there some other change that we should take to resolve this?
> >
>
> Before we revert it (as it may be a bug in mainline), Ilkka, can you
> test v6.10-rc1? If it exists there, it will let me know whether or not
> I missed something.
>
> Thanks,
>
> -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > quick display of a kernel trace dump before the shutdown/reboot > > completed. Starting from version 6.8.10 and continuing into version > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > preventing the shutdown or reboot from completing and leaving the > > machine stuck. You state "Before kernel version 6.8.10, the bug caused ...". Does that mean that a bug was happening before v6.8.10? But did not cause a panic? I just noticed your second screen shot from your report, and it has: "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from inode_cache" So somehow an tracefs_inode was allocated from the inode_cache and is now being freed by the tracefs_inode logic? Did this happen before 6.8.10? If so, this code could just be triggering an issue from an unrelated bug. Thanks, -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Mon, 27 May 2024 20:14:42 +0200 Greg KH wrote: > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote: > > Hi Steven, > > > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the > > panic inducing commit: > > > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are > > options) > > > > I reverted that commit to 6.9.2 and now it only serves the trace but > > the panic is gone. But I can live with it. > > Steven, should we revert that? > > Or is there some other change that we should take to resolve this? > Before we revert it (as it may be a bug in mainline), Ilkka, can you test v6.10-rc1? If it exists there, it will let me know whether or not I missed something. Thanks, -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote: > Hi Steven, > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the > panic inducing commit: > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are > options) > > I reverted that commit to 6.9.2 and now it only serves the trace but > the panic is gone. But I can live with it. Steven, should we revert that? Or is there some other change that we should take to resolve this? thanks, greg k-h
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
Hi Steven, I took some time and bisected the 6.8.9 - 6.8.10 and git gave the panic inducing commit: 414fb08628143 (tracefs: Reset permissions on remount if permissions are options) I reverted that commit to 6.9.2 and now it only serves the trace but the panic is gone. But I can live with it. --Ilkka On Sun, May 26, 2024 at 8:42 PM Ilkka Naulapää wrote: > > hi, > > I took 6.9.2 and applied that 0bcfd9aa4dafa to it. Now the kernel is > serving me both problems; the trace and the panic as the pic shows. > > > To understand this, did you do anything with tracing? Before shutting down, > > is there anything in /sys/kernel/tracing/instances directory? > > Were any of the files/directories permissions in /sys/kernel/tracing > > changed? > > And to answer your question, I did not do any tracing or so and the > /sys/kernel/tracing is empty. > Just plain boot-up, no matter if in full desktop or in bare rescue > mode, ends up the same way. > > --Ilkka > > On Fri, May 24, 2024 at 8:19 PM Steven Rostedt wrote: > > > > On Fri, 24 May 2024 12:50:08 +0200 > > "Linux regression tracking (Thorsten Leemhuis)" > > wrote: > > > > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > > > quick display of a kernel trace dump before the shutdown/reboot > > > > completed. Starting from version 6.8.10 and continuing into version > > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > > > preventing the shutdown or reboot from completing and leaving the > > > > machine stuck. > > > > Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on > > remount if permissions are options"), which added a "iput" callback to the > > dentry without calling iput, leaving stale inodes around. > > > > This is fixed with: > > > > 0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()") > > > > Try adding just that patch. It will at least make it go back to what was > > happening before 6.8.10 (I hope!). > > > > -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > quick display of a kernel trace dump before the shutdown/reboot > > completed. Starting from version 6.8.10 and continuing into version > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > preventing the shutdown or reboot from completing and leaving the > > machine stuck. Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options"), which added a "iput" callback to the dentry without calling iput, leaving stale inodes around. This is fixed with: 0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()") Try adding just that patch. It will at least make it go back to what was happening before 6.8.10 (I hope!). -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote:

> [CCing a few people]
>

Thanks for the Cc.

> On 24.05.24 12:31, Ilkka Naulapää wrote:
> >
> > I have encountered a critical bug in the Linux vanilla kernel that
> > leads to a kernel panic during the shutdown or reboot process. The
> > issue arises after all services, including `journald`, have been
> > stopped. As a result, the machine fails to complete the shutdown or
> > reboot procedure, effectively causing the system to hang and not shut
> > down or reboot.

To understand this, did you do anything with tracing? Before shutting down,
is there anything in /sys/kernel/tracing/instances directory?
Were any of the files/directories permissions in /sys/kernel/tracing changed?

>
> Thx for the report. Not my area of expertise, so take this with a grain
> of salt. But given the versions you mention in your report and the
> screenshot that mentioned tracefs_free_inode I suspect this is caused by
> baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
> options"). A few fixes for it will soon hit mainline and are meant to be
> backported to affected stable trees:
>
> https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
> https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/
>
> You might want to try them – or recheck once they hit the stable trees
> you care about. If they don't work, please report back.

There's been quite a bit of updates in this code, but this looks new to me.
I have more fixes that were just pulled by Linus today.

https://git.kernel.org/torvalds/c/0eb03c7e8e2a4cc3653eb5eeb2d2001182071215

I'm not sure how relevant that is for this. But if you can reproduce it
with that commit, then this is a new bug.

-- Steve
Re: How to properly fix reading user pointers in bpf in android kernel 4.9?
[also Cc: bpf maintainers and get_maintainer output] On Thu, May 23, 2024 at 07:52:22PM +0300, Marcel wrote: > This seems that it was a long standing problem with the Linux kernel in > general. bpf_probe_read should have worked for both kernel and user pointers > but it fails with access error when reading an user one instead. > > I know there's a patch upstream that fixes this by introducing new helpers > for reading kernel and userspace pointers and I tried to back port them back > to my kernel but with no success. Tools like bcc fail to use them and instead > they report that the arguments sent to the helpers are invalid. I assume this > is due to the arguments ARG_CONST_STACK_SIZE and ARG_PTR_TO_RAW_STACK handle > data different in the 4.9 android version and the upstream version but I'm > not sure that this is the cause. I left the patch I did below and with a link > to the kernel I'm working on and maybe someone can take a look and give me an > hand (the patch isn't applied yet) What upstream patch? Has it already been in mainline? > > <https://github.com/nitanmarcel/android_kernel_oneplus_sdm845-bpf> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 744b4763b80e..de94c13b7193 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -559,6 +559,43 @@ enum bpf_func_id { > */ > BPF_FUNC_probe_read_user, > > + /** > + * int bpf_probe_read_kernel(void *dst, int size, void *src) > + * Read a kernel pointer safely. > + * Return: 0 on success or negative error > + */ > + BPF_FUNC_probe_read_kernel, > + > + /** > + * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) > + * Copy a NUL terminated string from user unsafe address. In case > the string > + * length is smaller than size, the target is not padded with > further NUL > + * bytes. In case the string length is larger than size, just > count-1 > + * bytes are copied and the last byte is set to NUL. 
> + * @dst: destination address > + * @size: maximum number of bytes to copy, including the trailing > NUL > + * @unsafe_ptr: unsafe address > + * Return: > + * > 0 length of the string including the trailing NUL on success > + * < 0 error > + */ > + BPF_FUNC_probe_read_user_str, > + > + /** > + * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) > + * Copy a NUL terminated string from unsafe address. In case the > string > + * length is smaller than size, the target is not padded with > further NUL > + * bytes. In case the string length is larger than size, just > count-1 > + * bytes are copied and the last byte is set to NUL. > + * @dst: destination address > + * @size: maximum number of bytes to copy, including the trailing > NUL > + * @unsafe_ptr: unsafe address > + * Return: > + * > 0 length of the string including the trailing NUL on success > + * < 0 error > + */ > + BPF_FUNC_probe_read_kernel_str, > + > __BPF_FUNC_MAX_ID, > }; > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index a1e37a5d8c88..3478ca744a45 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = { > .arg3_type = ARG_ANYTHING, > }; > > -BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, > unsafe_ptr) > +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void __user > *, unsafe_ptr) > { > int ret; > > @@ -115,6 +115,27 @@ static const struct bpf_func_proto > bpf_probe_read_user_proto = { > }; > > > +BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, > unsafe_ptr) > +{ > + int ret; > + > + ret = probe_kernel_read(dst, unsafe_ptr, size); > + if (unlikely(ret < 0)) > + memset(dst, 0, size); > + > + return ret; > +} > + > +static const struct bpf_func_proto bpf_probe_read_kernel_proto = { > + .func = bpf_probe_read_kernel, > + .gpl_only = true, > + .ret_type = RET_INTEGER, > + .arg1_type = ARG_PTR_TO_RAW_STACK, > 
+ .arg2_type = ARG_CONST_STACK_SIZE, > + .arg3_type = ARG_ANYTHING, > +}; > + > + > BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src, > u32, size) > { > @@ -487,6 +508,69 @@ static const struct bpf_func_proto >
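To make the semantics described in the helper comments quoted above easier to follow, here is a small userspace model of them. It is only a sketch under my own assumptions: the function names are invented, plain pointers stand in for unsafe addresses, a NULL source stands in for a faulting read, and -1 stands in for a negative errno. It does not touch the verifier argument types, so it says nothing about why bcc rejects the backported helpers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the documented *_str behaviour: copy at most
 * size - 1 bytes, always NUL-terminate, do not pad the rest of dst,
 * and return the string length including the trailing NUL.
 */
static long bounded_str_copy(char *dst, size_t size, const char *src)
{
	size_t i;

	if (!dst || !src || size == 0)
		return -1;	/* stand-in for a negative error code */

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return (long)(i + 1);
}

/*
 * Hypothetical model of the read helpers' failure path: when the read
 * fails, zero the destination so no stale stack bytes are exposed.
 */
static int checked_copy(void *dst, size_t size, const void *src)
{
	if (!src) {
		memset(dst, 0, size);
		return -1;
	}
	memcpy(dst, src, size);
	return 0;
}
```

The zero-on-failure step mirrors the memset() in the quoted bpf_probe_read_kernel body: the destination is a raw stack buffer, so leaving it untouched after a failed read would hand uninitialized data back to the program.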
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
[CCing a few people]

On 24.05.24 12:31, Ilkka Naulapää wrote:
>
> I have encountered a critical bug in the Linux vanilla kernel that
> leads to a kernel panic during the shutdown or reboot process. The
> issue arises after all services, including `journald`, have been
> stopped. As a result, the machine fails to complete the shutdown or
> reboot procedure, effectively causing the system to hang and not shut
> down or reboot.

Thx for the report. Not my area of expertise, so take this with a grain
of salt. But given the versions you mention in your report and the
screenshot that mentioned tracefs_free_inode I suspect this is caused by
baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
options"). A few fixes for it will soon hit mainline and are meant to be
backported to affected stable trees:

https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/

You might want to try them – or recheck once they hit the stable trees
you care about. If they don't work, please report back.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Here are the details of the issue:
>
> - Affected Versions: Before kernel version 6.8.10, the bug caused a
> quick display of a kernel trace dump before the shutdown/reboot
> completed. Starting from version 6.8.10 and continuing into version
> 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> preventing the shutdown or reboot from completing and leaving the
> machine stuck.
>
> - Symptoms:
> - In normal shutdown/reboot scenarios, the kernel trace dump briefly
> appears as the last message on the screen.
> - In rescue mode, the kernel panic message is displayed. Normally it
> is not shown.
> > Since `journald` is stopped before this issue occurs, no textual logs > are available. However, I have captured two pictures illustrating > these related issues, which I am attaching to this email for your > reference. Also added my custom kernel config. > > Thank you for your attention to this matter. Please let me know if any > additional information is required to assist in diagnosing and > resolving this bug. > > Best regards, > > Ilkka Naulapää
[PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Signed-off-by: Björn Töpel
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+	get_online_mems();
 	ptdump_walk(m, m->private);
+	put_online_mems();
 
 	return 0;
 }
--
2.40.1
[PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Signed-off-by: Yang Li --- kernel/trace/rv/rv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c index 2f68e93fff0b..df0745a42a3f 100644 --- a/kernel/trace/rv/rv.c +++ b/kernel/trace/rv/rv.c @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def *mdef, bool sync) /** * rv_disable_monitor - disable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success. */ @@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef) /** * rv_enable_monitor - enable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success, error otherwise. */ -- 2.20.1.7.g153144c
Re: [PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()
On Sat, 18 May 2024 15:54:49 -0700 Jeff Johnson wrote:

> Fix the 'make W=1' warning:
>
> WARNING: modpost: missing MODULE_DESCRIPTION() in
> kernel/trace/preemptirq_delay_test.o
>

Looks good to me.

Acked-by: Masami Hiramatsu (Google)
Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers")

Thanks,

> Signed-off-by: Jeff Johnson
> ---
>  kernel/trace/preemptirq_delay_test.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/preemptirq_delay_test.c
> b/kernel/trace/preemptirq_delay_test.c
> index 8c4ffd076162..cb0871fbdb07 100644
> --- a/kernel/trace/preemptirq_delay_test.c
> +++ b/kernel/trace/preemptirq_delay_test.c
> @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void)
>
>  module_init(preemptirq_delay_init)
>  module_exit(preemptirq_delay_exit)
> +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency
> tracers");
>  MODULE_LICENSE("GPL v2");
>
> ---
> base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532
> change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b
>

--
Masami Hiramatsu (Google)
[PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Signed-off-by: Jeff Johnson --- kernel/trace/preemptirq_delay_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 8c4ffd076162..cb0871fbdb07 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void) module_init(preemptirq_delay_init) module_exit(preemptirq_delay_exit) +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency tracers"); MODULE_LICENSE("GPL v2"); --- base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532 change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b
Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
Hi Yang

On 5/17/24 11:14, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
>
> Signed-off-by: Yang Li
> ---
>  kernel/trace/rv/rv.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 2f68e93fff0b..df0745a42a3f 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def
> *mdef, bool sync)
>
>  /**
>   * rv_disable_monitor - disable a given runtime monitor
> + * @mdef: Pointer to the monitor definition structure.

This change is for the mainline kernel, so why are you using -next in
the Subject?

-- Daniel
[PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Signed-off-by: Yang Li --- kernel/trace/rv/rv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c index 2f68e93fff0b..df0745a42a3f 100644 --- a/kernel/trace/rv/rv.c +++ b/kernel/trace/rv/rv.c @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def *mdef, bool sync) /** * rv_disable_monitor - disable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success. */ @@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef) /** * rv_enable_monitor - enable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success, error otherwise. */ -- 2.20.1.7.g153144c
[PATCH v3 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
Signed-off-by: Kris Van Hees Reviewed-by: Nick Alcock --- Changes since v2: - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo - Use $(real-prereqs) rather than $(filter-out ...) --- scripts/Makefile.vmlinux | 16 1 file changed, 16 insertions(+) diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux index c9f3e03124d7f..afe8287e8dda0 100644 --- a/scripts/Makefile.vmlinux +++ b/scripts/Makefile.vmlinux @@ -36,6 +36,22 @@ targets += vmlinux vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE +$(call if_changed_dep,link_vmlinux) +# module.builtin.ranges +# --- +ifdef CONFIG_BUILTIN_MODULE_RANGES +__default: modules.builtin.ranges + +quiet_cmd_modules_builtin_ranges = GEN $@ + cmd_modules_builtin_ranges = \ + $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@ + +vmlinux.map: vmlinux + +targets += modules.builtin.ranges +modules.builtin.ranges: modules.builtin.modinfo vmlinux.map vmlinux.o.map FORCE + $(call if_changed,modules_builtin_ranges) +endif + # Add FORCE to the prequisites of a target to force it to be always rebuilt. # --- -- 2.43.0
Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
On Tue, May 14, 2024 at 04:04:43PM +0200, Björn Töpel wrote:
> From: Björn Töpel
>
> During memory hot remove, the ptdump functionality can end up touching
> stale data. Avoid any potential crashes (or worse), by holding the
> memory hotplug read-lock while traversing the page table.
>
> This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
> Hold memory hotplug lock while walking for kernel page table dump").
>
> Signed-off-by: Björn Töpel

Reviewed-by: Oscar Salvador

Funnily enough, it seems arm64 and riscv are the only ones holding the
hotplug lock here. I think we have the same problem on the other arches
as well (at least on x86_64, as far as I can see).

If we eventually need the lock in those as well, I would rather have a
central function in the generic mm code that takes the lock and then
calls an arch-specific ptdump_show function, so the locking is not
scattered. But that is another story.

--
Oscar Salvador
SUSE Labs
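The refactoring suggested above could look roughly like the following userspace sketch: one generic entry point takes the lock and dispatches to an arch-specific hook that does no locking of its own. No such helper exists in the posted series. The names generic_ptdump_show(), arch_ptdump_fn and the mock_* functions are hypothetical, and the reader counter only imitates get_online_mems()/put_online_mems() so the idea can be exercised outside the kernel.

```c
#include <assert.h>

/* Hypothetical stand-ins for get_online_mems()/put_online_mems(). */
static int hotplug_readers;

static void mock_get_online_mems(void)
{
	hotplug_readers++;
}

static void mock_put_online_mems(void)
{
	assert(hotplug_readers > 0);	/* unbalanced put would be a bug */
	hotplug_readers--;
}

/* Signature each architecture would implement; it takes no lock itself. */
typedef int (*arch_ptdump_fn)(void *private);

/* Generic entry point: the only place where the lock is taken. */
int generic_ptdump_show(arch_ptdump_fn arch_show, void *private)
{
	int ret;

	mock_get_online_mems();
	ret = arch_show(private);
	mock_put_online_mems();

	return ret;
}

/* Example arch hook: records whether the lock was held during the walk. */
int example_arch_show(void *private)
{
	int *lock_was_held = private;

	*lock_was_held = (hotplug_readers > 0);
	return 0;
}
```

With this shape, an architecture that forgets the lock cannot exist by construction, because the arch hook is only reachable through the generic wrapper.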
Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
On 14.05.24 16:04, Björn Töpel wrote: From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb
[PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } -- 2.40.1
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
On Mon, Apr 08, 2024 at 04:58:18PM GMT, Michal Koutný wrote:
> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
>
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

Out of curiosity I dug out the commit
acdc721fe26d ("[PATCH] pid-max-2.5.33-A0") v2.5.34~5
that introduced the 32k default. The commit message doesn't say why such
a sudden change though. Previously, the limit was 1G of pids (i.e.
effectively no default limit like the intention of this series).

Honestly, I expected more enthusiasm or reasons against removing the
default value of pid_max. Is this really not of interest to anyone?

(Thanks, Andrew, for your responses. I don't plan to pursue this further
should there be no more interest in having less default limit values in
kernel.)

Regards,
Michal
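To make the "32k to 4M" figures concrete, here is a small sketch of the arithmetic. It assumes the non-CONFIG_BASE_SMALL definitions of PID_MAX_DEFAULT and PID_MAX_LIMIT in include/linux/threads.h as I recall them, so check the tree you are working on. pid_max_limit() is a helper invented for this example, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's default when CONFIG_BASE_SMALL is off. */
#define PID_MAX_DEFAULT 0x8000

/* The current default is 32k PIDs. */
static_assert(PID_MAX_DEFAULT == 32 * 1024, "default pid_max is 32k");

/*
 * Upper bound of the allowed range: 4M on 64-bit (sizeof(long) > 4),
 * otherwise the same as the default. The size is passed in so both
 * cases can be evaluated on any host.
 */
long pid_max_limit(size_t sizeof_long)
{
	return sizeof_long > 4 ? 4L * 1024 * 1024 : PID_MAX_DEFAULT;
}
```

On a 64-bit build the proposed change would therefore raise the initial value by a factor of 128, from 32768 to 4194304.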
Re: [PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
On Sun, May 12, 2024 at 7:44 AM Kris Van Hees wrote:
>
> Signed-off-by: Kris Van Hees
> Reviewed-by: Nick Alcock
> ---
> Changes since v1:
> - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
> ---
>  scripts/Makefile.vmlinux | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index c9f3e03124d7f..54095d72f7fd7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -36,6 +36,23 @@ targets += vmlinux
>  vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
>  	+$(call if_changed_dep,link_vmlinux)
>
> +# module.builtin.ranges
> +# ---
> +ifdef CONFIG_BUILTIN_MODULE_RANGES
> +__default: modules.builtin.ranges
> +
> +quiet_cmd_modules_builtin_ranges = GEN $@
> +      cmd_modules_builtin_ranges = \
> +	$(srctree)/scripts/generate_builtin_ranges.awk \
> +	  $(filter-out FORCE,$+) > $@

$(filter-out FORCE,$+) -> $(real-prereqs)

> +
> +vmlinux.map: vmlinux
> +
> +targets += modules.builtin.ranges
> +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
> +	$(call if_changed,modules_builtin_ranges)
> +endif
> +
>  # Add FORCE to the prequisites of a target to force it to be always rebuilt.
>  # ---
>
> --
> 2.43.0
>
>

--
Best Regards
Masahiro Yamada
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On Sat, May 11, 2024 at 4:56 PM Dmitry Baryshkov wrote: > > Protection domain mapper is a QMI service providing mapping between > 'protection domains' and services supported / allowed in these domains. > For example such mapping is required for loading of the WiFi firmware or > for properly starting up the UCSI / altmode / battery manager support. > > The existing userspace implementation has several issue. It doesn't play > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the > firmware location is changed (or if the firmware was not available at > the time pd-mapper was started but the corresponding directory is > mounted later), etc. > > However this configuration is largely static and common between > different platforms. Provide in-kernel service implementing static > per-platform data. > > To: Bjorn Andersson > To: Konrad Dybcio > To: Sibi Sankar > To: Mathieu Poirier > Cc: linux-arm-...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-remotep...@vger.kernel.org > Cc: Johan Hovold > Cc: Xilin Wu > Cc: "Bryan O'Donoghue" > Cc: Steev Klimaszewski > Cc: Alexey Minnekhanov > > -- > > Changes in v8: > - Reworked pd-mapper to register as an rproc_subdev / auxdev > - Dropped Tested-by from Steev and Alexey from the last patch since the > implementation was changed significantly. 
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey > Minnekhanov) > - Added platform entry for sm4250 (used for qrb4210 / RB2) > - Added locking to the pdr_get_domain_list() (Chris Lew) > - Remove the call to qmi_del_server() and corresponding API (Chris Lew) > - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew) > - Link to v7: > https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org > > Changes in v7: > - Fixed modular build (Steev) > - Link to v6: > https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org > > Changes in v6: > - Reworked mutex to fix lockdep issue on deregistration > - Fixed dependencies between PD-mapper and remoteproc to fix modular > builds (Krzysztof) > - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) > - Fixed kerneldocs (Krzysztof) > - Removed extra pr_debug messages (Krzysztof) > - Fixed wcss build (Krzysztof) > - Added platforms which do not require protection domain mapping to > silence the notice on those platforms > - Link to v5: > https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org > > Changes in v5: > - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris > Lew) > - pd_mapper: reworked to provide static configuration per platform > (Bjorn) > - Link to v4: > https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org > > Changes in v4: > - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) > - Added configuration for sm6350 (Thanks to Luca) > - Removed RFC tag (Konrad) > - Link to v3: > https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org > > Changes in RFC v3: > - Send start / stop notifications when PD-mapper domain list is changed > - Reworked the way PD-mapper treats protection domains, register all of > them in a single batch > - Added SC7180 domains configuration based on TCL Book 14 GO > - Link to v2: > 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org > > Changes in RFC v2: > - Swapped num_domains / domains (Konrad) > - Fixed an issue with battery not working on sc8280xp > - Added missing configuration for QCS404 > > --- > Dmitry Baryshkov (5): > soc: qcom: pdr: protect locator_addr with the main mutex > soc: qcom: pdr: fix parsing of domains lists > soc: qcom: pdr: extract PDR message marshalling data > soc: qcom: add pd-mapper implementation > remoteproc: qcom: enable in-kernel PD mapper > > drivers/remoteproc/qcom_common.c| 87 + > drivers/remoteproc/qcom_common.h| 10 + > drivers/remoteproc/qcom_q6v5_adsp.c | 3 + > drivers/remoteproc/qcom_q6v5_mss.c | 3 + > drivers/remoteproc/qcom_q6v5_pas.c | 3 + > drivers/remoteproc/qcom_q6v5_wcss.c | 3 + > drivers/soc/qcom/Kconfig| 15 + > drivers/soc/qcom/Makefile | 2 + > drivers/soc/qcom/pdr_interface.c| 17 +- > drivers/soc/qcom/pdr_internal.h | 318 ++--- > drivers/soc/qcom/qcom_pd_mapper.c | 676 > > drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ > 12 files changed, 1190 insertions(+), 300 deletions(-) > --- > base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488 > change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 > > Best regards, > -- > Dmitry Baryshkov > I've tested this over the weekend on my Thinkpad X13s with a number of reboots and seems to do the correct thing in v8 as well. Tested-by: Steev Klimaszewski
[PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
Signed-off-by: Kris Van Hees Reviewed-by: Nick Alcock --- Changes since v1: - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES --- scripts/Makefile.vmlinux | 17 + 1 file changed, 17 insertions(+) diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux index c9f3e03124d7f..54095d72f7fd7 100644 --- a/scripts/Makefile.vmlinux +++ b/scripts/Makefile.vmlinux @@ -36,6 +36,23 @@ targets += vmlinux vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE +$(call if_changed_dep,link_vmlinux) +# module.builtin.ranges +# --- +ifdef CONFIG_BUILTIN_MODULE_RANGES +__default: modules.builtin.ranges + +quiet_cmd_modules_builtin_ranges = GEN $@ + cmd_modules_builtin_ranges = \ + $(srctree)/scripts/generate_builtin_ranges.awk \ + $(filter-out FORCE,$+) > $@ + +vmlinux.map: vmlinux + +targets += modules.builtin.ranges +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE + $(call if_changed,modules_builtin_ranges) +endif + # Add FORCE to the prequisites of a target to force it to be always rebuilt. # --- -- 2.43.0
[PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper
Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov
---
 drivers/remoteproc/qcom_common.c    | 87 +
 drivers/remoteproc/qcom_common.h    | 10 +
 drivers/remoteproc/qcom_q6v5_adsp.c | 3 ++
 drivers/remoteproc/qcom_q6v5_mss.c  | 3 ++
 drivers/remoteproc/qcom_q6v5_pas.c  | 3 ++
 drivers/remoteproc/qcom_q6v5_wcss.c | 3 ++
 6 files changed, 109 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..8c8688f99f0a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -25,6 +26,7 @@
 #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev)
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
+#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev)
 
 #define MAX_NUM_OF_SS 10
 #define MAX_REGION_NAME_LENGTH 16
@@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev);
 
+static void pdm_dev_release(struct device *dev)
+{
+	struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+	kfree(adev);
+}
+
+static int pdm_notify_prepare(struct rproc_subdev *subdev)
+{
+	struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+	struct auxiliary_device *adev;
+	int ret;
+
+	adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+	if (!adev)
+		return -ENOMEM;
+
+	adev->dev.parent = pdm->dev;
+	adev->dev.release = pdm_dev_release;
+	adev->name = "pd-mapper";
+	adev->id = pdm->index;
+
+	ret = auxiliary_device_init(adev);
+	if (ret) {
+		kfree(adev);
+		return ret;
+	}
+
+	ret = auxiliary_device_add(adev);
+	if (ret) {
+		auxiliary_device_uninit(adev);
+		return ret;
+	}
+
+	pdm->adev = adev;
+
+	return 0;
+}
+
+
+static void pdm_notify_unprepare(struct rproc_subdev *subdev)
+{
+	struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+
+	if (!pdm->adev)
+		return;
+
+	auxiliary_device_delete(pdm->adev);
+	auxiliary_device_uninit(pdm->adev);
+	pdm->adev = NULL;
+}
+
+/**
+ * qcom_add_pdm_subdev() - register PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm: PDM subdevice handle
+ *
+ * Register @pdm so that Protection Device mapper service is started when the
+ * DSP is started too.
+ */
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+	pdm->dev = &rproc->dev;
+	pdm->index = rproc->index;
+
+	pdm->subdev.prepare = pdm_notify_prepare;
+	pdm->subdev.unprepare = pdm_notify_unprepare;
+
+	rproc_add_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev);
+
+/**
+ * qcom_remove_pdm_subdev() - remove PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm: PDM subdevice handle
+ *
+ * Remove the PD Mapper subdevice.
+ */
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+	rproc_remove_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev);
+
 MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h
index 9ef4449052a9..b07fbaa091a0 100644
--- a/drivers/remoteproc/qcom_common.h
+++ b/drivers/remoteproc/qcom_common.h
@@ -34,6 +34,13 @@ struct qcom_rproc_ssr {
 	struct qcom_ssr_subsystem *info;
 };
 
+struct qcom_rproc_pdm {
+	struct rproc_subdev subdev;
+	struct device *dev;
+	int index;
+	struct auxiliary_device *adev;
+};
+
 void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
 			void (*rproc_dumpfn_t)(struct rproc *rproc,
 				struct rproc_dump_segment *segment, void *dest, size_t offset,
@@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr,
 			 const char *ssr_name);
 void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr);
 
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+
 #if IS_ENABLED(CONFIG_QCOM_SYSMON)
 struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
 					   const char *name,
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_a
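As a rough userspace illustration of how the subdevice hooks in this patch recover their private state, the sketch below re-creates the container_of() idiom with simplified stand-in types. It is not the driver code. struct subdev and struct pdm are cut-down placeholders for struct rproc_subdev and struct qcom_rproc_pdm, a heap-allocated int plays the role of the auxiliary device, and pdm_init() is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct rproc_subdev: only the two hooks. */
struct subdev {
	int (*prepare)(struct subdev *subdev);
	void (*unprepare)(struct subdev *subdev);
};

/* Simplified stand-in for struct qcom_rproc_pdm. */
struct pdm {
	struct subdev subdev;
	int index;
	int *adev;	/* placeholder for the auxiliary device */
};

/* prepare hook: create the placeholder device for this instance. */
static int pdm_prepare(struct subdev *subdev)
{
	struct pdm *pdm = container_of(subdev, struct pdm, subdev);

	pdm->adev = malloc(sizeof(*pdm->adev));
	if (!pdm->adev)
		return -1;
	*pdm->adev = pdm->index;

	return 0;
}

/* unprepare hook: safe to call even if prepare never ran or failed. */
static void pdm_unprepare(struct subdev *subdev)
{
	struct pdm *pdm = container_of(subdev, struct pdm, subdev);

	if (!pdm->adev)
		return;

	free(pdm->adev);
	pdm->adev = NULL;
}

/* Wire up the hooks, mirroring what qcom_add_pdm_subdev() does. */
void pdm_init(struct pdm *pdm, int index)
{
	assert(pdm != NULL);
	pdm->index = index;
	pdm->adev = NULL;
	pdm->subdev.prepare = pdm_prepare;
	pdm->subdev.unprepare = pdm_unprepare;
}
```

The framework only ever sees the embedded struct subdev pointer, and each hook steps back to the enclosing structure. The NULL check in the unprepare hook mirrors the one in pdm_notify_unprepare(), which makes teardown tolerant of a prepare step that did not complete.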
[PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
Protection domain mapper is a QMI service providing mapping between 'protection domains' and services supported / allowed in these domains. For example such mapping is required for loading of the WiFi firmware or for properly starting up the UCSI / altmode / battery manager support. The existing userspace implementation has several issue. It doesn't play well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the firmware location is changed (or if the firmware was not available at the time pd-mapper was started but the corresponding directory is mounted later), etc. However this configuration is largely static and common between different platforms. Provide in-kernel service implementing static per-platform data. To: Bjorn Andersson To: Konrad Dybcio To: Sibi Sankar To: Mathieu Poirier Cc: linux-arm-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-remotep...@vger.kernel.org Cc: Johan Hovold Cc: Xilin Wu Cc: "Bryan O'Donoghue" Cc: Steev Klimaszewski Cc: Alexey Minnekhanov -- Changes in v8: - Reworked pd-mapper to register as an rproc_subdev / auxdev - Dropped Tested-by from Steev and Alexey from the last patch since the implementation was changed significantly. 
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey Minnekhanov) - Added platform entry for sm4250 (used for qrb4210 / RB2) - Added locking to the pdr_get_domain_list() (Chris Lew) - Remove the call to qmi_del_server() and corresponding API (Chris Lew) - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew) - Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org Changes in v7: - Fixed modular build (Steev) - Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org Changes in v6: - Reworked mutex to fix lockdep issue on deregistration - Fixed dependencies between PD-mapper and remoteproc to fix modular builds (Krzysztof) - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) - Fixed kerneldocs (Krzysztof) - Removed extra pr_debug messages (Krzysztof) - Fixed wcss build (Krzysztof) - Added platforms which do not require protection domain mapping to silence the notice on those platforms - Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org Changes in v5: - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew) - pd_mapper: reworked to provide static configuration per platform (Bjorn) - Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org Changes in v4: - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) - Added configuration for sm6350 (Thanks to Luca) - Removed RFC tag (Konrad) - Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org Changes in RFC v3: - Send start / stop notifications when PD-mapper domain list is changed - Reworked the way PD-mapper treats protection domains, register all of them in a single batch - Added SC7180 domains configuration based on TCL Book 14 GO - Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org Changes in RFC v2: - Swapped num_domains / domains 
(Konrad) - Fixed an issue with battery not working on sc8280xp - Added missing configuration for QCS404 --- Dmitry Baryshkov (5): soc: qcom: pdr: protect locator_addr with the main mutex soc: qcom: pdr: fix parsing of domains lists soc: qcom: pdr: extract PDR message marshalling data soc: qcom: add pd-mapper implementation remoteproc: qcom: enable in-kernel PD mapper drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 + drivers/remoteproc/qcom_q6v5_mss.c | 3 + drivers/remoteproc/qcom_q6v5_pas.c | 3 + drivers/remoteproc/qcom_q6v5_wcss.c | 3 + drivers/soc/qcom/Kconfig| 15 + drivers/soc/qcom/Makefile | 2 + drivers/soc/qcom/pdr_interface.c| 17 +- drivers/soc/qcom/pdr_internal.h | 318 ++--- drivers/soc/qcom/qcom_pd_mapper.c | 676 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ 12 files changed, 1190 insertions(+), 300 deletions(-) --- base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488 change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 Best regards, -- Dmitry Baryshkov
Re: kernel BUG in ptr_stale
On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote: > Hello. > We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. > Recently, our team has discovered a issue in Linux kernel 6.7. Attached to > the email were a PoC file of the issue. This (and several of your others) are fixed in Linus's tree. > > Stack dump: > > bcachefs (loop1): mounting version 1.7: (unknown version) > opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names > ----[ cut here ] > kernel BUG at fs/bcachefs/buckets.h:114! > invalid opcode: [#1] PREEMPT SMP KASAN NOPTI > CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 > 04/01/2014 > RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline] > RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188 > Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72 > 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8 > 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23 > RSP: 0018:c90007c4ec38 EFLAGS: 00010246 > RAX: 0004 RBX: 0080 RCX: c90002679000 > RDX: 0004 RSI: 83ccf3b3 RDI: 0006 > RBP: R08: 0006 R09: 1028 > R10: 0080 R11: R12: 1028 > R13: 88804dee5100 R14: R15: 88805b1a4110 > FS: 7f79ba8ab640() GS:88807ec0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > PKRU: 5554 > Call Trace: > > bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012 > bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215 > bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline] > bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297 > journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322 > journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411 > bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752 > bch2_sb_clean_validate_late+0x14b/0x1e0 
fs/bcachefs/sb-clean.c:32 > bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160 > bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691 > bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978 > bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968 > bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863 > legacy_get_tree+0x109/0x220 fs/fs_context.c:662 > vfs_get_tree+0x93/0x380 fs/super.c:1771 > do_new_mount fs/namespace.c:3337 [inline] > path_mount+0x679/0x1e40 fs/namespace.c:3664 > do_mount fs/namespace.c:3677 [inline] > __do_sys_mount fs/namespace.c:3886 [inline] > __se_sys_mount fs/namespace.c:3863 [inline] > __x64_sys_mount+0x287/0x310 fs/namespace.c:3863 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x6f/0x77 > RIP: 0033:0x7f79b9a91b3e > Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 > 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5 > RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e > RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90 > RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050 > R10: 0181c050 R11: 0202 R12: 20011a00 > R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0 > > Modules linked in: > ---[ end trace ]--- > > > Thank you for taking the time to read this email and we look forward to > working with you further. > > > > > >
[PATCH] tracing: Fix trace_pid_list_free() kernel-doc
make C=1 reports:

kernel/trace/pid_list.c:458: warning: Function parameter or struct member 'pid_list' not described in 'trace_pid_list_free'

Add the missing parameter to the trace_pid_list_free() kernel-doc.

Signed-off-by: Jeff Johnson
---
 kernel/trace/pid_list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..19b271a12c99 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -451,6 +451,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
 
 /**
  * trace_pid_list_free - Frees an allocated pid_list.
+ * @pid_list: The pid list to free.
  *
  * Frees the memory for a pid_list that was allocated.
  */
---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240506-trace_pid_list_free-kdoc-e2bf15be84ee
Re: [PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO
On 4/26/2024 10:39 PM, Hou Tao wrote:
> From: Hou Tao
>
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
>
> [ cut here ]
> WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
> Modules linked in:
> CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
> RIP: 0010:__alloc_pages+0x2bf/0x380
> ..
> Call Trace:
>
>  ? __warn+0x8e/0x150
>  ? __alloc_pages+0x2bf/0x380
>  __kmalloc_large_node+0x86/0x160
>  __kmalloc+0x33c/0x480
>  virtio_fs_enqueue_req+0x240/0x6d0
>  virtio_fs_wake_pending_and_unlock+0x7f/0x190
>  queue_request_and_unlock+0x55/0x60
>  fuse_simple_request+0x152/0x2b0
>  fuse_direct_io+0x5d2/0x8c0
>  fuse_file_read_iter+0x121/0x160
>  __kernel_read+0x151/0x2d0
>  kernel_read+0x45/0x50
>  kernel_read_file+0x1a9/0x2a0
>  init_module_from_file+0x6a/0xe0
>  idempotent_init_module+0x175/0x230
>  __x64_sys_finit_module+0x5d/0xb0
>  x64_sys_call+0x1c3/0x9e0
>  do_syscall_64+0x3d/0xc0
>  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  ..
>
> ---[ end trace ]---
>
> The warning is triggered as follows:
>

SNIP

> @@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>  		size_t nbytes = min(count, nmax);
>
>  		err = fuse_get_user_pages(>ap, iter, , write,
> -					  max_pages);
> +					  max_pages, fc->use_pages_for_kvec_io);
>  		if (err && !nbytes)
>  			break;

I just found out that flush_kernel_vmap_range() should be called before the DMA read/write operation, and invalidate_kernel_vmap_range() after the DMA read operation, when the kvec IO is backed by a vmalloc() area. I will update this in v4.

>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f239196103137..d4f04e19058c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -860,6 +860,9 @@ struct fuse_conn {
>  	/** Passthrough support for read/write IO */
>  	unsigned int passthrough:1;
>
> +	/* Use pages instead of pointer for kernel I/O */
> +	unsigned int use_pages_for_kvec_io:1;
> +
>  	/** Maximum stack depth for passthrough backing files */
>  	int max_stack_depth;
>
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 322af827a2329..36984c0e23d14 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1512,6 +1512,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>  	fc->delete_stale = true;
>  	fc->auto_submounts = true;
>  	fc->sync_fs = true;
> +	fc->use_pages_for_kvec_io = true;
>
>  	/* Tell FUSE to split requests that exceed the virtqueue's size */
>  	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
On Mon, May 6, 2024 at 3:00 PM maobibo wrote: > > > > On 2024/5/6 9:53 AM, Huacai Chen wrote: > > Hi, Bibo, > > > > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote: > >> > >> PARAVIRT option and pv ipi is added on guest kernel side, function > >> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function > >> firstly checks whether system runs on VM mode. If kernel runs on VM mode, > >> it will call function kvm_para_available() to detect current hypervisor > >> type. Now only KVM type detection is supported, the paravirt function can > >> work only if current hypervisor type is KVM, since there is only KVM > >> supported on LoongArch now. > >> > >> PV IPI uses virtual IPI sender and virtual IPI receiver function. With > >> virtual IPI sender, ipi message is stored in DDR memory rather than > >> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs > >> at the same time like X86 KVM method. Hypercall method is used for IPI > >> sending. > >> > >> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since > >> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt > >> acknowledge. And IPI message is stored in DDR, no trap in get IPI message. 
> >> > >> Signed-off-by: Bibo Mao > >> --- > >> arch/loongarch/Kconfig | 9 ++ > >> arch/loongarch/include/asm/hardirq.h | 1 + > >> arch/loongarch/include/asm/paravirt.h | 27 > >> .../include/asm/paravirt_api_clock.h | 1 + > >> arch/loongarch/kernel/Makefile| 1 + > >> arch/loongarch/kernel/irq.c | 2 +- > >> arch/loongarch/kernel/paravirt.c | 151 ++ > >> arch/loongarch/kernel/smp.c | 4 +- > >> 8 files changed, 194 insertions(+), 2 deletions(-) > >> create mode 100644 arch/loongarch/include/asm/paravirt.h > >> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > >> create mode 100644 arch/loongarch/kernel/paravirt.c > >> > >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > >> index 54ad04dacdee..0a1540a8853e 100644 > >> --- a/arch/loongarch/Kconfig > >> +++ b/arch/loongarch/Kconfig > >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH > >> bool > >> default y > >> > >> +config PARAVIRT > >> + bool "Enable paravirtualization code" > >> + depends on AS_HAS_LVZ_EXTENSION > >> + help > >> + This changes the kernel so it can modify itself when it is run > >> + under a hypervisor, potentially improving performance > >> significantly > >> + over full virtualization. However, when run without a hypervisor > >> + the kernel is theoretically slower and slightly larger. 
> >> + > >> config ARCH_SUPPORTS_KEXEC > >> def_bool y > >> > >> diff --git a/arch/loongarch/include/asm/hardirq.h > >> b/arch/loongarch/include/asm/hardirq.h > >> index 9f0038e19c7f..b26d596a73aa 100644 > >> --- a/arch/loongarch/include/asm/hardirq.h > >> +++ b/arch/loongarch/include/asm/hardirq.h > >> @@ -21,6 +21,7 @@ enum ipi_msg_type { > >> typedef struct { > >> unsigned int ipi_irqs[NR_IPI]; > >> unsigned int __softirq_pending; > >> + atomic_t message cacheline_aligned_in_smp; > >> } cacheline_aligned irq_cpustat_t; > >> > >> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > >> diff --git a/arch/loongarch/include/asm/paravirt.h > >> b/arch/loongarch/include/asm/paravirt.h > >> new file mode 100644 > >> index ..58f7b7b89f2c > >> --- /dev/null > >> +++ b/arch/loongarch/include/asm/paravirt.h > >> @@ -0,0 +1,27 @@ > >> +/* SPDX-License-Identifier: GPL-2.0 */ > >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H > >> +#define _ASM_LOONGARCH_PARAVIRT_H > >> + > >> +#ifdef CONFIG_PARAVIRT > >> +#include > >> +struct static_key; > >> +extern struct static_key paravirt_steal_enabled; > >> +extern struct static_key paravirt_steal_rq_enabled; > >> + > >> +u64 dummy_steal_clock(int cpu); > >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > >> + > >> +static inline u64 paravirt_steal_clock
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
On 2024/5/6 9:53 AM, Huacai Chen wrote: Hi, Bibo, On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote: PARAVIRT option and pv ipi is added on guest kernel side, function pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect current hypervisor type. Now only KVM type detection is supported, the paravirt function can work only if current hypervisor type is KVM, since there is only KVM supported on LoongArch now. PV IPI uses virtual IPI sender and virtual IPI receiver function. With virtual IPI sender, ipi message is stored in DDR memory rather than emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs at the same time like X86 KVM method. Hypercall method is used for IPI sending. With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt acknowledge. And IPI message is stored in DDR, no trap in get IPI message. 
Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 ++ arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/irq.c | 2 +- arch/loongarch/kernel/paravirt.c | 151 ++ arch/loongarch/kernel/smp.c | 4 +- 8 files changed, 194 insertions(+), 2 deletions(-) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 54ad04dacdee..0a1540a8853e 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..b26d596a73aa 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t message cacheline_aligned_in_smp; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3a7620b66bc6..c9bfeda89e40 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/loongarch/kernel/irq.c 
b/arch/loongarch/kernel/irq.c index ce36897d1e5a..4863e6c1b739 100644 --- a/arch/loongarch/kernel/irq.c +++ b/arch/loongarch/kernel/irq.c @@ -113,5 +113,5 @@ void __init init_IRQ(void) per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE); } - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); + se
[PATCH] kernel/module: disable cfi for do_mod_ctors
A CFI failure is reported when both CONFIG_CONSTRUCTORS and CFI_CLANG are enabled:

CFI failure at do_init_module+0x100/0x384 (target: tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable CFI for do_mod_ctors() to avoid the CFI check on mod->ctors[i]().

Signed-off-by: Joey Jiao
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2
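To illustrate the call site the patch annotates, here is a minimal userspace sketch of a constructor table being walked through indirect calls. It is not the kernel code: `struct fake_module`, `run_ctors()` and `demo_run_ctors()` are hypothetical names, and `__nocfi` is defined as a no-op here, on the assumption that in the kernel it expands to a compiler attribute that turns off CFI instrumentation for the annotated function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in (assumption): in the kernel, __nocfi disables CFI
 * instrumentation for one function. Here it expands to nothing.
 */
#ifndef __nocfi
#define __nocfi
#endif

typedef void (*ctor_fn_t)(void);

/* Hypothetical, cut-down analogue of the fields do_mod_ctors() walks. */
struct fake_module {
	ctor_fn_t *ctors;
	unsigned int num_ctors;
};

static int ctor_calls;

static void ctor_a(void) { ctor_calls++; }
static void ctor_b(void) { ctor_calls++; }

/* Each ctors[i]() is an indirect call, which is what a CFI check guards. */
__nocfi
static void run_ctors(const struct fake_module *mod)
{
	assert(mod != NULL);
	for (unsigned int i = 0; i < mod->num_ctors; i++)
		mod->ctors[i]();
}

/* Runs a two-entry table and reports how many constructors were called. */
static int demo_run_ctors(void)
{
	ctor_fn_t table[] = { ctor_a, ctor_b };
	struct fake_module mod = { .ctors = table, .num_ctors = 2 };

	ctor_calls = 0;
	run_ctors(&mod);
	return ctor_calls;
}
```

The annotation trades the check away for every entry in the table, not only for the compiler-generated constructor named in the report, which is presumably the point reviewers would want to weigh.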
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
Hi, Bibo, On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote: > > PARAVIRT option and pv ipi is added on guest kernel side, function > pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function > firstly checks whether system runs on VM mode. If kernel runs on VM mode, > it will call function kvm_para_available() to detect current hypervisor > type. Now only KVM type detection is supported, the paravirt function can > work only if current hypervisor type is KVM, since there is only KVM > supported on LoongArch now. > > PV IPI uses virtual IPI sender and virtual IPI receiver function. With > virtual IPI sender, ipi message is stored in DDR memory rather than > emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs > at the same time like X86 KVM method. Hypercall method is used for IPI > sending. > > With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since > VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt > acknowledge. And IPI message is stored in DDR, no trap in get IPI message. 
> > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 ++ > arch/loongarch/include/asm/hardirq.h | 1 + > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/irq.c | 2 +- > arch/loongarch/kernel/paravirt.c | 151 ++ > arch/loongarch/kernel/smp.c | 4 +- > 8 files changed, 194 insertions(+), 2 deletions(-) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 54ad04dacdee..0a1540a8853e 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance significantly > + over full virtualization. However, when run without a hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/hardirq.h > b/arch/loongarch/include/asm/hardirq.h > index 9f0038e19c7f..b26d596a73aa 100644 > --- a/arch/loongarch/include/asm/hardirq.h > +++ b/arch/loongarch/include/asm/hardirq.h > @@ -21,6 +21,7 @@ enum ipi_msg_type { > typedef struct { > unsigned int ipi_irqs[NR_IPI]; > unsigned int __softirq_pending; > + atomic_t message cacheline_aligned_in_smp; > } cacheline_aligned irq_cpustat_t; > > DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..58f7b7b89f2c > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} > + > +int pv_ipi_init(void); > +#else > +static inline int pv_ipi_init(void) > +{ > + return 0; > +} > + > +#endif // CONFIG_PARAVIRT > +#endif > diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h > b/arch/loongarch/include/asm/paravirt_api_clock.h > new file mode 100644 > index ..65ac7cee0dad > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt_api_clock.h > @@ -0,0 +1 @@ > +#include > diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile > index 3a7620b66bc6..c9bfeda89e40 100644 > --- a/arch/loongarch/kernel/Makefile > +++ b/arch/loongarch/kernel/Makefile > @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > 
obj-$(CONFIG_PROC_FS) += proc.o >
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On 4/26/2024 6:36 PM, Dmitry Baryshkov wrote:
> On Sat, 27 Apr 2024 at 04:03, Chris Lew wrote:
>>
>> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
>>> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
>>> index 1d24c9b656a8..02d0c626b03b 100644
>>> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
>>> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
>>> @@ -23,6 +23,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>
>>> @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
>>>  	int ret;
>>>  	unsigned int val;
>>>
>>> -	ret = qcom_q6v5_prepare(>q6v5);
>>> +	ret = qcom_pdm_get();
>>>  	if (ret)
>>>  		return ret;
>>
>> Would it make sense to try and model this as a rproc subdev? This
>> section of the remoteproc code seems to be focused on making specific
>> calls to set up and enable hardware resources, whereas pd mapper is
>> software.
>>
>> sysmon and ssr are also purely software and they are modeled as subdevs
>> in qcom_common. I'm not an expert on remoteproc organization but this
>> was just a thought.
>
> Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance

Both sysmon and ssr have some kind of global states that they manage too. Each subdev functionality tends to be a mix of per-remoteproc instance management and global state management.

If pd-mapper was completely global, pd-mapper would be able to instantiate by itself. Instead, instantiation is dependent on each remoteproc instance properly getting and putting references.

The pdm subdev could manage the references to pd-mapper for that remoteproc instance.

On the other hand, I think Bjorn recommended this could be moved to probe time in v4. The v4 version was doing the reinitialization-dance, but I think the recommendation could still apply to this version.

>> Thanks!
>> Chris
>>
>>> +	ret = qcom_q6v5_prepare(>q6v5);
>>> +	if (ret)
>>> +		goto put_pdm;
>>> +
>>>  	ret = adsp_map_carveout(rproc);
>>>  	if (ret) {
>>>  		dev_err(adsp->dev, "ADSP smmu mapping failed\n");
>>> @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
>>>  	adsp_unmap_carveout(rproc);
>>>  disable_irqs:
>>>  	qcom_q6v5_unprepare(>q6v5);
>>> +put_pdm:
>>> +	qcom_pdm_release();
>>>
>>>  	return ret;
>>>  }
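The get/put discipline discussed above, where a global service only exists while at least one remoteproc instance holds a reference, can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `pdm_get()`, `pdm_put()` and `fake_start()` are hypothetical names, the counters stand in for registering and unregistering the real service, and there is no locking because the sketch is single-threaded.

```c
#include <assert.h>

/* Hypothetical global state; not the real qcom_pdm implementation. */
static int pdm_users;   /* number of callers holding a reference */
static int pdm_running; /* 1 while the simulated service is up */

/* Take a reference; the first user brings the service up. */
static int pdm_get(void)
{
	if (pdm_users++ == 0)
		pdm_running = 1;
	return 0;
}

/* Drop a reference; the last user tears the service down. */
static void pdm_put(void)
{
	assert(pdm_users > 0);
	if (--pdm_users == 0)
		pdm_running = 0;
}

/*
 * Same shape as the quoted adsp_start() change: take the reference first,
 * and release it on the error path of the next step.
 */
static int fake_start(int prepare_ret)
{
	int ret = pdm_get();

	if (ret)
		return ret;

	ret = prepare_ret; /* stands in for the prepare step */
	if (ret)
		goto put_pdm;

	return 0;

put_pdm:
	pdm_put();
	return ret;
}
```

In this shape every start path has to pair its get with a put, which is the bookkeeping a per-instance subdev could take over.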
BUG: unable to handle kernel paging request in do_split
Hello. We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. Recently, our team has discovered an issue in Linux kernel 6.7. Attached to this email is a PoC file for the issue. Stack dump: BUG: unable to handle page fault for address: ed110c2fd97f #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 7ffd0067 P4D 7ffd0067 PUD 0 Oops: [#1] PREEMPT SMP KASAN NOPTI CPU: 0 PID: 24082 Comm: syz-executor.3 Not tainted 6.7.0 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047 Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef RSP: 0018:c90001e9f858 EFLAGS: 00010a02 RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000 RDX: 11110c2fd97f RSI: 823364ab RDI: 0005 RBP: 8880617ecc00 R08: 0005 R09: R10: R11: R12: dc00 R13: R14: R15: 88801ee8d2b0 FS: 7f191402a640() GS:88802c60() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 PKRU: 5554 Call Trace: make_indexed_dir+0x1158/0x1540 fs/ext4/namei.c:2342 ext4_add_entry+0xcd0/0xe80 fs/ext4/namei.c:2454 ext4_add_nondir+0x90/0x2b0 fs/ext4/namei.c:2795 ext4_symlink+0x539/0x9e0 fs/ext4/namei.c:3436 vfs_symlink fs/namei.c:4464 [inline] vfs_symlink+0x3f6/0x640 fs/namei.c:4448 do_symlinkat+0x245/0x2f0 fs/namei.c:4490 __do_sys_symlink fs/namei.c:4511 [inline] __se_sys_symlink fs/namei.c:4509 [inline] __x64_sys_symlink+0x79/0xa0 fs/namei.c:4509 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f191329002d Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 
89 01 48 RSP: 002b:7f191402a028 EFLAGS: 0246 ORIG_RAX: 0058 RAX: ffda RBX: 7f19133cbf80 RCX: 7f191329002d RDX: RSI: 2e40 RDI: 20001640 RBP: 7f19132f14d0 R08: R09: R10: R11: 0246 R12: R13: 000b R14: 7f19133cbf80 R15: 7f191400a000 Modules linked in: CR2: ed110c2fd97f ---[ end trace ]--- RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047 Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef RSP: 0018:c90001e9f858 EFLAGS: 00010a02 RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000 RDX: 11110c2fd97f RSI: 823364ab RDI: 0005 RBP: 8880617ecc00 R08: 0005 R09: R10: R11: R12: dc00 R13: R14: R15: 88801ee8d2b0 FS: 7f191402a640() GS:88802c60() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 PKRU: 5554 Code disassembly (best guess): 0: d2 0f rorb %cl,(%rdi) 2: 85 38 test %edi,(%rax) 4: 0b 00 or (%rax),%eax 6: 00 8b 45 00 89 84 add%cl,-0x7b76ffbb(%rbx) c: 24 84 and$0x84,%al e: 00 00 add%al,(%rax) 10: 00 41 8dadd%al,-0x73(%rcx) 13: 45 ff 48 8d rex.RB decl -0x73(%r8) 17: 1c c3 sbb$0xc3,%al 19: 48 b8 00 00 00 00 00movabs $0xdc00,%rax 20: fc ff df 23: 48 89 damov%rbx,%rdx 26: 48 c1 ea 03 shr$0x3,%rdx * 2a: 0f b6 14 02 movzbl (%rdx,%rax,1),%edx <-- trapping instruction 2e: 48 89 d8mov%rbx,%rax 31: 83 e0 07and$0x7,%eax 34: 83 c0 03add$0x3,%eax 37: 38 d0 cmp%dl,%al 39: 7c 08 jl 0x43 3b: 84 d2 test %dl,%dl 3d: 0f .byte 0xf 3e: 85 ef test
[PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
The PARAVIRT option and PV IPI support are added on the guest kernel side. Function pv_ipi_init() adds the IPI sending and IPI receiving hooks. It first checks whether the system runs in VM mode; if so, it calls kvm_para_available() to detect the current hypervisor type. Only KVM type detection is supported for now, so the paravirt functions work only if the current hypervisor is KVM, since KVM is the only hypervisor supported on LoongArch. PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the virtual IPI sender, the IPI message is stored in DDR memory rather than in emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs at the same time, as with the x86 KVM method. The hypercall method is used for IPI sending. With the virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since each VCPU has a separate HW SW0, like the HW timer, there is no trap on IPI interrupt acknowledge. And since the IPI message is stored in DDR, there is no trap when getting the IPI message. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 ++ arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/irq.c | 2 +- arch/loongarch/kernel/paravirt.c | 151 ++ arch/loongarch/kernel/smp.c | 4 +- 8 files changed, 194 insertions(+), 2 deletions(-) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 54ad04dacdee..0a1540a8853e 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full 
virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. + config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..b26d596a73aa 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t message cacheline_aligned_in_smp; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3a7620b66bc6..c9bfeda89e40 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o 
+obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c index ce36897d1e5a..4863e6c1b739 100644 --- a/arch/loongarch/kernel/irq.c +++ b/arch/loongarch/kernel/irq.c @@ -113,5 +113,5 @@ void __init init_IRQ(void) per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE); } - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); } diff --git a/arch/loongarch/kernel/paravir
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On Sat, 27 Apr 2024 at 04:03, Chris Lew wrote:
>
>
>
> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
> > index 1d24c9b656a8..02d0c626b03b 100644
> > --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> > @@ -23,6 +23,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >
> > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
> >  	int ret;
> >  	unsigned int val;
> >
> > -	ret = qcom_q6v5_prepare(>q6v5);
> > +	ret = qcom_pdm_get();
> >  	if (ret)
> >  		return ret;
>
> Would it make sense to try and model this as a rproc subdev? This
> section of the remoteproc code seems to be focused on making specific
> calls to set up and enable hardware resources, whereas pd mapper is
> software.
>
> sysmon and ssr are also purely software and they are modeled as subdevs
> in qcom_common. I'm not an expert on remoteproc organization but this
> was just a thought.

Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance

>
> Thanks!
> Chris
>
> >
> > +	ret = qcom_q6v5_prepare(>q6v5);
> > +	if (ret)
> > +		goto put_pdm;
> > +
> >  	ret = adsp_map_carveout(rproc);
> >  	if (ret) {
> >  		dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
> >  	adsp_unmap_carveout(rproc);
> >  disable_irqs:
> >  	qcom_q6v5_unprepare(>q6v5);
> > +put_pdm:
> > +	qcom_pdm_release();
> >
> >  	return ret;
> >  }
>

-- 
With best wishes
Dmitry
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
> index 1d24c9b656a8..02d0c626b03b 100644
> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> @@ -23,6 +23,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>
> @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
>  	int ret;
>  	unsigned int val;
>
> -	ret = qcom_q6v5_prepare(>q6v5);
> +	ret = qcom_pdm_get();
>  	if (ret)
>  		return ret;

Would it make sense to try and model this as a rproc subdev? This section of the remoteproc code seems to be focused on making specific calls to set up and enable hardware resources, whereas pd mapper is software.

sysmon and ssr are also purely software and they are modeled as subdevs in qcom_common. I'm not an expert on remoteproc organization but this was just a thought.

Thanks!
Chris

> +	ret = qcom_q6v5_prepare(>q6v5);
> +	if (ret)
> +		goto put_pdm;
> +
>  	ret = adsp_map_carveout(rproc);
>  	if (ret) {
>  		dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
>  	adsp_unmap_carveout(rproc);
>  disable_irqs:
>  	qcom_q6v5_unprepare(>q6v5);
> +put_pdm:
> +	qcom_pdm_release();
>
>  	return ret;
>  }
Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.
Hi LuMingYin,

Thanks for finding the problem!
But please write the commit message following Documentation/process/submitting-patches.rst

On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
>
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables
> named code and tmp are defined. At line 1437, a new dynamic memory area is
> allocated using the function kcalloc. When the if statement at line 1467
> evaluates to true, the program jumps to the out label at line 1469. Within
> this function, there are two labels: out and fail. The difference between
> these two labels is that fail additionally frees the dynamic memory area
> pointed to by the variable code. Therefore, the program should jump to the
> fail label instead of the out label. This commit fixes this bug.
>

For example, you must break lines at about 70 characters. Also, please do not refer to line numbers, because they change easily (function names are OK).
Since this bug is a very clear mistake, you can simply explain it as follows.

  If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it jumps to 'out' instead of 'fail' by mistake. As a result, the 'tmp' buffer is not freed in that case and its memory leaks.

  Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and the second one explains how to fix it.

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit with git blame.

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
>  	parg->fmt = kmalloc(len, GFP_KERNEL);
>  	if (!parg->fmt) {
>  		ret = -ENOMEM;
> -		goto out;
> +		goto fail;
>  	}
>  	snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>  		 parg->count);
> --
> 2.25.1
>

-- 
Masami Hiramatsu (Google)
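The difference between the two labels can be shown with a small userspace sketch. It is not the kernel function: `parse_arg()`, `track_alloc()` and `track_free()` are hypothetical, and the layout (a `fail` label that frees the scratch buffer, followed by an `out` label that only returns) is an assumption based on the description in this thread.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, for the demo only */

/* malloc() wrapper that counts allocations and can simulate a failure. */
static void *track_alloc(size_t n, int fail_now)
{
	void *p = fail_now ? NULL : malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void track_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Returns 0 and hands a buffer to *fmt_out on success, -1 on failure. */
static int parse_arg(int fmt_alloc_fails, char **fmt_out)
{
	char *scratch, *fmt;
	int ret;

	assert(fmt_out != NULL);
	*fmt_out = NULL;

	scratch = track_alloc(64, 0);
	if (!scratch) {
		ret = -1;
		goto out; /* nothing to clean up yet */
	}

	fmt = track_alloc(16, fmt_alloc_fails);
	if (!fmt) {
		ret = -1;
		goto fail; /* 'goto out' here would leak scratch */
	}

	*fmt_out = fmt;
	ret = 0;
fail:
	track_free(scratch); /* scratch is temporary on every path */
out:
	return ret;
}
```

Counting live allocations this way makes the leak visible: with `goto out` in the failure branch, `live_allocs` would stay at 1 after a failed call.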
[PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO
From: Hou Tao

Hi,

The patch set aims to fix the warning related to an abnormal size parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing the use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs. Besides the abnormal size parameter for kmalloc(), the gfp parameter is also questionable: GFP_ATOMIC is used even when the allocation occurs in a kworker context. Patch #2 fixes it by using GFP_NOFS when the allocation is initiated by the kworker. For more details, please check the individual patches.

As usual, comments are always welcome.

Change Log:

v3:
 * introduce use_pages_for_kvec_io for virtiofs. When the option is enabled, fuse will use iov_iter_extract_pages() to construct a page array and pass the page array instead of a pointer to virtiofs. The benefit is twofold: the length of the data passed to virtiofs is limited by max_pages, and there is no memory copy compared with v2.

v2: https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
 * limit the length of ITER_KVEC dio by max_pages instead of the newly-introduced max_nopage_rw. Using max_pages makes the ITER_KVEC dio consistent with other rw operations.
 * replace the kmalloc-allocated bounce buffer with a bounce buffer backed by scattered pages when the length of the bounce buffer for ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with fragmented memory, the ITER_KVEC dio can be handled normally by virtiofs. (Bernd Schubert)
 * merge the GFP_NOFS patch [1] into this patch-set and use memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS (Benjamin Coddington)

v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/

[1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/

Hou Tao (2):
  virtiofs: use pages instead of pointer for kernel direct IO
  virtiofs: use GFP_NOFS when enqueuing request through kworker

 fs/fuse/file.c      | 12
 fs/fuse/fuse_i.h    |  3 +++
 fs/fuse/virtio_fs.c | 25 -
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.29.2
[PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO
From: Hou Tao

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  [ cut here ]
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ..
  Call Trace:
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ..
  ---[ end trace ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
   kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
   passes it to kernel_read(). kernel_read() constructs a kvec iter by
   using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
   fuse_direct_io(). As of now, the maximal read size for a kvec iter is
   only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
   fuse_direct_io() doesn't split the 10MB buffer. It saves the address
   and the size of the 10MB-sized buffer in out_args[0] of a fuse request
   and passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
   queue the request. Because virtiofs needs a DMA-able address,
   virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
   all fuse args, copies these args into the bounce buffer and passes the
   physical address of the bounce buffer to virtiofsd. The total length
   of these fuse args for the passed fuse request is about 10MB, so
   copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
   it triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
   kworker, but it won't help, because kmalloc() will always return NULL
   due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal reads. A virtio-fs write initiated from
the kernel has a similar problem, but currently there is no way to limit
fc->max_write in the kernel.

So instead of limiting the values of both max_read and max_write in the
kernel, introduce use_pages_for_kvec_io in fuse_conn and set it to true
in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use pages
instead of a pointer to pass the KVEC_IO data.
Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao
---
 fs/fuse/file.c      | 12
 fs/fuse/fuse_i.h    |  3 +++
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce41576407..82b77c5d8c643 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1471,13 +1471,17 @@ static inline size_t fuse_get_frag_size(const struct iov_iter *ii,

 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 			       size_t *nbytesp, int write,
-			       unsigned int max_pages)
+			       unsigned int max_pages,
+			       bool use_pages_for_kvec_io)
 {
 	size_t nbytes = 0;  /* # bytes already packed in req */
 	ssize_t ret = 0;

-	/* Special case for kernel I/O: can copy directly into the buffer */
-	if (iov_iter_is_kvec(ii)) {
+	/* Special case for kernel I/O: can copy directly into the buffer.
+	 * However if the implementation of fuse_conn requires pages instead of
+	 * pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+	 */
+	if (iov_iter_is_kvec(ii) && !use_pages_for_kvec_io) {
 		unsigned long user_addr = fuse_get_user_addr(ii);
 		size_t frag_size = fuse_get_frag_size(ii, *nbytesp);

@@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		size_t nbytes = min(count, nmax);

 		err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
-
Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
On 24.04.2024 12:27, Dmitry Baryshkov wrote:
> [...]

I've tested this series on sdm660 device, with userspace pd-mapper
service disabled, and don't see any regressions - e.g. Wi-Fi/BT still
come online and work as before.

Debug logs: https://paste.sr.ht/~minlexx/bd03db4c582a3275078ce4fd05ea76ce46a52b8e

Missing cdsp_root and adsp_sensors PDs are not currently an issue,
because those are not enabled yet on SDM660 or hard to test, so

Tested-by: Alexey Minnekhanov

--
Regards,
Alexey Minnekhanov
postmarketOS developer
[PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.
From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>

At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer
variables named code and tmp are defined. At line 1437, a new dynamic
memory area is allocated using the function kcalloc. When the if
statement at line 1467 evaluates to true, the program jumps to the out
label at line 1469. Within this function, there are two labels: out and
fail. The difference between these two labels is that fail additionally
frees the dynamic memory area pointed to by the variable code.
Therefore, the program should jump to the fail label instead of the out
label. This commit fixes this bug.

Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
---
 kernel/trace/trace_probe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dfe3ee6035ec..42bc0f362226 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 		parg->fmt = kmalloc(len, GFP_KERNEL);
 		if (!parg->fmt) {
 			ret = -ENOMEM;
-			goto out;
+			goto fail;
 		}
 		snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
 			 parg->count);
--
2.25.1
Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
On Thu, 25 Apr 2024 at 10:08, Steev Klimaszewski wrote:
>
> Hi Dmitry,
>
> On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov wrote:
> > [...]
>
> I've tested this series over a large number of reboots, and the p-d
> devices(?) do always seem to come up (with the pd-mapper service
> disabled) on my Thinkpad X13s. One less service to run in userland!
>
> Tested-by: Steev Klimaszewski

Thank you!

--
With best wishes
Dmitry
Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
Hi Dmitry,

On Wed, Apr 24, 2024 at 4:28 AM Dmitry Baryshkov wrote:
> [...]

I've tested this series over a large number of reboots, and the p-d
devices(?) do always seem to come up (with the pd-mapper service
disabled) on my Thinkpad X13s. One less service to run in userland!

Tested-by: Steev Klimaszewski
[PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov
---
 drivers/remoteproc/Kconfig          |  4
 drivers/remoteproc/qcom_q6v5_adsp.c | 11 ++-
 drivers/remoteproc/qcom_q6v5_mss.c  | 10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  | 12 +++-
 drivers/remoteproc/qcom_q6v5_wcss.c | 12 +++-
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa85..a0ce552f89a1 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -181,6 +181,7 @@ config QCOM_Q6V5_ADSP
 	depends on QCOM_SYSMON || QCOM_SYSMON=n
 	depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
 	depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+	depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
 	select MFD_SYSCON
 	select QCOM_PIL_INFO
 	select QCOM_MDT_LOADER
@@ -201,6 +202,7 @@ config QCOM_Q6V5_MSS
 	depends on QCOM_SYSMON || QCOM_SYSMON=n
 	depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
 	depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+	depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
 	select MFD_SYSCON
 	select QCOM_MDT_LOADER
 	select QCOM_PIL_INFO
@@ -221,6 +223,7 @@ config QCOM_Q6V5_PAS
 	depends on QCOM_SYSMON || QCOM_SYSMON=n
 	depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
 	depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+	depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
 	select MFD_SYSCON
 	select QCOM_PIL_INFO
 	select QCOM_MDT_LOADER
@@ -243,6 +246,7 @@ config QCOM_Q6V5_WCSS
 	depends on QCOM_SYSMON || QCOM_SYSMON=n
 	depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n
 	depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n
+	depends on QCOM_PD_MAPPER || QCOM_PD_MAPPER=n
 	select MFD_SYSCON
 	select QCOM_MDT_LOADER
 	select QCOM_PIL_INFO
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
 	int ret;
 	unsigned int val;

-	ret = qcom_q6v5_prepare(&adsp->q6v5);
+	ret = qcom_pdm_get();
 	if (ret)
 		return ret;

+	ret = qcom_q6v5_prepare(&adsp->q6v5);
+	if (ret)
+		goto put_pdm;
+
 	ret = adsp_map_carveout(rproc);
 	if (ret) {
 		dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
 	adsp_unmap_carveout(rproc);
 disable_irqs:
 	qcom_q6v5_unprepare(&adsp->q6v5);
+put_pdm:
+	qcom_pdm_release();

 	return ret;
 }
@@ -478,6 +485,8 @@ static int adsp_stop(struct rproc *rproc)
 	if (handover)
 		qcom_adsp_pil_handover(&adsp->q6v5);

+	qcom_pdm_release();
+
 	return ret;
 }

diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c
index 1779fc890e10..791f11e7adbf 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -1581,10 +1582,14 @@ static int q6v5_start(struct rproc *rproc)
 	int xfermemop_ret;
 	int ret;

-	ret = q6v5_mba_load(qproc);
+	ret = qcom_pdm_get();
 	if (ret)
 		return ret;

+	ret = q6v5_mba_load(qproc);
+	if (ret)
+		goto put_pdm;
+
 	dev_info(qproc->dev, "MBA booted with%s debug policy, loading mpss\n",
 		 qproc->dp_size ? "" : "out");

@@ -1613,6 +1618,8 @@ static int q6v5_start(struct rproc *rproc)
 reclaim_mpss:
 	q6v5_mba_reclaim(qproc);
 	q6v5_dump_mba_logs(qproc);
+put_pdm:
+	qcom_pdm_release();

 	return ret;
 }
@@ -1627,6 +1634,7 @@ static int q6v5_stop(struct rproc *rproc)
 		dev_err(qproc->dev, "timed out on wait\n");

 	q6v5_mba_reclaim(qproc);
+	qcom_pdm_release();

 	return 0;
 }
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 54d8005d40a3..653e54f975fc 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -261,10 +262,14 @@ static int adsp_start(struct rproc *rproc)
 	struct qcom_adsp *adsp = rproc->priv;
 	int ret;

-	ret = qcom_q6v5_prepare(&adsp->q6v5);
+	ret = qcom_pdm_get();
 	if (ret)
 		return ret;

+
[PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson
To: Konrad Dybcio
To: Sibi Sankar
To: Mathieu Poirier
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold
Cc: Xilin Wu
Cc: "Bryan O'Donoghue"

--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held
  (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (6):
      soc: qcom: pdr: protect locator_addr with the main mutex
      soc: qcom: pdr: fix parsing of domains lists
      soc: qcom: pdr: extract PDR message marshalling data
      soc: qcom: qmi: add a way to remove running service
      soc: qcom: add pd-mapper implementation
      remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/Kconfig          |   4 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  11 +-
 drivers/remoteproc/qcom_q6v5_mss.c  |  10 +-
 drivers/remoteproc/qcom_q6v5_pas.c  |  12 +-
 drivers/remoteproc/qcom_q6v5_wcss.c |  12 +-
 drivers/soc/qcom/Kconfig            |  14 +
 drivers/soc/qcom/Makefile           |   2 +
 drivers/soc/qcom/pdr_interface.c    |   6 +-
 drivers/soc/qcom/pdr_internal.h     | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 656
 drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
 drivers/soc/qcom/qmi_interface.c    |  67
 include/linux/soc/qcom/pd_mapper.h  |  28 ++
 include/linux/soc/qcom/qmi.h        |   2 +
 14 files changed, 1193 insertions(+), 302 deletions(-)
---
base-commit: a59668a9397e7245b26e9be85d23f242ff757ae8
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
--
Dmitry Baryshkov
Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
On Thu, Apr 11, 2024 at 03:03:31PM -0700, Andrew Morton wrote:
> A large increase in the maximum number of processes.

The change from (some) default to effective infinity is the crux of the
change. Because that is only a number. (Thus I don't find the number's
12700% increase alone a big change.)

Actual maximum number of processes is "workload dependent" and hence
should be determined based on the particular workload.

> Or did I misinterpret?

I thought you saw an issue with projection of that number into sizings
based on the default. Which of them comprises the large change in your
eyes?

Thanks,
Michal
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
On Thu, 11 Apr 2024 17:40:02 +0200 Michal Koutný wrote:

> Hello.
>
> On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton wrote:
> > That seems like a large change.
>
> In what sense is it large?

A large increase in the maximum number of processes. Or did I
misinterpret?
[PATCH 3/5] openrisc: traps: Don't send signals to kernel mode threads
OpenRISC exception handling sends signals to user processes on floating
point exceptions and trap instructions (for debugging) among others.
There is a bug where the trap handling logic may send signals to kernel
threads. We should not send these signals to kernel threads; if that
happens, we treat it as an error.

This patch adds conditions to die if the kernel receives these
exceptions in kernel mode code.

Fixes: 27267655c531 ("openrisc: Support floating point user api")
Signed-off-by: Stafford Horne
---
 arch/openrisc/kernel/traps.c | 48 ++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c
index 88fe27e4c10c..211ddaa0c5fa 100644
--- a/arch/openrisc/kernel/traps.c
+++ b/arch/openrisc/kernel/traps.c
@@ -180,29 +180,39 @@ asmlinkage void unhandled_exception(struct pt_regs *regs, int ea, int vector)

 asmlinkage void do_fpe_trap(struct pt_regs *regs, unsigned long address)
 {
-	int code = FPE_FLTUNK;
-	unsigned long fpcsr = regs->fpcsr;
-
-	if (fpcsr & SPR_FPCSR_IVF)
-		code = FPE_FLTINV;
-	else if (fpcsr & SPR_FPCSR_OVF)
-		code = FPE_FLTOVF;
-	else if (fpcsr & SPR_FPCSR_UNF)
-		code = FPE_FLTUND;
-	else if (fpcsr & SPR_FPCSR_DZF)
-		code = FPE_FLTDIV;
-	else if (fpcsr & SPR_FPCSR_IXF)
-		code = FPE_FLTRES;
-
-	/* Clear all flags */
-	regs->fpcsr &= ~SPR_FPCSR_ALLF;
-
-	force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+	if (user_mode(regs)) {
+		int code = FPE_FLTUNK;
+		unsigned long fpcsr = regs->fpcsr;
+
+		if (fpcsr & SPR_FPCSR_IVF)
+			code = FPE_FLTINV;
+		else if (fpcsr & SPR_FPCSR_OVF)
+			code = FPE_FLTOVF;
+		else if (fpcsr & SPR_FPCSR_UNF)
+			code = FPE_FLTUND;
+		else if (fpcsr & SPR_FPCSR_DZF)
+			code = FPE_FLTDIV;
+		else if (fpcsr & SPR_FPCSR_IXF)
+			code = FPE_FLTRES;
+
+		/* Clear all flags */
+		regs->fpcsr &= ~SPR_FPCSR_ALLF;
+
+		force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
+	} else {
+		pr_emerg("KERNEL: Illegal fpe exception 0x%.8lx\n", regs->pc);
+		die("Die:", regs, SIGFPE);
+	}
 }

 asmlinkage void do_trap(struct pt_regs *regs, unsigned long address)
 {
-	force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+	if (user_mode(regs)) {
+		force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->pc);
+	} else {
+		pr_emerg("KERNEL: Illegal trap exception 0x%.8lx\n", regs->pc);
+		die("Die:", regs, SIGILL);
+	}
 }

 asmlinkage void do_unaligned_access(struct pt_regs *regs, unsigned long address)
--
2.44.0
Re: Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
Hello.

On Mon, Apr 08, 2024 at 01:29:55PM -0700, Andrew Morton wrote:
> That seems like a large change.

In what sense is it large?

I tried to look up the code parts that depend on this default and either
add the other patches or mention the impact (that part could be more
thorough) in the commit message.

> It isn't clear why we'd want to merge this patchset. Does it improve
> anyone's life and if so, how?

- kernel devs who don't care about policy
  - policy should be decided by distros/users, not in kernel

- users who need many threads
  - current default is too low
  - this is one more place to look at when configuring

- users who want to prevent fork-bombs
  - current default is ineffective (too high), false feeling of safety
  - i.e. they should configure appropriate mechanism appropriately

I thought that the first point alone would be convincing and that only
scaling impact might need clarification.

Regards,
Michal
[PATCH v2 13/13] mailbox: omap: Remove kernel FIFO message queuing
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly as part of a threaded IRQ handler.

Signed-off-by: Andrew Davis
---
 drivers/mailbox/Kconfig        |   9 ---
 drivers/mailbox/omap-mailbox.c | 107 ++---
 2 files changed, 5 insertions(+), 111 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
 	  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
 	  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-	int "Mailbox kfifo default buffer size (bytes)"
-	depends on OMAP2PLUS_MBOX
-	default 256
-	help
-	  Specify the default size of mailbox's kfifo buffers (bytes).
-	  This can also be changed at runtime (via the mbox_kfifo_size
-	  module parameter).
-
 config ROCKCHIP_MBOX
 	bool "Rockchip Soc Integrated Mailbox Support"
 	depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..46747559b438f 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
 	u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-	spinlock_t		lock;
-	struct kfifo		fifo;
-	struct work_struct	work;
-	struct omap_mbox	*mbox;
-	bool full;
-};
-
 struct omap_mbox_match_data {
 	u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
 	const char		*name;
 	int			irq;
-	struct omap_mbox_queue	*rxq;
 	struct omap_mbox_device	*parent;
 	struct omap_mbox_fifo	tx_fifo;
 	struct omap_mbox_fifo	rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
 	bool			send_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_irq_t irq)
 	mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-	struct omap_mbox_queue *mq =
-			container_of(work, struct omap_mbox_queue, work);
-	u32 msg;
-	int len;
-
-	while (kfifo_len(&mq->fifo) >= sizeof(msg)) {
-		len = kfifo_out(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
-
-		mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-		spin_lock_irq(&mq->lock);
-		if (mq->full) {
-			mq->full = false;
-			omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-		}
-		spin_unlock_irq(&mq->lock);
-	}
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-	struct omap_mbox_queue *mq = mbox->rxq;
 	u32 msg;
-	int len;
 
 	while (!mbox_fifo_empty(mbox)) {
-		if (unlikely(kfifo_avail(&mq->fifo) < sizeof(msg))) {
-			omap_mbox_disable_irq(mbox, IRQ_RX);
-			mq->full = true;
-			goto nomem;
-		}
-
 		msg = mbox_fifo_read(mbox);
-
-		len = kfifo_in(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
+		mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
 	}
 
-	/* no more messages in the fifo. clear IRQ source. */
+	/* clear IRQ source. */
 	ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-	schedule_work(&mbox->rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
 	return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-					void (*work)(struct work_struct *))
-{
-	struct omap_mbox_queue *mq;
-
[PATCH v2 fs/proc/bootconfig 2/2] fs/proc: Skip bootloader comment if no embedded kernel parameters
From: Masami Hiramatsu

If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but there are no
embedded kernel parameters, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file. This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.

Signed-off-by: Masami Hiramatsu
Signed-off-by: Paul E. McKenney
---
 fs/proc/bootconfig.c       | 2 +-
 include/linux/bootconfig.h | 1 +
 init/main.c                | 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc/bootconfig.c b/fs/proc/bootconfig.c
index e5635a6b127b0..87dcaae32ff87 100644
--- a/fs/proc/bootconfig.c
+++ b/fs/proc/bootconfig.c
@@ -63,7 +63,7 @@ static int __init copy_xbc_key_value_list(char *dst, size_t size)
 			dst += ret;
 		}
 	}
-	if (ret >= 0 && boot_command_line[0]) {
+	if (cmdline_has_extra_options() && ret >= 0 && boot_command_line[0]) {
 		ret = snprintf(dst, rest(dst, end), "# Parameters from bootloader:\n# %s\n",
 			       boot_command_line);
 		if (ret > 0)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index ca73940e26df8..e5ee2c694401e 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -10,6 +10,7 @@
 #ifdef __KERNEL__
 #include
 #include
+bool __init cmdline_has_extra_options(void);
 #else /* !__KERNEL__ */
 /*
  * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
diff --git a/init/main.c b/init/main.c
index 2ca52474d0c30..881f6230ee59e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -487,6 +487,11 @@ static int __init warn_bootconfig(char *str)
 
 early_param("bootconfig", warn_bootconfig);
 
+bool __init cmdline_has_extra_options(void)
+{
+	return extra_command_line || extra_init_args;
+}
+
 /* Change NUL term back to "=", to make "param" the whole string. */
 static void __init repair_env_string(char *param, char *val)
 {
-- 
2.40.1
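For readers who want to see the effect of the new condition outside the kernel, here is a minimal userspace sketch. It is only an illustration under stated assumptions: struct boot_state, has_extra_options() and emit_bootloader_comment() are hypothetical names, and the three string fields merely stand in for the kernel's extra_command_line, extra_init_args and boot_command_line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel globals used by the patch. */
struct boot_state {
	const char *extra_command_line; /* embedded kernel parameters, NULL if none */
	const char *extra_init_args;    /* embedded init parameters, NULL if none */
	const char *boot_command_line;  /* bootloader command line, never NULL here */
};

/* Mirrors the check added in init/main.c by the patch. */
static bool has_extra_options(const struct boot_state *s)
{
	return s->extra_command_line || s->extra_init_args;
}

/*
 * Write the "# Parameters from bootloader:" comment into dst only when
 * embedded parameters exist and the bootloader command line is non-empty.
 * Returns the snprintf() result, or 0 when the comment is skipped.
 */
static int emit_bootloader_comment(const struct boot_state *s,
				   char *dst, size_t size)
{
	if (!has_extra_options(s) || !s->boot_command_line[0])
		return 0;

	return snprintf(dst, size, "# Parameters from bootloader:\n# %s\n",
			s->boot_command_line);
}
```

With no embedded parameters the helper writes nothing, which is the case where automation is expected to read /proc/cmdline instead.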
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on fec50db7033ea478773b159e0e2efb135270e3b7]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:    https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240409/202404090903.3jz667sn-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202404090903.3jz667sn-...@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/sysctl.c:23:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~ ^ ~~~
>> kernel/sysctl.c:1819:14: error: initializing 'void *' with an expression of type 'const int *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
    1819 |                 .extra2 = &pid_max_max,
         |                           ^~~~
   1 warning and 1 error generated.

vim +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig  2020-04-24  1617
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1618  static struct ctl_table kern_table[] = {
^1da177e4c3f41 Linus Torvalds     2005-04-16  1619  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1620  		.procname = "panic",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1621  		.data = &panic_timeout,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1622  		.maxlen = sizeof(int),
49f0ce5f92321c Jerome Marchand    2014-01-21  1623  		.mode = 0644,
6d4561110a3e9f Eric W. Biederman  2009-11-16  1624  		.proc_handler = proc_dointvec,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1625  	},
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds     2005-04-16  1627  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1628  		.procname = "tainted",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1629  		.maxlen = sizeof(long),
^1da177e4c3f41 Linus Torvalds     2005-04-16  1630  		.mode = 0644,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1631  		.proc_handler = proc_taint,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1632  	},
2da02997e08d3e David Rientjes     2009-01-06  1633  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1634  		.procname = "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1635  		.data = &sysctl_writes_strict,
9e3961a0979817 Prarit Bhargava    2014-12-10  1636  		.maxlen = sizeof(int),
2da02997e08d3e David Rientjes     2009-01-06  1637  		.mode = 0644,
9e3961a0979817 Prarit Bhargava    2014-12-10  1638  		.proc_handler = proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni        2022-01-21  1639  		.extra1 = SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce       2019-07-18  1640  		.extra2 = SYSCTL_ONE,
2da02997e08d3e David Rientjes     2009-01-06  1641  	},
964c9dff009189 Alexander Popov    2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o      2015-03-17  1643  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1644  		.procname = "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1645  		.data = &print_fatal_signals,
964c9dff009189 Alexander Popov    2018-08-17  1646  		.maxlen = sizeof(int),
1efff914afac8a Theodore Ts'o      2015-03-17  1647  		.mode = 0644,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1648  		.proc_handler = proc_dointvec,
1efff914afac8a Theodore Ts'o      2015-03-17  1649  	},
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds     2005-04-16  1651  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1652  		.procname = "reb
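The diagnostic in this report can be reproduced outside the kernel. The sketch below is an illustration only: struct table_entry is a simplified, hypothetical stand-in for struct ctl_table, the *_demo variables are made up, and the explicit cast is just one possible way to silence the warning, not necessarily what a later revision of the patch does.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct table_entry {
	const char *procname;
	void *extra1; /* lower bound, non-const pointer as in the report */
	void *extra2; /* upper bound, non-const pointer as in the report */
};

static int pid_max_min_demo = 301;
static const int pid_max_max_demo = 4 * 1024 * 1024;

/*
 * Writing ".extra2 = &pid_max_max_demo" would drop the const qualifier and
 * trigger -Wdiscarded-qualifiers (GCC) or
 * -Wincompatible-pointer-types-discards-qualifiers (clang).
 * The explicit cast documents that the bound is only ever read.
 */
static struct table_entry pid_max_entry = {
	.procname = "pid_max",
	.extra1 = &pid_max_min_demo,
	.extra2 = (void *)&pid_max_max_demo,
};

/* Read the upper bound back through a const-qualified pointer. */
static int entry_upper_bound(const struct table_entry *e)
{
	return *(const int *)e->extra2;
}
```

The cast only moves the responsibility to the programmer: nothing may write through extra2 while it points at a const object.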
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
Hi Michal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on fec50db7033ea478773b159e0e2efb135270e3b7]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Koutn/tracing-Remove-dependency-of-saved_cmdlines_buffer-on-PID_MAX_DEFAULT/20240408-230031
base:   fec50db7033ea478773b159e0e2efb135270e3b7
patch link:    https://lore.kernel.org/r/20240408145819.8787-3-mkoutny%40suse.com
patch subject: [PATCH 2/3] kernel/pid: Remove default pid_max value
config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240409/202404090849.mgj3z0xi-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202404090849.mgj3z0xi-...@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/sysctl.c:1819:35: warning: initialization discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
    1819 |                 .extra2 = &pid_max_max,
         |                           ^

vim +/const +1819 kernel/sysctl.c

f461d2dcd511c0 Christoph Hellwig  2020-04-24  1617
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1618  static struct ctl_table kern_table[] = {
^1da177e4c3f41 Linus Torvalds     2005-04-16  1619  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1620  		.procname = "panic",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1621  		.data = &panic_timeout,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1622  		.maxlen = sizeof(int),
49f0ce5f92321c Jerome Marchand    2014-01-21  1623  		.mode = 0644,
6d4561110a3e9f Eric W. Biederman  2009-11-16  1624  		.proc_handler = proc_dointvec,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1625  	},
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1626  #ifdef CONFIG_PROC_SYSCTL
^1da177e4c3f41 Linus Torvalds     2005-04-16  1627  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1628  		.procname = "tainted",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1629  		.maxlen = sizeof(long),
^1da177e4c3f41 Linus Torvalds     2005-04-16  1630  		.mode = 0644,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1631  		.proc_handler = proc_taint,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1632  	},
2da02997e08d3e David Rientjes     2009-01-06  1633  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1634  		.procname = "sysctl_writes_strict",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1635  		.data = &sysctl_writes_strict,
9e3961a0979817 Prarit Bhargava    2014-12-10  1636  		.maxlen = sizeof(int),
2da02997e08d3e David Rientjes     2009-01-06  1637  		.mode = 0644,
9e3961a0979817 Prarit Bhargava    2014-12-10  1638  		.proc_handler = proc_dointvec_minmax,
78e36f3b0dae58 Xiaoming Ni        2022-01-21  1639  		.extra1 = SYSCTL_NEG_ONE,
eec4844fae7c03 Matteo Croce       2019-07-18  1640  		.extra2 = SYSCTL_ONE,
2da02997e08d3e David Rientjes     2009-01-06  1641  	},
964c9dff009189 Alexander Popov    2018-08-17  1642  #endif
1efff914afac8a Theodore Ts'o      2015-03-17  1643  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1644  		.procname = "print-fatal-signals",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1645  		.data = &print_fatal_signals,
964c9dff009189 Alexander Popov    2018-08-17  1646  		.maxlen = sizeof(int),
1efff914afac8a Theodore Ts'o      2015-03-17  1647  		.mode = 0644,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1648  		.proc_handler = proc_dointvec,
1efff914afac8a Theodore Ts'o      2015-03-17  1649  	},
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1650  #ifdef CONFIG_SPARC
^1da177e4c3f41 Linus Torvalds     2005-04-16  1651  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1652  		.procname = "reboot-cmd",
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1653  		.data = reboot_command,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1654  		.maxlen = 256,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1655  		.mode = 0644,
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1656  		.proc_handler = proc_dostring,
^1da177e4c3f41 Linus Torvalds     2005-04-16  1657  	},
^1da177e4c3f41 Linus Torvalds     2005-04-16  1658  	{
f461d2dcd511c0 Christoph Hellwig  2020-04-24  1659  		.procname
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
On Mon, 8 Apr 2024 16:58:18 +0200 Michal Koutný wrote:

> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
>
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

That seems like a large change.

It isn't clear why we'd want to merge this patchset. Does it improve
anyone's life and if so, how?
[PATCH 2/3] kernel/pid: Remove default pid_max value
pid_max is a per-pidns (thus global too) limit on the number of tasks
the kernel admits. The knob can be configured by the admin in the range
between pid_max_min and pid_max_max (sic). The default value sits
between those and it typically equals max(32k, 1k*nr_cpus).

The nr_cpus scaling was introduced in commit 72680a191b93 ("pids:
increase pid_max based on num_possible_cpus") to accommodate the
kernel's own helper tasks (before workqueues). Generally, a cap of
1024 tasks/cpu is too much if they were all running and it is also too
little when they are idle (memory being the bottleneck).

The kernel also provides other mechanisms to restrict the number of
tasks -- the threads-max sysctl and RLIMIT_NPROC with memory-scaled
defaults, and the generic pids cgroup controller (the last one being
the solution to fork-bombs, with qualified limits set up by the admin).

The kernel provides mechanisms, while it should not imply policies --
default pid_max seems to be an example of the policy that does not fit
all. At the same time pid_max must have some value assigned, so use the
end of the allowed range -- pid_max_max.

This change thus increases initial pid_max from 32k to 4M (x86_64
defconfig).

This affects the size of the structure that alloc_pid/idr_alloc_cyclic
eventually uses and the structure that kernel tracing uses with
'record-tgid' (~16 MiB).

Signed-off-by: Michal Koutný
---
 include/linux/pid.h     |  4 ++--
 include/linux/threads.h | 15 -------
 kernel/pid.c            |  8 +++-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index a3aad9b4074c..0d191ac02958 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -106,8 +106,8 @@ extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
 
-extern int pid_max;
-extern int pid_max_min, pid_max_max;
+extern int pid_max_min, pid_max;
+extern const int pid_max_max;
 
 /*
  * look up a PID in the hash table. Must be called with the tasklist_lock
diff --git a/include/linux/threads.h b/include/linux/threads.h
index c34173e6c5f1..43f8f38a0c13 100644
--- a/include/linux/threads.h
+++ b/include/linux/threads.h
@@ -22,25 +22,18 @@
 
 #define MIN_THREADS_LEFT_FOR_ROOT 4
 
-/*
- * This controls the default maximum pid allocated to a process
- */
-#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)
-
 /*
  * A maximum of 4 million PIDs should be enough for a while.
  * [NOTE: PID/TIDs are limited to 2^30 ~= 1 billion, see FUTEX_TID_MASK.]
  */
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
-	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
+	(sizeof(long) > 4 ? 4 * 1024 * 1024 : 0x8000))
 
 /*
- * Define a minimum number of pids per cpu. Heuristically based
- * on original pid max of 32k for 32 cpus. Also, increase the
- * minimum settable value for pid_max on the running system based
- * on similar defaults. See kernel/pid.c:pid_idr_init() for details.
+ * Define a minimum number of pids per cpu. Mainly to accommodate
+ * smpboot_register_percpu_thread() kernel threads.
+ * See kernel/pid.c:pid_idr_init() for details.
  */
-#define PIDS_PER_CPU_DEFAULT	1024
 #define PIDS_PER_CPU_MIN	8
 
 #endif
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..24ae505ac3b0 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -60,10 +60,10 @@ struct pid init_struct_pid = {
 	}, }
 };
 
-int pid_max = PID_MAX_DEFAULT;
+int pid_max = PID_MAX_LIMIT;
 
 int pid_max_min = RESERVED_PIDS + 1;
-int pid_max_max = PID_MAX_LIMIT;
+const int pid_max_max = PID_MAX_LIMIT;
 /*
  * Pseudo filesystems start inode numbering after one. We use Reserved
  * PIDs as a natural offset.
@@ -652,9 +652,7 @@ void __init pid_idr_init(void)
 	/* Verify no one has done anything silly: */
 	BUILD_BUG_ON(PID_MAX_LIMIT >= PIDNS_ADDING);
 
-	/* bump default and minimum pid_max based on number of cpus */
-	pid_max = min(pid_max_max, max_t(int, pid_max,
-				PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+	/* bump minimum pid_max based on number of cpus */
 	pid_max_min = max_t(int, pid_max_min,
 				PIDS_PER_CPU_MIN * num_possible_cpus());
 	pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
-- 
2.44.0
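To make the before/after arithmetic of this patch concrete, here is a small userspace sketch of the computation in pid_idr_init(). It is an approximation under stated assumptions: the DEMO_* constants copy the 64-bit, !CONFIG_BASE_SMALL values shown in the diff, DEMO_RESERVED_PIDS = 300 is assumed rather than taken from the patch, and the helper names are hypothetical.

```c
#include <assert.h>

#define DEMO_PID_MAX_DEFAULT      0x8000            /* 32768, the removed default */
#define DEMO_PID_MAX_LIMIT        (4 * 1024 * 1024) /* 64-bit, !CONFIG_BASE_SMALL */
#define DEMO_PIDS_PER_CPU_DEFAULT 1024              /* removed by the patch */
#define DEMO_PIDS_PER_CPU_MIN     8
#define DEMO_RESERVED_PIDS        300               /* assumption, not in the diff */

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Initial pid_max before the patch: max(32k, 1024 * ncpus), capped at the limit. */
static int pid_max_before(int ncpus)
{
	return min_int(DEMO_PID_MAX_LIMIT,
		       max_int(DEMO_PID_MAX_DEFAULT,
			       DEMO_PIDS_PER_CPU_DEFAULT * ncpus));
}

/* Initial pid_max after the patch: always the upper end of the range. */
static int pid_max_after(int ncpus)
{
	(void)ncpus; /* no longer depends on the CPU count */
	return DEMO_PID_MAX_LIMIT;
}

/* Minimum settable pid_max; this part is kept by the patch. */
static int pid_max_min_for(int ncpus)
{
	return max_int(DEMO_RESERVED_PIDS + 1, DEMO_PIDS_PER_CPU_MIN * ncpus);
}
```

For example, an 8-CPU machine moves from 32768 to 4194304, while a 64-CPU machine moves from 65536 to the same 4194304; only the minimum still scales with the CPU count.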
[PATCH 0/3] kernel/pid: Remove default pid_max value
TL;DR excerpt from commit 02/03:

	The kernel provides mechanisms, while it should not imply policies --
	default pid_max seems to be an example of the policy that does not fit
	all. At the same time pid_max must have some value assigned, so use the
	end of the allowed range -- pid_max_max.

More details are in that commit's message. The other two commits are
related preparation and less related refresh in code that somewhat
references pid_max.

Michal Koutný (3):
  tracing: Remove dependency of saved_cmdlines_buffer on PID_MAX_DEFAULT
  kernel/pid: Remove default pid_max value
  tracing: Compare pid_max against pid_list capacity

 include/linux/pid.h               |  4 ++--
 include/linux/threads.h           | 15 ---
 kernel/pid.c                      |  8 +++-
 kernel/trace/pid_list.c           |  6 +++---
 kernel/trace/pid_list.h           |  4 ++--
 kernel/trace/trace_sched_switch.c | 11 ++-
 6 files changed, 20 insertions(+), 28 deletions(-)

base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
-- 
2.44.0
Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing
On 4/1/24 6:39 PM, Hari Nagalla wrote:
> On 3/25/24 12:20, Andrew Davis wrote:
>> The kernel FIFO queue has a couple issues. The biggest issue is that
>> it causes extra latency in a path that can be used in real-time tasks,
>> such as communication with real-time remote processors.
>>
>> The whole FIFO idea itself looks to be a leftover from before the
>> unified mailbox framework. The current mailbox framework expects
>> mbox_chan_received_data() to be called with data immediately as it
>> arrives. Remove the FIFO and pass the messages to the mailbox
>> framework directly.
> Yes, this would definitely speed up the message receive path. With RT
> linux, the irq runs in thread context, so that is Ok. But with non-RT
> the whole receive path runs in interrupt context. So, i think it would
> be appropriate to use a threaded_irq()?

I was thinking the same at first, but seems some mailbox drivers use
threaded, others use non-threaded context. Since all we do in the IRQ
context anymore is call mbox_chan_received_data(), which is supposed to
be IRQ safe, then it should be fine either way. So for now I just kept
this using the regular IRQ context as before.

If that does turn out to be an issue then let's switch to threaded.

Andrew
Re: [PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing
On 3/25/24 12:20, Andrew Davis wrote:
> The kernel FIFO queue has a couple issues. The biggest issue is that
> it causes extra latency in a path that can be used in real-time tasks,
> such as communication with real-time remote processors.
>
> The whole FIFO idea itself looks to be a leftover from before the
> unified mailbox framework. The current mailbox framework expects
> mbox_chan_received_data() to be called with data immediately as it
> arrives. Remove the FIFO and pass the messages to the mailbox
> framework directly.

Yes, this would definitely speed up the message receive path. With RT
linux, the irq runs in thread context, so that is Ok. But with non-RT
the whole receive path runs in interrupt context. So, i think it would
be appropriate to use a threaded_irq()?
[PATCH 13/13] mailbox: omap: Remove kernel FIFO message queuing
The kernel FIFO queue has a couple issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly.

Signed-off-by: Andrew Davis
---
 drivers/mailbox/Kconfig        |   9 ---
 drivers/mailbox/omap-mailbox.c | 103 +
 2 files changed, 3 insertions(+), 109 deletions(-)

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 42940108a1874..78e4c74fbe5c2 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -68,15 +68,6 @@ config OMAP2PLUS_MBOX
 	  OMAP2/3; or IPU, IVA HD and DSP in OMAP4/5. Say Y here if you
 	  want to use OMAP2+ Mailbox framework support.
 
-config OMAP_MBOX_KFIFO_SIZE
-	int "Mailbox kfifo default buffer size (bytes)"
-	depends on OMAP2PLUS_MBOX
-	default 256
-	help
-	  Specify the default size of mailbox's kfifo buffers (bytes).
-	  This can also be changed at runtime (via the mbox_kfifo_size
-	  module parameter).
-
 config ROCKCHIP_MBOX
 	bool "Rockchip Soc Integrated Mailbox Support"
 	depends on ARCH_ROCKCHIP || COMPILE_TEST
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index c5d4083125856..4e7e0e2f537b0 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -65,14 +65,6 @@ struct omap_mbox_fifo {
 	u32 intr_bit;
 };
 
-struct omap_mbox_queue {
-	spinlock_t		lock;
-	struct kfifo		fifo;
-	struct work_struct	work;
-	struct omap_mbox	*mbox;
-	bool full;
-};
-
 struct omap_mbox_match_data {
 	u32 intr_type;
 };
@@ -90,7 +82,6 @@ struct omap_mbox_device {
 struct omap_mbox {
 	const char		*name;
 	int			irq;
-	struct omap_mbox_queue	*rxq;
 	struct omap_mbox_device	*parent;
 	struct omap_mbox_fifo	tx_fifo;
 	struct omap_mbox_fifo	rx_fifo;
@@ -99,10 +90,6 @@ struct omap_mbox {
 	bool			send_no_irq;
 };
 
-static unsigned int mbox_kfifo_size = CONFIG_OMAP_MBOX_KFIFO_SIZE;
-module_param(mbox_kfifo_size, uint, S_IRUGO);
-MODULE_PARM_DESC(mbox_kfifo_size, "Size of omap's mailbox kfifo (bytes)");
-
 static inline
 unsigned int mbox_read_reg(struct omap_mbox_device *mdev, size_t ofs)
 {
@@ -202,30 +189,6 @@ static void omap_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_irq_t irq)
 	mbox_write_reg(mbox->parent, bit, irqdisable);
 }
 
-/*
- * Message receiver(workqueue)
- */
-static void mbox_rx_work(struct work_struct *work)
-{
-	struct omap_mbox_queue *mq =
-			container_of(work, struct omap_mbox_queue, work);
-	u32 msg;
-	int len;
-
-	while (kfifo_len(&mq->fifo) >= sizeof(msg)) {
-		len = kfifo_out(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
-
-		mbox_chan_received_data(mq->mbox->chan, (void *)(uintptr_t)msg);
-		spin_lock_irq(&mq->lock);
-		if (mq->full) {
-			mq->full = false;
-			omap_mbox_enable_irq(mq->mbox, IRQ_RX);
-		}
-		spin_unlock_irq(&mq->lock);
-	}
-}
-
 /*
  * Mailbox interrupt handler
  */
@@ -238,27 +201,15 @@ static void __mbox_tx_interrupt(struct omap_mbox *mbox)
 
 static void __mbox_rx_interrupt(struct omap_mbox *mbox)
 {
-	struct omap_mbox_queue *mq = mbox->rxq;
 	u32 msg;
-	int len;
 
 	while (!mbox_fifo_empty(mbox)) {
-		if (unlikely(kfifo_avail(&mq->fifo) < sizeof(msg))) {
-			omap_mbox_disable_irq(mbox, IRQ_RX);
-			mq->full = true;
-			goto nomem;
-		}
-
 		msg = mbox_fifo_read(mbox);
-
-		len = kfifo_in(&mq->fifo, (unsigned char *)&msg, sizeof(msg));
-		WARN_ON(len != sizeof(msg));
+		mbox_chan_received_data(mbox->chan, (void *)(uintptr_t)msg);
 	}
 
-	/* no more messages in the fifo. clear IRQ source. */
+	/* clear IRQ source. */
 	ack_mbox_irq(mbox, IRQ_RX);
-nomem:
-	schedule_work(&mbox->rxq->work);
 }
 
 static irqreturn_t mbox_interrupt(int irq, void *p)
@@ -274,57 +225,15 @@ static irqreturn_t mbox_interrupt(int irq, void *p)
 	return IRQ_HANDLED;
 }
 
-static struct omap_mbox_queue *mbox_queue_alloc(struct omap_mbox *mbox,
-					void (*work)(struct work_struct *))
-{
-	struct omap_mbox_queue *mq;
-	unsigned int size;
-
-
[PATCH -next] fs: Fix kernel-doc comments to functions
This commit fixes the kernel-doc style comments by adding complete
parameter descriptions for lookup_file(), lookup_dir_entry() and
lookup_file_dentry().

Signed-off-by: Yang Li
---
 fs/tracefs/event_inode.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index dc067eeb6387..894c6ca1e500 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -336,6 +336,7 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
 
 /**
  * lookup_file - look up a file in the tracefs filesystem
+ * @parent_ei: Pointer to the eventfs_inode that represents parent of the file
  * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
@@ -389,6 +390,7 @@ static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
 /**
  * lookup_dir_entry - look up a dir in the tracefs filesystem
  * @dentry: the directory to look up
+ * @pei: Pointer to the parent eventfs_inode if available
  * @ei: the eventfs_inode that represents the directory to create
  *
  * This function will look up a dentry for a directory represented by
@@ -478,16 +480,20 @@ void eventfs_d_release(struct dentry *dentry)
 
 /**
  * lookup_file_dentry - create a dentry for a file of an eventfs_inode
+ * @dentry: The parent dentry under which the new file's dentry will be created
  * @ei: the eventfs_inode that the file will be created under
  * @idx: the index into the entry_attrs[] of the @ei
- * @parent: The parent dentry of the created file.
- * @name: The name of the file to create
  * @mode: The mode of the file.
  * @data: The data to use to set the inode of the file with on open()
  * @fops: The fops of the file to be created.
  *
- * Create a dentry for a file of an eventfs_inode @ei and place it into the
- * address located at @e_dentry.
+ * This function creates a dentry for a file associated with an
+ * eventfs_inode @ei. It uses the entry attributes specified by @idx,
+ * if available. The file will have the specified @mode and its inode will be
+ * set up with @data upon open. The file operations will be set to @fops.
+ *
+ * Return: Returns a pointer to the newly created file's dentry or an error
+ * pointer.
  */
 static struct dentry *
 lookup_file_dentry(struct dentry *dentry,
-- 
2.20.1.7.g153144c
[PATCH v7 6/7] LoongArch: Add pv ipi support on guest kernel side
The PARAVIRT option and PV IPI support are added on the guest kernel
side. The function pv_ipi_init() adds the IPI sending and IPI receiving
hooks. It first checks whether the system runs in VM mode. If the
kernel runs in VM mode, it calls kvm_para_available() to detect the
current hypervisor type. Only KVM type detection is supported for now;
the paravirt functions can work only if the current hypervisor type is
KVM, since KVM is the only hypervisor supported on LoongArch.

PV IPI uses virtual IPI sender and virtual IPI receiver functions. With
the virtual IPI sender, the IPI message is stored in DDR memory rather
than in emulated HW. IPI multicast is supported, and 128 vcpus can
receive IPIs at the same time, like the x86 KVM method. The hypercall
method is used for IPI sending.

With the virtual IPI receiver, HW SW0 is used rather than real IPI HW.
Since each VCPU has a separate HW SW0, like the HW timer, there is no
trap in the IPI interrupt acknowledge. And since the IPI message is
stored in DDR, there is no trap when getting the IPI message.

Signed-off-by: Bibo Mao
---
 arch/loongarch/Kconfig                        |   9 ++
 arch/loongarch/include/asm/hardirq.h          |   1 +
 arch/loongarch/include/asm/paravirt.h         |  27
 .../include/asm/paravirt_api_clock.h          |   1 +
 arch/loongarch/kernel/Makefile                |   1 +
 arch/loongarch/kernel/irq.c                   |   2 +-
 arch/loongarch/kernel/paravirt.c              | 151 ++
 arch/loongarch/kernel/smp.c                   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index b274784c2e26..a1fccaf117aa 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -578,6 +578,15 @@ config CPU_HAS_PREFETCH
 	bool
 	default y
 
+config PARAVIRT
+	bool "Enable paravirtualization code"
+	depends on AS_HAS_LVZ_EXTENSION
+	help
+	  This changes the kernel so it can modify itself when it is run
+	  under a hypervisor, potentially improving performance significantly
+	  over full virtualization. However, when run without a hypervisor
+	  the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
 	def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
 	unsigned int ipi_irqs[NR_IPI];
 	unsigned int __softirq_pending;
+	atomic_t message ____cacheline_aligned_in_smp;
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index 000000000000..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+	return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index 000000000000..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES)		+= module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 
 obj-$(CONFIG_PROC_FS)		+= proc.o
+obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 
 obj-$(CONFIG_SMP)		+= smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
 			per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
 	}
 
-	set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+	set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir
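The commit message says that one multicast IPI can reach up to 128 vcpus. The part of the patch that packs the targets is cut off in this excerpt, so the following is only a guess at how such batching could look in plain C, not the patch's implementation: struct ipi_batch, build_ipi_batches() and the 128-wide window split into two 64-bit words are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPI_WINDOW 128 /* assumed number of vcpus reachable per request */

/* Hypothetical description of one multicast request. */
struct ipi_batch {
	int base;        /* lowest CPU id covered by this batch */
	uint64_t lo, hi; /* bit n set => CPU (base + n) is a target */
};

/*
 * Split a list of CPU ids, sorted in ascending order, into batches that
 * each span at most DEMO_IPI_WINDOW consecutive ids.
 * Returns the number of batches written (at most max_batches).
 */
static size_t build_ipi_batches(const int *cpus, size_t ncpus,
				struct ipi_batch *out, size_t max_batches)
{
	size_t n = 0;

	for (size_t i = 0; i < ncpus; i++) {
		int cpu = cpus[i];

		/* Open a new batch when the current window cannot hold this CPU. */
		if (n == 0 || cpu - out[n - 1].base >= DEMO_IPI_WINDOW) {
			if (n == max_batches)
				break;
			out[n].base = cpu;
			out[n].lo = 0;
			out[n].hi = 0;
			n++;
		}

		int bit = cpu - out[n - 1].base;

		if (bit < 64)
			out[n - 1].lo |= UINT64_C(1) << bit;
		else
			out[n - 1].hi |= UINT64_C(1) << (bit - 64);
	}

	return n;
}
```

Under this assumed encoding each batch would map to one hypercall, so a sparse CPU mask costs more requests than a dense one.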
[PATCH v6 6/7] LoongArch: Add pv ipi support on guest kernel side
The PARAVIRT option and PV IPI support are added on the guest kernel side; pv_ipi_init() installs the IPI sending and IPI receiving hooks. The function first checks whether the system runs in VM mode. If so, it calls kvm_para_available() to detect the current hypervisor type. Only KVM detection is supported for now, so the paravirt functions work only when the hypervisor is KVM, which is the only hypervisor supported on LoongArch at present. PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the virtual IPI sender, the IPI message is stored in DDR memory rather than in emulated HW. IPI multicast is supported, and 128 vCPUs can receive IPIs at the same time, like the x86 KVM method. A hypercall is used for IPI sending. With the virtual IPI receiver, HW SW0 is used rather than the real IPI HW. Since each vCPU has a separate HW SW0, like the HW timer, there is no trap on IPI interrupt acknowledge. And since the IPI message is stored in DDR, there is no trap when fetching the IPI message. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 ++ arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/irq.c | 2 +- arch/loongarch/kernel/paravirt.c | 151 ++ arch/loongarch/kernel/smp.c | 4 +- 8 files changed, 194 insertions(+), 2 deletions(-) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 929f68926b34..fdaae9a0435c 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full 
virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. + config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..b26d596a73aa 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t message cacheline_aligned_in_smp; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o 
+obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c index ce36897d1e5a..4863e6c1b739 100644 --- a/arch/loongarch/kernel/irq.c +++ b/arch/loongarch/kernel/irq.c @@ -113,5 +113,5 @@ void __init init_IRQ(void) per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE); } - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); } diff --git a/arch/loongarch/kernel/paravir
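A note on the multicast scheme described in the commit message above: the following is only a user-space sketch of how a sender could pack target CPUs into 128-bit windows, one window per hypercall. It is not the code from paravirt.c (that part of the diff is cut off here). The names struct ipi_batch, PV_IPI_WINDOW and build_ipi_batches() are hypothetical, and the sketch assumes the CPU ids arrive sorted in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* 128 vCPUs per multicast, as stated in the commit message. */
#define PV_IPI_WINDOW 128

/* Hypothetical container: one batch would map to one hypercall. */
struct ipi_batch {
	unsigned int base;	/* lowest CPU id covered by this batch */
	uint64_t lo;		/* bits for base + 0 .. base + 63 */
	uint64_t hi;		/* bits for base + 64 .. base + 127 */
};

/*
 * Group CPU ids (assumed sorted ascending) into batches that each cover
 * at most PV_IPI_WINDOW consecutive ids. Returns the number of batches
 * written to @out; stops early if @max_out batches are not enough.
 */
static inline size_t build_ipi_batches(const unsigned int *cpus, size_t n,
				       struct ipi_batch *out, size_t max_out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int cpu = cpus[i];
		unsigned int bit;

		/* Open a new window when the CPU does not fit the current one. */
		if (count == 0 || cpu - out[count - 1].base >= PV_IPI_WINDOW) {
			if (count == max_out)
				break;
			out[count].base = cpu;
			out[count].lo = 0;
			out[count].hi = 0;
			count++;
		}

		bit = cpu - out[count - 1].base;
		if (bit < 64)
			out[count - 1].lo |= 1ULL << bit;
		else
			out[count - 1].hi |= 1ULL << (bit - 64);
	}

	return count;
}
```

With targets 0, 1, 63, 64, 127, 128 and 300, this produces three batches: one based at CPU 0 covering the first five targets, one based at 128, and one based at 300.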
Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/24 5:15 PM, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao wrote: Paravirt interface pv_ipi_init() is added here for guest kernel, it firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect current VMM type. Now only KVM VMM type is detected,the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. There is not effective with pv_ipi_init() now, it is dummy function. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 1 + 7 files changed, 87 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 929f68926b34..fdaae9a0435c 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index d48f993ae206..af5d677a9052 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypercall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypercall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += 
smp.o diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c new file mode 100644 index ..5cf794e8490f --- /dev/null +++ b/arch/loongarch/kernel/paravirt.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); + +static bool kvm_para_available(void) +{ + static int hypervisor_type; + int config; + + if (!hypervisor_type) { + config = read_cpucfg(CPUCFG_KVM_SIG); + if (!memcmp(&config, KVM_SIGNATURE, 4)) + hypervisor_type = HYPERVISOR_KVM; + } + + return hypervisor_type == HYPERVISOR_KVM; +} + +int __init pv_ipi_init(void) +{ + if (!cp
Re: [PATCH] ftrace: fix most kernel-doc warnings
On Thu, 22 Feb 2024 21:48:33 -0800 Randy Dunlap wrote: > Reduce the number of kernel-doc warnings from 52 down to 10, i.e., > fix 42 kernel-doc warnings by (a) using the Returns: format for > function return values or (b) using "@var:" instead of "@var -" > for function parameter descriptions. > > Fix one return values list so that it is formatted correctly when > rendered for output. > > Spell "non-zero" with a hyphen in several places. Looks good to me. Acked-by: Masami Hiramatsu (Google) Thanks! > > Signed-off-by: Randy Dunlap > Reported-by: kernel test robot > Link: > https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/ > Cc: Steven Rostedt > Cc: Masami Hiramatsu > Cc: Mathieu Desnoyers > Cc: Mark Rutland > Cc: linux-trace-ker...@vger.kernel.org > --- > This patch addresses most of the reported kernel-doc warnings but does > not fix all of them, so I did not use "Closes:" for the Link: tag. > > kernel/trace/ftrace.c | 90 ++++---- > 1 file changed, 46 insertions(+), 44 deletions(-) > > diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h > * Search a given @hash to see if a given instruction pointer (@ip) > * exists in it. > * > - * Returns the entry that holds the @ip if found. NULL otherwise. > + * Returns: the entry that holds the @ip if found. NULL otherwise. > */ > struct ftrace_func_entry * > ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip) > @@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct > > /** > * ftrace_free_filter - remove all filters for an ftrace_ops > - * @ops - the ops to remove the filters from > + * @ops: the ops to remove the filters from > */ > void ftrace_free_filter(struct ftrace_ops *ops) > { > @@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns > * @end: end of range to search (inclusive). @end points to the last byte > * to check. 
> * > - * Returns rec->ip if the related ftrace location is a least partly within > + * Returns: rec->ip if the related ftrace location is a least partly within > * the given address range. That is, the first address of the instruction > * that is either a NOP or call to the function tracer. It checks the ftrace > * internal tables to determine if the address belongs or not. > @@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi > * ftrace_location - return the ftrace location > * @ip: the instruction pointer to check > * > - * If @ip matches the ftrace location, return @ip. > - * If @ip matches sym+0, return sym's ftrace location. > - * Otherwise, return 0. > + * Returns: > + * * If @ip matches the ftrace location, return @ip. > + * * If @ip matches sym+0, return sym's ftrace location. > + * * Otherwise, return 0. > */ > unsigned long ftrace_location(unsigned long ip) > { > @@ -1639,7 +1640,7 @@ out: > * @start: start of range to search > * @end: end of range to search (inclusive). @end points to the last byte to > check. > * > - * Returns 1 if @start and @end contains a ftrace location. > + * Returns: 1 if @start and @end contains a ftrace location. > * That is, the instruction that is either a NOP or call to > * the function tracer. It checks the ftrace internal tables to > * determine if the address belongs or not. > @@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l > * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS > * is not set, then it wants to convert to the normal callback. > * > - * Returns the address of the trampoline to set to > + * Returns: the address of the trampoline to set to > */ > unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec) > { > @@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct > * a function that saves all the regs. Basically the '_EN' version > * represents the current state of the function. 
> * > - * Returns the address of the trampoline that is currently being called > + * Returns: the address of the trampoline that is currently being called > */ > unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec) > { > @@ -2719,7 +2720,7 @@ struct ftrace_rec_iter { > /** > * ftrace_rec_iter_start - start up iterating over traced functions > * > - * Returns an iterator handle that is used to iterate over all > + * Returns: an iterator handle that is used to iterate over all > * the records that represent address locations where functions > * are traced. > * > @@ -2751,7 +2
Re: [PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel
Hi, Bibo, On Thu, Feb 22, 2024 at 11:28 AM Bibo Mao wrote: > > Paravirt interface pv_ipi_init() is added here for guest kernel, it > firstly checks whether system runs on VM mode. If kernel runs on VM mode, > it will call function kvm_para_available() to detect current VMM type. > Now only KVM VMM type is detected,the paravirt function can work only if > current VMM is KVM hypervisor, since there is only KVM hypervisor > supported on LoongArch now. > > There is not effective with pv_ipi_init() now, it is dummy function. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 > arch/loongarch/include/asm/kvm_para.h | 7 > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/paravirt.c | 41 +++ > arch/loongarch/kernel/setup.c | 1 + > 7 files changed, 87 insertions(+) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 929f68926b34..fdaae9a0435c 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance significantly > + over full virtualization. However, when run without a hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index d48f993ae206..af5d677a9052 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -2,6 +2,13 @@ > #ifndef _ASM_LOONGARCH_KVM_PARA_H > #define _ASM_LOONGARCH_KVM_PARA_H > > +/* > + * Hypercall code field > + */ > +#define HYPERVISOR_KVM 1 > +#define HYPERVISOR_VENDOR_SHIFT8 > +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) > + code) > + > /* > * LoongArch hypercall return code > */ > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..58f7b7b89f2c > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} > + > +int pv_ipi_init(void); > +#else > +static inline int pv_ipi_init(void) > +{ > + return 0; > +} > + > +#endif // CONFIG_PARAVIRT > +#endif > diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h > b/arch/loongarch/include/asm/paravirt_api_clock.h > new file mode 100644 > index ..65ac7cee0dad > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt_api_clock.h > @@ -0,0 +1 @@ > +#include > diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile > index 3c808c680370..662e6e9de12d 100644 > --- a/arch/loongarch/kernel/Makefile > +++ b/arch/loongarch/kernel/Makefile > @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o 
module-sections.o > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > obj-$(CONFIG_PROC_FS) += proc.o > +obj-$(CONFIG_PARAVIRT) += paravirt.o > > obj-$(CONFIG_SMP) += smp.o > > diff --git a/arch/loongarch/kernel/paravirt.c > b/arch/loongarch/kernel/paravirt.c > new file mode 100644 > index ..5cf794e8490f > --- /dev/null > +++ b/arch/loongarch/kernel/paravirt.c > @@ -0,0 +1,41 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct static_key paravirt_steal_enabled;
[PATCH] ftrace: fix most kernel-doc warnings
Reduce the number of kernel-doc warnings from 52 down to 10, i.e., fix 42 kernel-doc warnings by (a) using the Returns: format for function return values or (b) using "@var:" instead of "@var -" for function parameter descriptions. Fix one return values list so that it is formatted correctly when rendered for output. Spell "non-zero" with a hyphen in several places. Signed-off-by: Randy Dunlap Reported-by: kernel test robot Link: https://lore.kernel.org/oe-kbuild-all/202312180518.x6frydsn-...@intel.com/ Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Cc: Mark Rutland Cc: linux-trace-ker...@vger.kernel.org --- This patch addresses most of the reported kernel-doc warnings but does not fix all of them, so I did not use "Closes:" for the Link: tag. kernel/trace/ftrace.c | 90 1 file changed, 46 insertions(+), 44 deletions(-) diff -- a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1160,7 +1160,7 @@ __ftrace_lookup_ip(struct ftrace_hash *h * Search a given @hash to see if a given instruction pointer (@ip) * exists in it. * - * Returns the entry that holds the @ip if found. NULL otherwise. + * Returns: the entry that holds the @ip if found. NULL otherwise. */ struct ftrace_func_entry * ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip) @@ -1282,7 +1282,7 @@ static void free_ftrace_hash_rcu(struct /** * ftrace_free_filter - remove all filters for an ftrace_ops - * @ops - the ops to remove the filters from + * @ops: the ops to remove the filters from */ void ftrace_free_filter(struct ftrace_ops *ops) { @@ -1587,7 +1587,7 @@ static struct dyn_ftrace *lookup_rec(uns * @end: end of range to search (inclusive). @end points to the last byte * to check. * - * Returns rec->ip if the related ftrace location is a least partly within + * Returns: rec->ip if the related ftrace location is a least partly within * the given address range. 
That is, the first address of the instruction * that is either a NOP or call to the function tracer. It checks the ftrace * internal tables to determine if the address belongs or not. @@ -1607,9 +1607,10 @@ unsigned long ftrace_location_range(unsi * ftrace_location - return the ftrace location * @ip: the instruction pointer to check * - * If @ip matches the ftrace location, return @ip. - * If @ip matches sym+0, return sym's ftrace location. - * Otherwise, return 0. + * Returns: + * * If @ip matches the ftrace location, return @ip. + * * If @ip matches sym+0, return sym's ftrace location. + * * Otherwise, return 0. */ unsigned long ftrace_location(unsigned long ip) { @@ -1639,7 +1640,7 @@ out: * @start: start of range to search * @end: end of range to search (inclusive). @end points to the last byte to check. * - * Returns 1 if @start and @end contains a ftrace location. + * Returns: 1 if @start and @end contains a ftrace location. * That is, the instruction that is either a NOP or call to * the function tracer. It checks the ftrace internal tables to * determine if the address belongs or not. @@ -2574,7 +2575,7 @@ static void call_direct_funcs(unsigned l * wants to convert to a callback that saves all regs. If FTRACE_FL_REGS * is not set, then it wants to convert to the normal callback. * - * Returns the address of the trampoline to set to + * Returns: the address of the trampoline to set to */ unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec) { @@ -2615,7 +2616,7 @@ unsigned long ftrace_get_addr_new(struct * a function that saves all the regs. Basically the '_EN' version * represents the current state of the function. 
* - * Returns the address of the trampoline that is currently being called + * Returns: the address of the trampoline that is currently being called */ unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec) { @@ -2719,7 +2720,7 @@ struct ftrace_rec_iter { /** * ftrace_rec_iter_start - start up iterating over traced functions * - * Returns an iterator handle that is used to iterate over all + * Returns: an iterator handle that is used to iterate over all * the records that represent address locations where functions * are traced. * @@ -2751,7 +2752,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_ * ftrace_rec_iter_next - get the next record to process. * @iter: The handle to the iterator. * - * Returns the next iterator after the given iterator @iter. + * Returns: the next iterator after the given iterator @iter. */ struct ftrace_rec_iter *ftrace_rec_iter_next(struct ftrace_rec_iter *iter) { @@ -2776,7 +2777,7 @@ struct ftrace_rec_iter *ftrace_rec_iter_ * ftrace_rec_iter_record - get the record at the iterator location * @iter: The current iterator location * - * Returns the record that the current @iter is at. + * Returns: the record that the current @iter is at. */ st
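For readers unfamiliar with the two kernel-doc conventions this patch applies ("@var:" for parameters and a "Returns:" section, optionally as a bullet list), here is a small stand-alone illustration. The function lookup_addr() is hypothetical and exists only to carry the comment; it is not taken from ftrace.c.

```c
#include <stddef.h>

/**
 * lookup_addr - find an address in a sorted table
 * @table: sorted array of addresses to search
 * @len: number of entries in @table
 * @ip: the address to look for
 *
 * Returns:
 * * the index of @ip in @table if found
 * * -1 otherwise
 */
static inline long lookup_addr(const unsigned long *table, size_t len,
			       unsigned long ip)
{
	size_t lo = 0, hi = len;

	/* Plain binary search over the half-open range [lo, hi). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid] == ip)
			return (long)mid;
		if (table[mid] < ip)
			lo = mid + 1;
		else
			hi = mid;
	}

	return -1;
}
```

The colon after each parameter name and after "Returns" is what lets kernel-doc recognise the sections; the "@ops - ..." and bare "Returns ..." forms removed by the patch are the ones that trigger warnings.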
[PATCH v5 4/6] LoongArch: Add paravirt interface for guest kernel
Paravirt interface pv_ipi_init() is added here for guest kernel, it firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect current VMM type. Now only KVM VMM type is detected,the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. There is not effective with pv_ipi_init() now, it is dummy function. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 1 + 7 files changed, 87 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 929f68926b34..fdaae9a0435c 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -587,6 +587,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index d48f993ae206..af5d677a9052 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypercall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypercall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += 
smp.o diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c new file mode 100644 index ..5cf794e8490f --- /dev/null +++ b/arch/loongarch/kernel/paravirt.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); + +static bool kvm_para_available(void) +{ + static int hypervisor_type; + int config; + + if (!hypervisor_type) { + config = read_cpucfg(CPUCFG_KVM_SIG); + if (!memcmp(&config, KVM_SIGNATURE, 4)) + hypervisor_type = HYPERVISOR_KVM; + } + + return hypervisor_type == HYPERVISOR_KVM; +} + +int __init pv_ipi_init(void) +{ + if (!cpu_has_hypervisor) + return 0; + if (!kvm_para_available()) + return 0; + + return 1; +} diff --git
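To make the detection logic and the hypercall code layout in this patch easier to follow, here is a user-space sketch. It rests on assumptions: the signature is taken to be the four bytes "KVM\0" (the patch does not show the definition of KVM_SIGNATURE), the CPUCFG read is replaced by a plain 32-bit value, and signature_is_kvm() and pack_signature() are hypothetical helpers. The macro arguments are also parenthesized here, which the patch's HYPERCALL_CODE() does not do.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HYPERVISOR_KVM		1
#define HYPERVISOR_VENDOR_SHIFT	8
/* Vendor id in the bits above the shift, function code in the low byte. */
#define HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Assumption: the signature word holds these four bytes. */
#define KVM_SIGNATURE "KVM\0"

/* Stand-in for the value read_cpucfg(CPUCFG_KVM_SIG) would return. */
static inline uint32_t pack_signature(const char *sig)
{
	uint32_t config;

	memcpy(&config, sig, sizeof(config));
	return config;
}

/* Same comparison as kvm_para_available(): byte-wise over 4 bytes. */
static inline bool signature_is_kvm(uint32_t config)
{
	return memcmp(&config, KVM_SIGNATURE, 4) == 0;
}
```

For example, HYPERCALL_CODE(HYPERVISOR_KVM, 1) evaluates to 0x101. Comparing with memcmp() on the address of the register value keeps the check independent of how the four signature bytes are ordered within the word.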
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 5:38 PM, Huacai Chen wrote: On Mon, Feb 19, 2024 at 5:21 PM maobibo wrote: On 2024/2/19 4:48 PM, Huacai Chen wrote: On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: On 2024/2/19 10:42 AM, Huacai Chen wrote: Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: The patch adds paravirt interface for guest kernel, function pv_guest_init() firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect whether current VMM is KVM hypervisor. And the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. This patch only adds paravirt interface for guest kernel, however there is not effective pv functions added here. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} The steal time code can be removed in this patch, I think. Originally I wanted to remove this piece of code, but it fails to compile if CONFIG_PARAVIRT is selected. Here is the reference code; function paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected. static __always_inline u64 steal_account_process_time(u64 maxtime) { #ifdef CONFIG_PARAVIRT if (static_key_false(&paravirt_steal_enabled)) { u64 steal; steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; steal = min(steal, maxtime); account_steal_time(steal); this_rq()->prev_steal_time += steal; return steal; } #endif return 0; } OK, then keep it. 
+ +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE)
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On Mon, Feb 19, 2024 at 5:21 PM maobibo wrote: > > > > On 2024/2/19 4:48 PM, Huacai Chen wrote: > > On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote: > >> > >> > >> > >> On 2024/2/19 10:42 AM, Huacai Chen wrote: > >>> Hi, Bibo, > >>> > >>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > >>>> > >>>> The patch adds paravirt interface for guest kernel, function > >>>> pv_guest_init() firstly checks whether system runs on VM mode. If kernel > >>>> runs on VM mode, it will call function kvm_para_available() to detect > >>>> whether current VMM is KVM hypervisor. And the paravirt function can work > >>>> only if current VMM is KVM hypervisor, since there is only KVM hypervisor > >>>> supported on LoongArch now. > >>>> > >>>> This patch only adds paravirt interface for guest kernel, however there > >>>> is not effective pv functions added here. > >>>> > >>>> Signed-off-by: Bibo Mao > >>>> --- > >>>>arch/loongarch/Kconfig| 9 ++++ > >>>>arch/loongarch/include/asm/kvm_para.h | 7 ++++ > >>>>arch/loongarch/include/asm/paravirt.h | 27 > >>>>.../include/asm/paravirt_api_clock.h | 1 + > >>>>arch/loongarch/kernel/Makefile| 1 + > >>>> arch/loongarch/kernel/paravirt.c | 41 +++ > >>>>arch/loongarch/kernel/setup.c | 2 + > >>>>7 files changed, 88 insertions(+) > >>>>create mode 100644 arch/loongarch/include/asm/paravirt.h > >>>>create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > >>>>create mode 100644 arch/loongarch/kernel/paravirt.c > >>>> > >>>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > >>>> index 10959e6c3583..817a56dff80f 100644 > >>>> --- a/arch/loongarch/Kconfig > >>>> +++ b/arch/loongarch/Kconfig > >>>> @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH > >>>> bool > >>>> default y > >>>> > >>>> +config PARAVIRT > >>>> + bool "Enable paravirtualization code" > >>>> + depends on AS_HAS_LVZ_EXTENSION > >>>> + help > >>>> + This changes the kernel so it can modify itself when it is run > >>>> + under a hypervisor, potentially improving performance >
>>>> significantly > >>>> + over full virtualization. However, when run without a > >>>> hypervisor > >>>> + the kernel is theoretically slower and slightly larger. > >>>> + > >>>>config ARCH_SUPPORTS_KEXEC > >>>> def_bool y > >>>> > >>>> diff --git a/arch/loongarch/include/asm/kvm_para.h > >>>> b/arch/loongarch/include/asm/kvm_para.h > >>>> index 9425d3b7e486..41200e922a82 100644 > >>>> --- a/arch/loongarch/include/asm/kvm_para.h > >>>> +++ b/arch/loongarch/include/asm/kvm_para.h > >>>> @@ -2,6 +2,13 @@ > >>>>#ifndef _ASM_LOONGARCH_KVM_PARA_H > >>>>#define _ASM_LOONGARCH_KVM_PARA_H > >>>> > >>>> +/* > >>>> + * Hypcall code field > >>>> + */ > >>>> +#define HYPERVISOR_KVM 1 > >>>> +#define HYPERVISOR_VENDOR_SHIFT8 > >>>> +#define HYPERCALL_CODE(vendor, code) ((vendor << > >>>> HYPERVISOR_VENDOR_SHIFT) + code) > >>>> + > >>>>/* > >>>> * LoongArch hypcall return code > >>>> */ > >>>> diff --git a/arch/loongarch/include/asm/paravirt.h > >>>> b/arch/loongarch/include/asm/paravirt.h > >>>> new file mode 100644 > >>>> index ..b64813592ba0 > >>>> --- /dev/null > >>>> +++ b/arch/loongarch/include/asm/paravirt.h > >>>> @@ -0,0 +1,27 @@ > >>>> +/* SPDX-License-Identifier: GPL-2.0 */ > >>>> +#ifndef _ASM_LOONGARCH_PARAVIRT_H > >>>> +#define _ASM_LOONGARCH_PARAVIRT_H > >>>> + > >>>> +#ifdef CONFIG_PARAVIRT > >>>> +#include > &g
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 4:48 PM, Huacai Chen wrote:
> On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote:
>> On 2024/2/19 10:42 AM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote:
>>>> [...]
>>>> +#ifdef CONFIG_PARAVIRT
>>>> +#include
>>>> +struct static_key;
>>>> +extern struct static_key paravirt_steal_enabled;
>>>> +extern struct static_key paravirt_steal_rq_enabled;
>>>> +
>>>> +u64 dummy_steal_clock(int cpu);
>>>> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
>>>> +
>>>> +static inline u64 paravirt_steal_clock(int cpu)
>>>> +{
>>>> +	return static_call(pv_steal_clock)(cpu);
>>>> +}
>>> The steal time code can be removed in this patch, I think.
>>>
>> Originally I wanted to remove this piece of code, but it fails to compile
>> if CONFIG_PARAVIRT is selected. Here is the reference code; function
>> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.
>>
>> static __always_inline u64 steal_account_process_time(u64 maxtime)
>> {
>> #ifdef CONFIG_PARAVIRT
>> 	if (static_key_false(&paravirt_steal_enabled)) {
>> 		u64 steal;
>>
>> 		steal = paravirt_steal_clock(smp_processor_id());
>> 		steal -= this_rq()->prev_steal_time;
>> 		steal = min(steal, maxtime);
>> 		account_steal_time(steal);
>> 		this_rq()->prev_steal_time += steal;
>>
>> 		return steal;
>> 	}
>> #endif
>> 	return 0;
>> }
> OK, then keep it.
>
>>>> [...]
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On Mon, Feb 19, 2024 at 12:11 PM maobibo wrote:
>
> On 2024/2/19 10:42 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote:
> >> [...]
> >> +static inline u64 paravirt_steal_clock(int cpu)
> >> +{
> >> +	return static_call(pv_steal_clock)(cpu);
> >> +}
> > The steal time code can be removed in this patch, I think.
> >
> Originally I want to remove this piece of code, but it fails to compile
> if CONFIG_PARAVIRT is selected. Here is reference code, function
> paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.
>
> [...]
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
On 2024/2/19 10:42 AM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote:
>> [...]
>> +#ifdef CONFIG_PARAVIRT
>> +#include
>> +struct static_key;
>> +extern struct static_key paravirt_steal_enabled;
>> +extern struct static_key paravirt_steal_rq_enabled;
>> +
>> +u64 dummy_steal_clock(int cpu);
>> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
>> +
>> +static inline u64 paravirt_steal_clock(int cpu)
>> +{
>> +	return static_call(pv_steal_clock)(cpu);
>> +}
> The steal time code can be removed in this patch, I think.
>
Originally I wanted to remove this piece of code, but it fails to compile
if CONFIG_PARAVIRT is selected. Here is the reference code; function
paravirt_steal_clock() must be defined if CONFIG_PARAVIRT is selected.

static __always_inline u64 steal_account_process_time(u64 maxtime)
{
#ifdef CONFIG_PARAVIRT
	if (static_key_false(&paravirt_steal_enabled)) {
		u64 steal;

		steal = paravirt_steal_clock(smp_processor_id());
		steal -= this_rq()->prev_steal_time;
		steal = min(steal, maxtime);
		account_steal_time(steal);
		this_rq()->prev_steal_time += steal;

		return steal;
	}
#endif
	return 0;
}

>> [...]
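The arithmetic in the reference code can be tried out in userspace. The following is only a sketch under stated assumptions: `struct rq_model` and `account_steal()` are hypothetical names, the cumulative steal clock is passed in as a plain argument instead of being read through paravirt_steal_clock(), and the call to account_steal_time() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-runqueue state; only the field the accounting needs. */
struct rq_model {
	uint64_t prev_steal_time;
};

/*
 * Model of the steal accounting step: @steal_clock is the cumulative
 * steal time reported for this CPU, @maxtime caps how much may be
 * accounted in this call. Returns the amount accounted.
 */
static uint64_t account_steal(struct rq_model *rq, uint64_t steal_clock,
			      uint64_t maxtime)
{
	uint64_t steal;

	assert(rq != NULL);

	/* Steal time that has not been accounted yet. */
	steal = steal_clock - rq->prev_steal_time;
	if (steal > maxtime)
		steal = maxtime;

	/* Only the accounted part is remembered; the rest carries over. */
	rq->prev_steal_time += steal;
	return steal;
}
```

With a steal clock of 100 and a cap of 30, the first call accounts 30 and a later call with a larger cap picks up the remaining 70, which is why prev_steal_time is advanced by the clamped value rather than set to the clock reading.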
Re: [PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
Hi, Bibo, On Thu, Feb 1, 2024 at 11:19 AM Bibo Mao wrote: > > The patch adds paravirt interface for guest kernel, function > pv_guest_initi() firstly checks whether system runs on VM mode. If kernel > runs on VM mode, it will call function kvm_para_available() to detect > whether current VMM is KVM hypervisor. And the paravirt function can work > only if current VMM is KVM hypervisor, since there is only KVM hypervisor > supported on LoongArch now. > > This patch only adds paravirt interface for guest kernel, however there > is not effective pv functions added here. > > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 > arch/loongarch/include/asm/kvm_para.h | 7 > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/paravirt.c | 41 +++ > arch/loongarch/kernel/setup.c | 2 + > 7 files changed, 88 insertions(+) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 10959e6c3583..817a56dff80f 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance significantly > + over full virtualization. However, when run without a hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/kvm_para.h > b/arch/loongarch/include/asm/kvm_para.h > index 9425d3b7e486..41200e922a82 100644 > --- a/arch/loongarch/include/asm/kvm_para.h > +++ b/arch/loongarch/include/asm/kvm_para.h > @@ -2,6 +2,13 @@ > #ifndef _ASM_LOONGARCH_KVM_PARA_H > #define _ASM_LOONGARCH_KVM_PARA_H > > +/* > + * Hypcall code field > + */ > +#define HYPERVISOR_KVM 1 > +#define HYPERVISOR_VENDOR_SHIFT8 > +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) > + code) > + > /* > * LoongArch hypcall return code > */ > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..b64813592ba0 > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} The steal time code can be removed in this patch, I think. 
> + > +int pv_guest_init(void); > +#else > +static inline int pv_guest_init(void) > +{ > + return 0; > +} > + > +#endif // CONFIG_PARAVIRT > +#endif > diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h > b/arch/loongarch/include/asm/paravirt_api_clock.h > new file mode 100644 > index ..65ac7cee0dad > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt_api_clock.h > @@ -0,0 +1 @@ > +#include > diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile > index 3c808c680370..662e6e9de12d 100644 > --- a/arch/loongarch/kernel/Makefile > +++ b/arch/loongarch/kernel/Makefile > @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > obj-$(CONFIG_PROC_FS) += proc.o > +obj-$(CONFIG_PARAVIRT) += paravirt.o > > obj-$(CONFIG_SMP) += smp.o > > diff --git a/arch/loongarch/kernel/paravirt.c > b/arch/loongarch/kernel/paravirt.c > new file mode 100644 > index ..21d01d05791a > --- /dev/null > +++ b/arch/loongarch/kernel/paravirt.c > @@ -0,0 +1,41 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > +#include > +#include
Re: [PATCH v2] x86/sgx: fix kernel-doc comment misuse
On Sun Feb 11, 2024 at 8:24 AM EET, Randy Dunlap wrote:
> Don't use "/**" for a non-kernel-doc comment. This prevents a warning
> from scripts/kernel-doc:
>
> main.c:740: warning: expecting prototype for A section metric is concatenated in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() instead
>
> Cc: Jarkko Sakkinen
> Cc: Dave Hansen
> Cc: linux-...@vger.kernel.org
> Cc: x...@kernel.org
> Reviewed-by: Kai Huang
> Signed-off-by: Randy Dunlap
> ---
> v2: add Rev-by: Kai Huang
>
>  arch/x86/kernel/cpu/sgx/main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -731,7 +731,7 @@ out:
>  	return 0;
>  }
>
> -/**
> +/*
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
>   * metric.

Reviewed-by: Jarkko Sakkinen

BR, Jarkko
Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options
syzbot suspects this issue was fixed by commit:

commit ad579864637af46447208254719943179b69d41a
Author: Steven Rostedt (Google)
Date:   Tue Jan 2 20:12:49 2024 +

    tracefs: Check for dentry->d_inode exists in set_gid()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17659d2418
start commit:   453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4
dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15e52409e8

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: tracefs: Check for dentry->d_inode exists in set_gid()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
[PATCH v2] x86/sgx: fix kernel-doc comment misuse
Don't use "/**" for a non-kernel-doc comment. This prevents a warning
from scripts/kernel-doc:

main.c:740: warning: expecting prototype for A section metric is concatenated in a way that @low bits 12(). Prototype was for sgx_calc_section_metric() instead

Cc: Jarkko Sakkinen
Cc: Dave Hansen
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
Reviewed-by: Kai Huang
Signed-off-by: Randy Dunlap
---
v2: add Rev-by: Kai Huang

 arch/x86/kernel/cpu/sgx/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -- a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -731,7 +731,7 @@ out:
 	return 0;
 }

-/**
+/*
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
  * metric.
Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()
On Thu, Feb 01, 2024 at 03:27:54PM +0100, Marco Pagani wrote:
>
> On 2024-01-30 21:47, Luis Chamberlain wrote:
> >
> > It very much sounds like there is a desire to have this but without a
> > user, there is no justification.
>
> I was working on a set of patches to fix an issue in the fpga subsystem
> when I came across your commit 557aafac1153 ("kernel/module: add
> documentation for try_module_get()") that made me realize we also had a
> safety problem.
>
> To solve this problem for the fpga manager, we had to add a mutex to
> ensure the low-level module still exists before calling
> try_module_get(). However, having a safer version of try_module_get()
> would have simplified the code and made it more robust against changes.
>
> https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/
>
> I suspect there may be other cases where try_module_get() is
> inadvertently called without ensuring that the module still exists
> that may benefit from a safer implementation.

Maybe so, however I'm not yet sure if this is safe from deadlocks.
Please work on a series of selftest simple modules which demonstrate
its use / and a simple bash script selftest loader which verifies this
won't bust. Consider you may have third party modules which also race
with this too, and other users without this new API.

> >> +bool try_module_get_safe(struct module *module)
> >> +{
> >> +	struct module *mod;
> >> +	bool ret = true;
> >> +
> >> +	if (!module)
> >> +		goto out;
> >> +
> >> +	mutex_lock(&module_mutex);
> >
> > If a user comes around then this should be mutex_lock_interruptible(),
> > and add might_sleep()
>
> Would it be okay to return false if it gets interrupted, or should I
> change the return type to int to propagate -EINTR? My concern with
> changing the signature is that it would be less straightforward to
> use the function in place of try_module_get().

Since we want a safe mechanism we might as well not allow a simple drop
in replacement but a more robust one so that users take care of the
return value properly.

  Luis
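To make the "not a drop-in replacement" idea concrete, here is a minimal userspace sketch of an int-returning variant. It rests on loud assumptions: a pthread mutex stands in for module_mutex, a small pointer table stands in for mod_tree, and `struct mod`, `registry_add()`, `registry_del()` and `try_get_safe()` are hypothetical names, not kernel API. The interruptible lock and its -EINTR path are not modeled; the point is only that callers get an error code they have to look at.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module: liveness and a refcount. */
struct mod {
	bool live;
	int refcnt;
};

#define MAX_MODS 4

/* Stand-ins for mod_tree and module_mutex. */
static struct mod *registry[MAX_MODS];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a module pointer; returns 0, or -ENOSPC if the table is full. */
static int registry_add(struct mod *m)
{
	int ret = -ENOSPC;

	assert(m != NULL);

	pthread_mutex_lock(&registry_lock);
	for (size_t i = 0; i < MAX_MODS; i++) {
		if (!registry[i]) {
			registry[i] = m;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&registry_lock);
	return ret;
}

/* Drop a module pointer from the table, as an unload would. */
static void registry_del(struct mod *m)
{
	pthread_mutex_lock(&registry_lock);
	for (size_t i = 0; i < MAX_MODS; i++)
		if (registry[i] == m)
			registry[i] = NULL;
	pthread_mutex_unlock(&registry_lock);
}

/*
 * Take a reference only if @m is still registered and live.
 * The pointer is compared first and dereferenced only after it has been
 * found in the table, so a stale pointer is rejected with -ENOENT.
 * A NULL pointer is treated as success, mirroring the posted patch.
 */
static int try_get_safe(struct mod *m)
{
	int ret = -ENOENT;

	if (!m)
		return 0;

	pthread_mutex_lock(&registry_lock);
	for (size_t i = 0; i < MAX_MODS; i++) {
		if (registry[i] == m && m->live) {
			m->refcnt++;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&registry_lock);
	return ret;
}
```

A caller would write `ret = try_get_safe(m); if (ret) return ret;` instead of testing a bool, which is the kind of explicit return-value handling asked for above.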
Re: [PATCH] lib/test_kmod: fix kernel-doc warnings
On Fri, Nov 03, 2023 at 09:20:44PM -0700, Randy Dunlap wrote: > Fix all kernel-doc warnings in test_kmod.c: > - Mark some enum values as private so that kernel-doc is not needed > for them > - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments > - add kernel-doc info for @task_sync > > test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in > enum 'kmod_test_case' > test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum > 'kmod_test_case' > test_kmod.c:100: warning: Function parameter or member 'task_sync' not > described in 'kmod_test_device_info' > test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not > described in 'kmod_test_device' > > Signed-off-by: Randy Dunlap > Cc: Luis Chamberlain > Cc: linux-modu...@vger.kernel.org Applied and pushed, thanks! Luis
Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()
On 2024-01-30 21:47, Luis Chamberlain wrote:
> On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
>> The current implementation of try_module_get() requires the module to
>> exist and be live as a precondition. While this may seem intuitive at
>> first glance, enforcing the precondition can be tricky, considering that
>> modules can be unloaded at any time if not previously taken. For
>> instance, the caller could be preempted just before calling
>> try_module_get(), and while preempted, the module could be unloaded and
>> freed. More subtly, the module could also be unloaded at any point while
>> executing try_module_get() before incrementing the refcount with
>> atomic_inc_not_zero().
>>
>> Neglecting the precondition that the module must exist and be live can
>> cause unexpected race conditions that can lead to crashes. However,
>> ensuring that the precondition is met may require additional locking
>> that increases the complexity of the code and can make it more
>> error-prone.
>>
>> This patch adds a slower yet safer implementation of try_module_get()
>> that checks if the module is valid by looking into the mod_tree before
>> taking the module's refcount. This new function can be safely called on
>> stale and invalid module pointers, relieving developers from the burden
>> of ensuring that the module exists and is live before attempting to take
>> it.
>>
>> The tree lookup and refcount increment are executed after taking the
>> module_mutex to prevent the module from being unloaded after looking up
>> the tree.
>>
>> Signed-off-by: Marco Pagani
>
> It very much sounds like there is a desire to have this but without a
> user, there is no justification.

I was working on a set of patches to fix an issue in the fpga subsystem
when I came across your commit 557aafac1153 ("kernel/module: add
documentation for try_module_get()") that made me realize we also had a
safety problem.

To solve this problem for the fpga manager, we had to add a mutex to
ensure the low-level module still exists before calling
try_module_get(). However, having a safer version of try_module_get()
would have simplified the code and made it more robust against changes.

https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/

I suspect there may be other cases where try_module_get() is
inadvertently called without ensuring that the module still exists
that may benefit from a safer implementation.

>> +bool try_module_get_safe(struct module *module)
>> +{
>> +	struct module *mod;
>> +	bool ret = true;
>> +
>> +	if (!module)
>> +		goto out;
>> +
>> +	mutex_lock(&module_mutex);
>
> If a user comes around then this should be mutex_lock_interruptible(),
> and add might_sleep()

Would it be okay to return false if it gets interrupted, or should I
change the return type to int to propagate -EINTR? My concern with
changing the signature is that it would be less straightforward to
use the function in place of try_module_get().

>> +
>> +	/*
>> +	 * Check if the address points to a valid live module and take
>> +	 * the refcount only if it points to the module struct.
>> +	 */
>> +	mod = __module_address((unsigned long)module);
>> +	if (mod && mod == module && module_is_live(mod))
>> +		__module_get(mod);
>> +	else
>> +		ret = false;
>> +
>> +	mutex_unlock(&module_mutex);
>> +
>> +out:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(try_module_get_safe);
>
> And EXPORT_SYMBOL_GPL() would need to be used.

Okay, I initially used EXPORT_SYMBOL() to be compatible with
try_module_get().

>
> I'd also expect selftests to be expanded for this case, but again,
> without a user, this is just trying to resolve a problem which does not
> exist.

I can add selftests in the next versions.

Thanks,
Marco
Re: [PATCH] lib/test_kmod: fix kernel-doc warnings
Hi, Any comments on this patch? Thanks. On 11/3/23 21:20, Randy Dunlap wrote: > Fix all kernel-doc warnings in test_kmod.c: > - Mark some enum values as private so that kernel-doc is not needed > for them > - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments > - add kernel-doc info for @task_sync > > test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in > enum 'kmod_test_case' > test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum > 'kmod_test_case' > test_kmod.c:100: warning: Function parameter or member 'task_sync' not > described in 'kmod_test_device_info' > test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not > described in 'kmod_test_device' > > Signed-off-by: Randy Dunlap > Cc: Luis Chamberlain > Cc: linux-modu...@vger.kernel.org > --- > lib/test_kmod.c |6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff -- a/lib/test_kmod.c b/lib/test_kmod.c > --- a/lib/test_kmod.c > +++ b/lib/test_kmod.c > @@ -58,11 +58,14 @@ static int num_test_devs; > * @need_mod_put for your tests case. 
> */ > enum kmod_test_case { > + /* private: */ > __TEST_KMOD_INVALID = 0, > + /* public: */ > > TEST_KMOD_DRIVER, > TEST_KMOD_FS_TYPE, > > + /* private: */ > __TEST_KMOD_MAX, > }; > > @@ -82,6 +85,7 @@ struct kmod_test_device; > * @ret_sync: return value if request_module() is used, sync request for > * @TEST_KMOD_DRIVER > * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE > + * @task_sync: kthread's task_struct or %NULL if not running > * @thread_idx: thread ID > * @test_dev: test device test is being performed under > * @need_mod_put: Some tests (get_fs_type() is one) requires putting the > module > @@ -108,7 +112,7 @@ struct kmod_test_device_info { > * @dev: pointer to misc_dev's own struct device > * @config_mutex: protects configuration of test > * @trigger_mutex: the test trigger can only be fired once at a time > - * @thread_lock: protects @done count, and the @info per each thread > + * @thread_mutex: protects @done count, and the @info per each thread > * @done: number of threads which have completed or failed > * @test_is_oom: when we run out of memory, use this to halt moving forward > * @kthreads_done: completion used to signal when all work is done -- #Randy
[PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel
The patch adds paravirt interface for guest kernel, function pv_guest_initi() firstly checks whether system runs on VM mode. If kernel runs on VM mode, it will call function kvm_para_available() to detect whether current VMM is KVM hypervisor. And the paravirt function can work only if current VMM is KVM hypervisor, since there is only KVM hypervisor supported on LoongArch now. This patch only adds paravirt interface for guest kernel, however there is not effective pv functions added here. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += 
smp.o diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c new file mode 100644 index ..21d01d05791a --- /dev/null +++ b/arch/loongarch/kernel/paravirt.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); + +static bool kvm_para_available(void) +{ + static int hypervisor_type; + int config; + + if (!hypervisor_type) { + config = read_cpucfg(CPUCFG_KVM_SIG); + if (!memcmp(&config, KVM_SIGNATURE, 4)) + hypervisor_type = HYPERVISOR_KVM; + } + + return hypervisor_type == HYPERVISOR_KVM; +} + +int __init pv_guest_init(void) +{ + if (!cpu_has_hypervisor) + return 0; + if (!kvm_para_available()) +
Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()
On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
> The current implementation of try_module_get() requires the module to
> exist and be live as a precondition. While this may seem intuitive at
> first glance, enforcing the precondition can be tricky, considering that
> modules can be unloaded at any time if not previously taken. For
> instance, the caller could be preempted just before calling
> try_module_get(), and while preempted, the module could be unloaded and
> freed. More subtly, the module could also be unloaded at any point while
> executing try_module_get() before incrementing the refcount with
> atomic_inc_not_zero().
>
> Neglecting the precondition that the module must exist and be live can
> cause unexpected race conditions that can lead to crashes. However,
> ensuring that the precondition is met may require additional locking
> that increases the complexity of the code and can make it more
> error-prone.
>
> This patch adds a slower yet safer implementation of try_module_get()
> that checks if the module is valid by looking into the mod_tree before
> taking the module's refcount. This new function can be safely called on
> stale and invalid module pointers, relieving developers from the burden
> of ensuring that the module exists and is live before attempting to take
> it.
>
> The tree lookup and refcount increment are executed after taking the
> module_mutex to prevent the module from being unloaded after looking up
> the tree.
>
> Signed-off-by: Marco Pagani

It very much sounds like there is a desire to have this, but without a
user there is no justification.

> +bool try_module_get_safe(struct module *module)
> +{
> +	struct module *mod;
> +	bool ret = true;
> +
> +	if (!module)
> +		goto out;
> +
> +	mutex_lock(&module_mutex);

If a user comes around then this should be mutex_lock_interruptible(),
and might_sleep() should be added.

> +
> +	/*
> +	 * Check if the address points to a valid live module and take
> +	 * the refcount only if it points to the module struct.
> +	 */
> +	mod = __module_address((unsigned long)module);
> +	if (mod && mod == module && module_is_live(mod))
> +		__module_get(mod);
> +	else
> +		ret = false;
> +
> +	mutex_unlock(&module_mutex);
> +
> +out:
> +	return ret;
> +}
> +EXPORT_SYMBOL(try_module_get_safe);

And EXPORT_SYMBOL_GPL() would need to be used.

I'd also expect selftests to be expanded for this case, but again,
without a user, this is just trying to resolve a problem which does not
exist.

  Luis
[RFC PATCH] kernel/module: add a safer implementation of try_module_get()
The current implementation of try_module_get() requires the module to exist and be live as a precondition. While this may seem intuitive at first glance, enforcing the precondition can be tricky, considering that modules can be unloaded at any time if not previously taken. For instance, the caller could be preempted just before calling try_module_get(), and while preempted, the module could be unloaded and freed. More subtly, the module could also be unloaded at any point while executing try_module_get() before incrementing the refcount with atomic_inc_not_zero(). Neglecting the precondition that the module must exist and be live can cause unexpected race conditions that can lead to crashes. However, ensuring that the precondition is met may require additional locking that increases the complexity of the code and can make it more error-prone. This patch adds a slower yet safer implementation of try_module_get() that checks if the module is valid by looking into the mod_tree before taking the module's refcount. This new function can be safely called on stale and invalid module pointers, relieving developers from the burden of ensuring that the module exists and is live before attempting to take it. The tree lookup and refcount increment are executed after taking the module_mutex to prevent the module from being unloaded after looking up the tree. Signed-off-by: Marco Pagani --- include/linux/module.h | 15 +++ kernel/module/main.c | 27 +++ 2 files changed, 42 insertions(+) diff --git a/include/linux/module.h b/include/linux/module.h index 08364d5cbc07..86b6ea43d204 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -695,6 +695,19 @@ extern void __module_get(struct module *module); */ extern bool try_module_get(struct module *module); +/** + * try_module_get_safe() - safely take the refcount of a module. + * @module: address of the module to be taken. + * + * Safer version of try_module_get(). 
Check first if the module exists and is alive, + * and then take its reference count. + * + * Return: + * * %true - module exists and its refcount has been incremented or module is NULL. + * * %false - module does not exist. + */ +extern bool try_module_get_safe(struct module *module); + /** * module_put() - release a reference count to a module * @module: the module we should release a reference count for @@ -815,6 +828,8 @@ static inline bool try_module_get(struct module *module) return true; } +#define try_module_get_safe(module) try_module_get(module) + static inline void module_put(struct module *module) { } diff --git a/kernel/module/main.c b/kernel/module/main.c index 98fedfdb8db5..22376b69778c 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -842,6 +842,33 @@ bool try_module_get(struct module *module) } EXPORT_SYMBOL(try_module_get); +bool try_module_get_safe(struct module *module) +{ + struct module *mod; + bool ret = true; + + if (!module) + goto out; + + mutex_lock(&module_mutex); + + /* +* Check if the address points to a valid live module and take +* the refcount only if it points to the module struct. +*/ + mod = __module_address((unsigned long)module); + if (mod && mod == module && module_is_live(mod)) + __module_get(mod); + else + ret = false; + + mutex_unlock(&module_mutex); + +out: + return ret; +} +EXPORT_SYMBOL(try_module_get_safe); + void module_put(struct module *module) { int ret; base-commit: 4515d08a742c76612b65d2f47a87d12860519842 -- 2.43.0
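The lookup-then-take pattern described in the commit message can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions: struct obj, registry, registry_add(), registry_remove() and try_get_safe() are hypothetical names standing in for struct module, the mod_tree, and try_module_get_safe(); a pthread mutex stands in for module_mutex. It is not the patch's code and does not use any kernel API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct module. */
struct obj {
	int refcount;
	bool live;
};

#define MAX_OBJS 8

/* Registry of currently valid objects, protected by registry_lock. */
static struct obj *registry[MAX_OBJS];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish an object; returns false if the registry is full. */
static bool registry_add(struct obj *o)
{
	bool added = false;
	size_t i;

	pthread_mutex_lock(&registry_lock);
	for (i = 0; i < MAX_OBJS; i++) {
		if (!registry[i]) {
			registry[i] = o;
			added = true;
			break;
		}
	}
	pthread_mutex_unlock(&registry_lock);
	return added;
}

/* Unpublish an object so later lookups of its address fail. */
static void registry_remove(struct obj *o)
{
	size_t i;

	pthread_mutex_lock(&registry_lock);
	for (i = 0; i < MAX_OBJS; i++) {
		if (registry[i] == o)
			registry[i] = NULL;
	}
	pthread_mutex_unlock(&registry_lock);
}

/*
 * Same shape as the proposed function: NULL reports success, any other
 * pointer must be found in the registry and be live before its refcount
 * is touched. The candidate is only compared by address until it has
 * been found, so a stale pointer is never dereferenced.
 */
static bool try_get_safe(struct obj *candidate)
{
	bool ret = false;
	size_t i;

	if (!candidate)
		return true;

	pthread_mutex_lock(&registry_lock);
	for (i = 0; i < MAX_OBJS; i++) {
		if (registry[i] == candidate && registry[i]->live) {
			registry[i]->refcount++;
			ret = true;
			break;
		}
	}
	pthread_mutex_unlock(&registry_lock);
	return ret;
}
```

Because lookup and increment happen under the same lock that registry_remove() takes, an object cannot disappear between the two steps; the cost is a lock acquisition and a search on every call, which is the "slower yet safer" trade-off the commit message describes.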
[PATCH v3 4/6] LoongArch: Add paravirt interface for guest kernel
This patch adds a paravirt interface for the guest kernel. The function pv_guest_init() first checks whether the system runs in VM mode. If the kernel runs in VM mode, it calls kvm_para_available() to detect whether the current VMM is the KVM hypervisor. The paravirt functions can work only if the current VMM is the KVM hypervisor, since KVM is the only hypervisor supported on LoongArch now. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 10959e6c3583..817a56dff80f 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += 
smp.o diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c new file mode 100644 index ..21d01d05791a --- /dev/null +++ b/arch/loongarch/kernel/paravirt.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); + +static bool kvm_para_available(void) +{ + static int hypervisor_type; + int config; + + if (!hypervisor_type) { + config = read_cpucfg(CPUCFG_KVM_SIG); + if (!memcmp(&config, KVM_SIGNATURE, 4)) + hypervisor_type = HYPERVISOR_KVM; + } + + return hypervisor_type == HYPERVISOR_KVM; +} + +int __init pv_guest_init(void) +{ + if (!cpu_has_hypervisor) + return 0; + if (!kvm_para_available()) + return 0; + + return 1; +} diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c in
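For readers unfamiliar with the hypercall code layout that the kvm_para.h hunk introduces, the following userspace sketch shows how a vendor ID and a function code combine into one value. The three macro names and their values are taken from the patch; the extra parentheses around the macro arguments and the two decoding helpers (hypercall_vendor(), hypercall_func()) are assumptions added here for illustration, as is the idea that the function code fits in the low 8 bits.

```c
#include <stdint.h>

#define HYPERVISOR_KVM			1
#define HYPERVISOR_VENDOR_SHIFT		8

/*
 * Same encoding as in the patch, but with each argument parenthesized
 * (an assumption of this sketch) so that expression arguments such as
 * "flag ? 2 : 1" expand as intended.
 */
#define HYPERCALL_CODE(vendor, code) \
	(((vendor) << HYPERVISOR_VENDOR_SHIFT) + (code))

/* Hypothetical helper: extract the vendor ID from an encoded value. */
static inline uint32_t hypercall_vendor(uint32_t hc)
{
	return hc >> HYPERVISOR_VENDOR_SHIFT;
}

/*
 * Hypothetical helper: extract the function code, assuming it occupies
 * only the bits below HYPERVISOR_VENDOR_SHIFT.
 */
static inline uint32_t hypercall_func(uint32_t hc)
{
	return hc & ((1u << HYPERVISOR_VENDOR_SHIFT) - 1);
}
```

With these definitions, HYPERCALL_CODE(HYPERVISOR_KVM, 1) evaluates to 0x101: the vendor ID sits in bits 8 and up, the function code in the low byte.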
BUG: unable to handle kernel paging request in __skb_flow_dissect
Hello.

We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. Recently, our team discovered an issue in Linux kernel 6.7.0-g052d534373b7. A POC file for the issue is attached to this email.

Stack dump:

[ 185.664167][ T8332] BUG: unable to handle page fault for address: ed1029c40001
[ 185.665134][ T8332] #PF: supervisor read access in kernel mode
[ 185.665877][ T8332] #PF: error_code(0x) - not-present page
[ 185.666481][ T8332] PGD 7ffd0067 P4D 7ffd0067 PUD 3fff5067 PMD 0
[ 185.667129][ T8332] Oops: [#1] PREEMPT SMP KASAN
[ 185.667719][ T8332] CPU: 1 PID: 8332 Comm: poc Not tainted 6.7.0-g052d534373b7 #19
[ 185.668641][ T8332] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 185.669639][ T8332] RIP: 0010:__skb_flow_dissect (net/core/flow_dissector.c:1170 (discriminator 1))
[ 185.682210][ T8332] Call Trace:
[ 185.682595][ T8332]
[ 185.717256][ T8332] __skb_get_hash (net/core/flow_dissector.c:1737 net/core/flow_dissector.c:1770 net/core/flow_dissector.c:1794 net/core/flow_dissector.c:1856)
[ 185.721978][ T8332] ip_tunnel_xmit (./include/linux/skbuff.h:1566 net/ipv4/ip_tunnel.c:748)
[ 185.727788][ T8332] ipip_tunnel_xmit (net/ipv4/ipip.c:308)
[ 185.728396][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.729082][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 net/core/dev.c:4352)
[ 185.736814][ T8332] neigh_connected_output (./include/linux/netdevice.h:3171 net/core/neighbour.c:1592)
[ 185.737536][ T8332] ip_finish_output2 (./include/net/neighbour.h:542 net/ipv4/ip_output.c:235)
[ 185.742239][ T8332] __ip_finish_output (net/ipv4/ip_output.c:313 net/ipv4/ip_output.c:295)
[ 185.742943][ T8332] ip_finish_output (net/ipv4/ip_output.c:323)
[ 185.743556][ T8332] ip_mc_output (./include/linux/netfilter.h:303 net/ipv4/ip_output.c:420)
[ 185.744137][ T8332] ip_local_out (./include/net/dst.h:451 net/ipv4/ip_output.c:129)
[ 185.744746][ T8332] iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84 (discriminator 4))
[ 185.745390][ T8332] ip_tunnel_xmit (net/ipv4/ip_tunnel.c:833)
[ 185.750430][ T8332] dev_hard_start_xmit (./include/linux/netdevice.h:5004 net/core/dev.c:3547 net/core/dev.c:3563)
[ 185.751114][ T8332] __dev_queue_xmit (./include/linux/netdevice.h:3367 net/core/dev.c:4352)
[ 185.759138][ T8332] __bpf_redirect (./include/linux/netdevice.h:3367 net/core/filter.c:2136 net/core/filter.c:2165 net/core/filter.c:2188)
[ 185.759757][ T8332] bpf_clone_redirect (net/core/filter.c:2459 net/core/filter.c:2431)
[ 185.761088][ T8332] ___bpf_prog_run (kernel/bpf/core.c:1986)
[ 185.762499][ T8332] __bpf_prog_run512 (kernel/bpf/core.c:2227)
[ 185.778478][ T8332] bpf_test_run (./include/linux/bpf.h:1231 ./include/linux/filter.h:651 ./include/linux/filter.h:658 net/bpf/test_run.c:423)
[ 185.783715][ T8332] bpf_prog_test_run_skb (net/bpf/test_run.c:1057)
[ 185.786538][ T8332] __sys_bpf (kernel/bpf/syscall.c:4107 kernel/bpf/syscall.c:5475)
[ 185.793454][ T8332] __x64_sys_bpf (kernel/bpf/syscall.c:5559)
[ 185.794810][ T8332] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 185.795399][ T8332] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 185.796182][ T8332] RIP: 0033:0x7f4f8955df29

Analysis of the issue:

The faulting code is in the __skb_flow_dissect() function (net/core/flow_dissector.c:1170). The code is below:

	iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
	if (!iph || iph->ihl < 5) {
		fdret = FLOW_DISSECT_RET_OUT_BAD;
		break;
	}

It looks like __skb_header_pointer() returns an invalid address here, and reading iph->ihl then dereferences that invalid address. So, I think the issue is a missing check of whether iph is valid.

Thank you for taking the time to read this email. We look forward to working with you further.

Ubisectech Sirius Team
Web: www.ubisectech.com
Email: bugrep...@ubisectech.com

Attachment: poc.c
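To make the quoted check easier to follow, here is a userspace sketch of the general "bounds-checked header pointer, then validate IHL" pattern. It is not the kernel's __skb_header_pointer() and makes no claim about the root cause of the reported fault: header_pointer() and ipv4_header_ok() are hypothetical helpers, the packet is a flat byte buffer rather than an skb, and the helper always copies into the caller's buffer instead of sometimes returning a pointer into the packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum IPv4 header size in bytes (IHL of 5 32-bit words). */
#define MINI_IPV4_HDR_LEN 20

/*
 * Hypothetical helper: copy hdr_len bytes starting at offset into buf
 * and return buf, or return NULL if the requested range does not fit
 * inside the len bytes available at data.
 */
static const void *header_pointer(const uint8_t *data, size_t len,
				  size_t offset, size_t hdr_len, void *buf)
{
	if (offset > len || hdr_len > len - offset)
		return NULL;
	memcpy(buf, data + offset, hdr_len);
	return buf;
}

/*
 * Same two-part test as the quoted snippet: reject the packet if the
 * header could not be obtained, or if its IHL field is below 5.
 */
static bool ipv4_header_ok(const uint8_t *data, size_t len, size_t offset)
{
	uint8_t hdr[MINI_IPV4_HDR_LEN];
	const uint8_t *iph = header_pointer(data, len, offset, sizeof(hdr), hdr);

	/* The low nibble of the first byte holds IHL in 32-bit words. */
	if (!iph || (iph[0] & 0x0f) < 5)
		return false;
	return true;
}
```

In this sketch the NULL test is sufficient because header_pointer() can only return NULL or a buffer it has just filled. The report argues that the kernel path can see a non-NULL but unmapped address, which would mean the bad pointer originates in the inputs (data, hlen, nhoff) rather than in the check itself; that is an interpretation, not something the trace alone confirms.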
[PATCH v2 4/6] LoongArch: Add paravirt interface for guest kernel
This patch adds a paravirt interface for the guest kernel. The function pv_guest_init() first checks whether the system runs in VM mode. If the kernel runs in VM mode, it calls kvm_para_available() to detect whether the current VMM is the KVM hypervisor. The paravirt functions can work only if the current VMM is the KVM hypervisor, and KVM is the only hypervisor supported on LoongArch now. Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 arch/loongarch/include/asm/kvm_para.h | 7 arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/paravirt.c | 41 +++ arch/loongarch/kernel/setup.c | 2 + 7 files changed, 88 insertions(+) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index ee123820a476..d8ccaf46a50d 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -564,6 +564,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h index 9425d3b7e486..41200e922a82 100644 --- a/arch/loongarch/include/asm/kvm_para.h +++ b/arch/loongarch/include/asm/kvm_para.h @@ -2,6 +2,13 @@ #ifndef _ASM_LOONGARCH_KVM_PARA_H #define _ASM_LOONGARCH_KVM_PARA_H +/* + * Hypcall code field + */ +#define HYPERVISOR_KVM 1 +#define HYPERVISOR_VENDOR_SHIFT8 +#define HYPERCALL_CODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code) + /* * LoongArch hypcall return code */ diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..b64813592ba0 --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_guest_init(void); +#else +static inline int pv_guest_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3c808c680370..662e6e9de12d 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += 
smp.o diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c new file mode 100644 index ..21d01d05791a --- /dev/null +++ b/arch/loongarch/kernel/paravirt.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct static_key paravirt_steal_enabled; +struct static_key paravirt_steal_rq_enabled; + +static u64 native_steal_clock(int cpu) +{ + return 0; +} + +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock); + +static bool kvm_para_available(void) +{ + static int hypervisor_type; + int config; + + if (!hypervisor_type) { + config = read_cpucfg(CPUCFG_KVM_SIG); + if (!memcmp(&config, KVM_SIGNATURE, 4)) + hypervisor_type = HYPERVISOR_KVM; + } + + return hypervisor_type == HYPERVISOR_KVM; +} + +int __init pv_guest_init(void) +{ + if (!cpu_has_hypervisor) + return 0; + if (!kvm_para_available()) + return 0; + + return 1; +} diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c index d183a745fb
Re: [syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options
On Wed, 03 Jan 2024 13:41:31 -0800 syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne.. > git tree: upstream > console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8 > kernel config: https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4 > dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) > 2.40 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15e52409e8 > > Downloadable assets: > disk image: > https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz > vmlinux: > https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz > kernel image: > https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz > > The issue was bisected to: > > commit 7e8358edf503e87236c8d07f69ef0ed846dd5112 > Author: Steven Rostedt (Google) > Date: Fri Dec 22 00:07:57 2023 + > > eventfs: Fix file and directory uid and gid ownership > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8 > final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8 > console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com > Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership") > > BUG: unable to handle page fault for address: fff0 > #PF: supervisor read access in kernel mode > #PF: error_code(0x) - not-present page > PGD d734067 P4D d734067 PUD d736067 PMD 0 > Oops: [#1] PREEMPT SMP KASAN > CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted > 6.7.0-rc7-syzkaller-00049-g453f5db0619e #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > 
Google 11/17/2023 > RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline] > RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337 > Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 > 00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 > 83 e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75 > RSP: 0018:c900040ffca8 EFLAGS: 00010246 > RAX: 1ffe RBX: fff0 RCX: 888014bf5940 > RDX: RSI: 0004 RDI: c900040ffc20 > RBP: dc00 R08: 0003 R09: f5200081ff84 > R10: dc00 R11: f5200081ff84 R12: 88801d743888 > R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810 > FS: 557dd480() GS:8880b980() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: fff0 CR3: 1ec48000 CR4: 003506f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > > tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353 > reconfigure_super+0x440/0x870 fs/super.c:1143 > do_remount fs/namespace.c:2884 [inline] This is the same bug that was fixed by: https://lore.kernel.org/linux-trace-kernel/20240102151249.05da2...@gandalf.local.home/ And just waiting to be applied: https://lore.kernel.org/all/20240102210731.1f1c5...@gandalf.local.home/ Thanks, -- Steve > path_mount+0xc24/0xfa0 fs/namespace.c:3656 > do_mount fs/namespace.c:3677 [inline] > __do_sys_mount fs/namespace.c:3886 [inline] > __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x63/0x6b > RIP: 0033:0x7fec326e8d99 > Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5 > RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99 > RDX: RSI: 20c0 RDI: > RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80 > R10: 02200022 R11: 0246 R12: > R13: 7ffc8103e068 R14: 0001 R15: 0001 > > Modules linked in: > 
CR2: fff0 > ---[ end trace ]--- > RIP: 0010:set_gid fs/tracefs/inode.c:224
[syzbot] [fs?] [trace?] BUG: unable to handle kernel paging request in tracefs_apply_options
Hello, syzbot found the following issue on: HEAD commit:453f5db0619e Merge tag 'trace-v6.7-rc7' of git://git.kerne.. git tree: upstream console+strace: https://syzkaller.appspot.com/x/log.txt?x=10ec3829e8 kernel config: https://syzkaller.appspot.com/x/.config?x=f8e72bae38c079e4 dashboard link: https://syzkaller.appspot.com/bug?extid=f8a023e0c6beabe2371a compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1414af31e8 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15e52409e8 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/38b92a7149e8/disk-453f5db0.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/4f872267133f/vmlinux-453f5db0.xz kernel image: https://storage.googleapis.com/syzbot-assets/587572061791/bzImage-453f5db0.xz The issue was bisected to: commit 7e8358edf503e87236c8d07f69ef0ed846dd5112 Author: Steven Rostedt (Google) Date: Fri Dec 22 00:07:57 2023 + eventfs: Fix file and directory uid and gid ownership bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=108cd519e8 final oops: https://syzkaller.appspot.com/x/report.txt?x=128cd519e8 console output: https://syzkaller.appspot.com/x/log.txt?x=148cd519e8 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+f8a023e0c6beabe23...@syzkaller.appspotmail.com Fixes: 7e8358edf503 ("eventfs: Fix file and directory uid and gid ownership") BUG: unable to handle page fault for address: fff0 #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD d734067 P4D d734067 PUD d736067 PMD 0 Oops: [#1] PREEMPT SMP KASAN CPU: 0 PID: 5056 Comm: syz-executor170 Not tainted 6.7.0-rc7-syzkaller-00049-g453f5db0619e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline] RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337 Code: 24 10 
49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75 RSP: 0018:c900040ffca8 EFLAGS: 00010246 RAX: 1ffe RBX: fff0 RCX: 888014bf5940 RDX: RSI: 0004 RDI: c900040ffc20 RBP: dc00 R08: 0003 R09: f5200081ff84 R10: dc00 R11: f5200081ff84 R12: 88801d743888 R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810 FS: 557dd480() GS:8880b980() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: fff0 CR3: 1ec48000 CR4: 003506f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: tracefs_remount+0x78/0x80 fs/tracefs/inode.c:353 reconfigure_super+0x440/0x870 fs/super.c:1143 do_remount fs/namespace.c:2884 [inline] path_mount+0xc24/0xfa0 fs/namespace.c:3656 do_mount fs/namespace.c:3677 [inline] __do_sys_mount fs/namespace.c:3886 [inline] __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3863 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fec326e8d99 Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:7ffc8103ddf8 EFLAGS: 0246 ORIG_RAX: 00a5 RAX: ffda RBX: 7ffc8103de00 RCX: 7fec326e8d99 RDX: RSI: 20c0 RDI: RBP: 7ffc8103de08 R08: 2140 R09: 7fec326b5b80 R10: 02200022 R11: 0246 R12: R13: 7ffc8103e068 R14: 0001 R15: 0001 Modules linked in: CR2: fff0 ---[ end trace ]--- RIP: 0010:set_gid fs/tracefs/inode.c:224 [inline] RIP: 0010:tracefs_apply_options+0x4d0/0xa40 fs/tracefs/inode.c:337 Code: 24 10 49 8b 1e 48 83 c3 f0 74 3d 48 89 d8 48 c1 e8 03 48 bd 00 00 00 00 00 fc ff df 80 3c 28 00 74 08 48 89 df e8 70 ff 88 fe <48> 8b 1b 48 89 de 48 83 e6 02 31 ff e8 bf fe 2c fe 48 83 e3 02 75 RSP: 0018:c900040ffca8 EFLAGS: 00010246 RAX: 1ffe RBX: fff0 RCX: 888014bf5940 RDX: RSI: 0004 RDI: c900040ffc20 RBP: dc00 R08: 
0003 R09: f5200081ff84 R10: dc00 R11: f5200081ff84 R12: 88801d743888 R13: 88801b0c3710 R14: 88801d7437e8 R15: 88801d743810 FS: 557dd480() GS:8880b980() knlGS: CS: 0010 DS: ES: 00